Sitemap file creation guide – sitemap.xml

The sitemap file, sitemap.xml, is designed to make it easier for search crawlers to navigate the pages of a site. Let’s take a look at the main ways to create and configure this file.

Purpose of sitemap.xml

Sitemap.xml is a file, created manually or automatically, that is intended for search crawlers and gives them information about the structure of the site. It contains the URLs of the pages, as well as additional data on them:

  • Date of the last change (for a new page, this is its creation date).
  • The priority for indexing the page.
  • Update frequency.

By the way, before it starts crawling, the search robot first visits the robots.txt file and only then sitemap.xml.

Sitemap will help in the following situations:

  • The site has a complex structure, a large number of nested subcategories. In this case, the robot may take a very long time to “get” to the lower-level pages.
  • The site has a large number of documents. Crawling is limited by a crawl budget: the limit on the number of pages that the crawler indexes over a certain period of time. If there are many URLs on the site, some of them may go unnoticed by the robot. In this case, the sitemap lets you list their addresses and, if necessary, set their indexing priority relative to other documents.
  • The site does not have a clear structure, while the pages are linked in a chaotic manner.
  • There are pages that are not directly linked from other documents. It is better not to allow such a thing, but if for some reason there are such pages, and their indexing is required, then we indicate this in the sitemap.
  • Acceleration of scanning. In the sitemap, you can specify the date of creation or modification of the page, thus, the robot will have information about those documents that need to be indexed first.
  • The created pages are regularly updated.
  • The site has recently been launched.

Is it possible to do without sitemap.xml?

A sitemap file is recommended rather than required. For example, if it is absent, the Yandex.Webmaster panel shows a notification in the “Possible problems” section.

Of course, if this file is missing, crawlers will still index the site correctly in most cases. However, if the site is the subject of comprehensive SEO promotion and the commercial side of the business depends on its success, it is still recommended to create and configure a sitemap.xml.

Ways to create a sitemap

Now let’s look at the main ways to create a sitemap.xml.

Manual creation

A correct sitemap.xml is a text file whose structure follows a specific syntax. Therefore, if the site is small and rarely updated, you can create the map file manually, observing the rules; the syntax is discussed below.

This method is not suitable for resources with a large number of pages that are updated frequently.

File creation in online services and programs

This is also an easy way to create a sitemap.xml: go to the service, specify the site URL, start the generation and download the finished file.

Take the popular Xml-sitemaps.com generator as an example. We enter the URL of the site for which the map is generated. The service warns that the free version scans no more than 500 pages. After that, we start the process of scanning and creating the file.


Xml-sitemaps.com Sitemap Generator

At the end of the process, you will be prompted to download the finished file.

We look at the finished file

A file is created in a similar way in desktop programs:

  • SiteMap XML Dynamic.
  • WonderWebWare.
  • Screaming Frog SEO Spider.

The disadvantage of this method is that the file has to be updated manually all the time. That is, whenever new pages appear on the site, the file must be regenerated, otherwise there is no point in using it. This is extremely inconvenient when the site is actively developing.
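To notice when a manually generated file has fallen behind the site, the two URL lists can be compared. Below is a minimal sketch in Python; the function name sitemap_drift and the sample URLs are hypothetical, and it assumes you already have the list of live pages from somewhere (a crawl, a database export).

```python
def sitemap_drift(site_urls, sitemap_urls):
    """Compare the pages that exist on the site with those listed in the sitemap."""
    site = set(site_urls)
    listed = set(sitemap_urls)
    return {
        # Pages that exist on the site but are absent from the sitemap
        "missing_from_sitemap": sorted(site - listed),
        # Pages still listed in the sitemap but no longer on the site
        "stale_in_sitemap": sorted(listed - site),
    }


# Hypothetical data: one new page was published, one old page was removed
drift = sitemap_drift(
    site_urls=["https://mysite.com/", "https://mysite.com/new.html"],
    sitemap_urls=["https://mysite.com/", "https://mysite.com/old.html"],
)
```

If either list in the result is non-empty, the file most likely needs to be regenerated.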

Using plugins for CMS

Most of the popular CMS already have plugins ready to create the correct sitemap.xml file. For WordPress, the following can be recommended:

  • Google XML Sitemaps. Free plugin with simple settings. Allows you to configure URL exceptions.
  • Yoast SEO includes a whole set of tools for complex SEO promotion, including the ability to automatically create a sitemap.
  • Rank Math.

The WordPress repository contains dozens of plugins similar to those listed.

For sites on 1C-Bitrix, you do not need to install additional components, because there is already a built-in tool. It is available in the settings: “Marketing – Setting up sitemap.xml”.

If the site is on Joomla, then you should pay attention to the OSMap and jSitemap plugins.

The main advantage of this method of creating a map is that it will be generated automatically when new documents appear on the site. That is, the plugin is configured once, and then everything works without the participation of the webmaster.

Sitemap.xml syntax

The syntax of the sitemap.xml file contains the following blocks:

  • The entire content of the file must be inside the <urlset> tags.
  • All information about a page, including its URL, must be inside a <url> block.
  • The URL itself is placed in the <loc> tag, nested in the parent <url> tag.

Now let’s look at the tags that can be in the sitemap.xml.

Required tags:

  • <urlset> – the root tag; it references the protocol standard.
  • <url> – block containing information on one URL.
  • <loc> – the URL of the page. It must begin with the protocol (http or https). Length limit: 2048 characters.
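As an illustration, a file that contains only the required tags can be assembled with Python's standard library. This is a minimal sketch, not a full generator: the function name build_minimal_sitemap and the mysite.com addresses are assumptions, and the 2048-character check simply mirrors the limit mentioned above.

```python
import xml.etree.ElementTree as ET

# Namespace of the sitemap protocol, declared on the root <urlset> tag
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_LOC_LENGTH = 2048  # length limit for <loc> mentioned in the text


def build_minimal_sitemap(urls):
    """Return sitemap XML (as bytes) that uses only <urlset>, <url> and <loc>."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"URL must begin with the protocol: {url}")
        if len(url) > MAX_LOC_LENGTH:
            raise ValueError(f"URL is longer than {MAX_LOC_LENGTH} characters")
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = url
    # xml_declaration requires Python 3.8 or newer
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Hypothetical site with two pages
xml_bytes = build_minimal_sitemap(
    ["https://mysite.com/", "https://mysite.com/1.html"]
)
```

The result can be written to disk with open("sitemap.xml", "wb").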

Additional tags:

  • <lastmod> – the time of the last page change. The date must be in the W3C Datetime format.
  • <changefreq> – how often the page is expected to change, including the case when it does not change at all. The value is approximate and is given as one of the following keywords:
    • always – the content of the page is refreshed on every load.
    • hourly – every hour.
    • daily – daily.
    • weekly – once a week.
    • monthly – monthly.
    • yearly – once a year.
    • never – never.

This is not a direct command; it acts as a hint for the search crawler. For example, if this tag indicates that the page is updated every day, the robot may still visit it more or less often. The same applies to pages tagged as “never”: crawlers can still visit them periodically.

  • <priority> – the priority of a particular page relative to the other pages of the site. The value can be from 0.0 to 1.0: the higher the value, the higher the priority. The default is 0.5. This tag lets you increase the likelihood that individual pages are indexed, but it does not affect their ranking in search engines.

This tag should be treated as a tool for determining the order in which pages are indexed.
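The optional tags are easy to get wrong, so it can help to validate them while the entry is built. Below is a rough sketch under a few assumptions: make_url_entry is a hypothetical helper, the date is passed as a datetime.date object, and the priority is written with one decimal place.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Keywords allowed in <changefreq>, as listed above
ALLOWED_CHANGEFREQ = {
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
}


def make_url_entry(loc, lastmod=None, changefreq=None, priority=None):
    """Build one <url> element; optional tags are added only when given."""
    url_el = ET.Element("url")
    ET.SubElement(url_el, "loc").text = loc
    if lastmod is not None:
        # date.isoformat() gives YYYY-MM-DD, a valid W3C Datetime value
        ET.SubElement(url_el, "lastmod").text = lastmod.isoformat()
    if changefreq is not None:
        if changefreq not in ALLOWED_CHANGEFREQ:
            raise ValueError(f"Unknown changefreq value: {changefreq}")
        ET.SubElement(url_el, "changefreq").text = changefreq
    if priority is not None:
        if not 0.0 <= priority <= 1.0:
            raise ValueError("priority must be between 0.0 and 1.0")
        ET.SubElement(url_el, "priority").text = f"{priority:.1f}"
    return url_el


# Hypothetical page with all three optional tags filled in
entry = make_url_entry(
    "https://mysite.com/", date(2021, 1, 1), "monthly", 1.0
)
entry_xml = ET.tostring(entry, encoding="unicode")
```

Such an element can then be appended to the <urlset> root of the file.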

File Requirements

In addition to following the syntax rules, the sitemap.xml file must meet the following technical requirements:

  • Must be saved in UTF-8 encoding.
  • Cyrillic in URLs can be specified both in its original form and in encoded form.
  • There should be no more than 50,000 URLs in the file. If this limit is not enough, several sitemap files are created and listed in a sitemap index file. The index has the same limit: 50,000 links to sitemap files.
  • The file size limit is 50 MB. The file may additionally be compressed with gzip, but the 50,000 URL limit still applies.
  • The file must be hosted on the same domain as the site.
  • When requesting a file, the server should return a 200 response code.
  • The file must be available for indexing; be sure to check that robots.txt does not prohibit access to it.
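The 50,000 URL limit and the index file can be sketched as follows. This is only an outline under stated assumptions: the helper names, the sitemap-N.xml naming scheme and the mysite.com domain are all hypothetical, and the size check treats 50 MB as 50 × 1024 × 1024 bytes of uncompressed data.

```python
import gzip
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000            # URL limit per sitemap file
MAX_FILE_BYTES = 50 * 1024 * 1024     # 50 MB limit, checked before compression


def split_urls(urls, chunk_size=MAX_URLS_PER_FILE):
    """Split a long URL list into chunks that each fit into one file."""
    return [urls[i:i + chunk_size] for i in range(0, len(urls), chunk_size)]


def build_sitemap_index(sitemap_urls):
    """Return an index file (bytes) that lists the individual sitemap files."""
    if len(sitemap_urls) > MAX_URLS_PER_FILE:
        raise ValueError("The index file is limited to 50,000 links as well")
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        sitemap_el = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap_el, "loc").text = sitemap_url
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


def compress_sitemap(xml_bytes):
    """Gzip a sitemap after checking the size limit on the raw data."""
    if len(xml_bytes) > MAX_FILE_BYTES:
        raise ValueError("The sitemap exceeds the 50 MB limit")
    return gzip.compress(xml_bytes)


# Hypothetical site with 120,000 pages: three files plus one index
all_urls = [f"https://mysite.com/page-{n}.html" for n in range(120_000)]
chunks = split_urls(all_urls)
index_xml = build_sitemap_index(
    [f"https://mysite.com/sitemap-{i}.xml" for i in range(1, len(chunks) + 1)]
)
```

Each chunk is then written to its own sitemap file, and only the index needs to be reported to the search engines.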

In addition to XML, Yandex also supports the TXT format, and Google accepts plain-text sitemaps as well. Keep in mind that the TXT format can carry only page URLs, without additional parameters.
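Since the TXT format carries nothing but addresses, producing it is trivial. A minimal sketch, assuming one URL per line and a hypothetical helper name build_txt_sitemap:

```python
def build_txt_sitemap(urls):
    """Return the text of a TXT sitemap: one URL per line, no extra data."""
    for url in urls:
        # A URL with whitespace or a line break would corrupt the file
        if any(ch.isspace() for ch in url):
            raise ValueError(f"URL contains whitespace: {url!r}")
    return "\n".join(urls) + "\n"


# Hypothetical site with two pages
txt_sitemap = build_txt_sitemap(
    ["https://mysite.com/", "https://mysite.com/1.html"]
)
```

The text should be saved in UTF-8, like the XML version.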

How to report a file to search engines?

The sitemap.xml file is usually located at the root of the site, where the search crawler is able to find it on its own. Still, it is recommended to specify its location in the Yandex.Webmaster and Google Search Console panels.

Specify the location of the sitemap.xml in Yandex.Webmaster

Add the file to Google Search Console

You can also specify the location of this file in robots.txt:

Sitemap: https://mysite.com/sitemap.xml
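To make sure the directive is readable and that robots.txt does not block the file itself, the rules can be checked with Python's standard urllib.robotparser module. This is a small sketch: the robots.txt content below, including the Disallow rule, is a made-up example, and site_maps() requires Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; only the Sitemap line comes from the text
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Sitemap: https://mysite.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# List of sitemap URLs declared in robots.txt, or None if there are none
declared_sitemaps = parser.site_maps()

# True if the rules allow any crawler ("*") to fetch the sitemap file
sitemap_allowed = parser.can_fetch("*", "https://mysite.com/sitemap.xml")
```

For a live site, the same parser can load the file over the network with set_url() and read() instead of parse().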

Yandex.Webmaster allows you to check the sitemap file for errors. The tool is located at: “Tools – Sitemap.xml File Analysis”. In the window that opens, you can upload a file, specify a path to it, or paste its contents into the tool.

To find errors in Google Search Console, you need to submit the file for review and then read the results.

Example sitemap.xml

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc>http://www.mysite.com/</loc>
      <lastmod>2021-01-01</lastmod>
      <changefreq>monthly</changefreq>
      <priority>1.0</priority>
   </url>
   <url>
      <loc>http://www.mysite.com/1.html</loc>
      <lastmod>2021-01-01</lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.8</priority>
   </url>
   <url>
      <loc>http://www.mysite.com/2.html</loc>
      <lastmod>2021-02-01</lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.7</priority>
   </url>
</urlset>

This is an example of a filled-in sitemap.xml with three URLs; further entries are added in the same way.
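To see how such a file is read back, it can be parsed with Python's standard library. The sketch below is illustrative only: read_sitemap is a hypothetical helper, the sample is shortened to two URLs, and the {*} wildcard (Python 3.8+) is used so that the tags are matched regardless of the declared namespace.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example file, used here as test data
SITEMAP_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc>http://www.mysite.com/</loc>
      <lastmod>2021-01-01</lastmod>
      <changefreq>monthly</changefreq>
      <priority>1.0</priority>
   </url>
   <url>
      <loc>http://www.mysite.com/2.html</loc>
      <lastmod>2021-02-01</lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.7</priority>
   </url>
</urlset>"""


def read_sitemap(xml_bytes):
    """Return a list of dicts with the data of every <url> block."""
    root = ET.fromstring(xml_bytes)
    entries = []
    for url_el in root.findall("{*}url"):
        entries.append({
            "loc": url_el.findtext("{*}loc"),
            "lastmod": url_el.findtext("{*}lastmod"),
            "changefreq": url_el.findtext("{*}changefreq"),
            # 0.5 is the default priority when the tag is absent
            "priority": float(url_el.findtext("{*}priority", default="0.5")),
        })
    return entries


entries = read_sitemap(SITEMAP_XML)
```

Each entry is a plain dictionary, so the pages can be sorted or filtered, for example by priority.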
