Google Xml Sitemaps


An undeniably essential part of optimizing a website is constructing a sitemap… Simply put, sitemaps give search engines a blueprint of how your site is laid out.

  1. Make your sitemap available to Google by adding it to your robots.txt file or by submitting it directly to Search Console. Google supports several sitemap formats and expects the standard sitemap protocol in all of them. Google does not currently consume the attribute in sitemaps.
  2. XML sitemaps, RSS feeds, etc. help search engines, including Google, Bing, Baidu, Yandex and more, understand the content of your website. Google XML Sitemap Generator adds powerful and configurable HTML, RSS and XML sitemap features to your website.

They are especially important for sites that have many pages, a lot of archived content, or few external links. As the name implies, these files give bots a map of the site and help them index the most important pages.

In addition to providing search engines with a blueprint of how a specific website is laid out, sitemaps also include essential metadata like:

  • The importance of each page in relation to one another
  • When a page was last changed
  • How often it's updated

A sitemap is a file where you provide information about the pages, videos, and other files on your site, and the relationships between them, and search engines like Google read this file. XML sitemaps help search engines crawl and understand your website’s content, structure and priorities, so your sitemap is a good place to begin when checking that Google is discovering all your pages. Building XML sitemaps can look a little daunting, but you don’t need a developer to create one. Generator tools can produce HTML, RSS and Google XML sitemaps for free, compatible with major search engines including Google, Bing, Baidu, Yandex and more. XML sitemaps let you quickly notify search engines about all the pages on your website and any changes to them.

And for your sitemap structuring needs, you have two well-known plugins to choose from: Google XML sitemaps or Yoast SEO.

For some this kind of choice might even be a no-brainer. But it’s only fair to check the sitemap capabilities of both plugins, along with the pros and cons of choosing one over the other.

Google XML Sitemaps

As its name implies, the Google XML Sitemaps plugin is pretty straightforward. Its single objective is to create a sitemap for the website. Supported for over nine years now, it currently has more than 2 million active installations. Safari SEO London advises that XML sitemaps provide a simple table of contents that allows website crawlers to quickly understand the hierarchy and context of different webpages.

And yet, despite its name, it’s not limited to just Google. The sitemap it produces is readable by any search engine: Bing, Yahoo, Google, etc. The options in Google XML Sitemaps include:

  • Selecting content categories that would be included in the sitemap: Homepage, categories, static pages, posts, author pages, tag pages, and archives.
  • Setting the page’s priority for indexation: Tells search engines how important it is for them to index the page
  • Setting the priority of each post for automatic calculation: Change each post’s priority based on the average comment count or the number of comments. You can also disable it and make use of a flat priority.
  • Excluding specific types of post categories or posts: Choose the posts you would want indexed or not.
  • Specifying URLs you would want included
  • Submitting the sitemap to a search engine automatically
  • Specifying change frequencies for different types of pages and categories

Yoast SEO

Yoast SEO is one huge WordPress developer tool that offers more than just the creation of XML sitemaps. The plugin has 5 million plus active installations, and has been around since 2008.

Much like Google XML Sitemap, Yoast keeps up with the game pretty decently. The plugin gives you more control in submitting your sitemap to search engines like Google. Yoast SEO includes XML sitemap generation, and redirecting URLs.

Its general settings are separated into four main categories:

  • General
  • User sitemap
  • Post types: Modify the post types that won’t be included in your sitemap.
  • Taxonomies: Similarly, you can modify which categories won’t be included in the sitemap too.

Google XML vs. Yoast SEO

The two have similar purposes, and that is to generate a proper and functional sitemap. They don’t fail at that, but the two do differ in terms of structure. For instance, Yoast divides sitemaps according to type (post-sitemap.xml, category-sitemap.xml, etc).

On the other hand, Google XML Sitemaps separates its files by month. As a result, each file is more likely to have fewer than a hundred URLs. That makes it small enough to be measured in kilobytes, like a picture, which lowers the load placed on the server while it generates the files.

What to do after creating your sitemap

When the XML sitemap creations are said and done via either of the two plugins, developers don’t just sit there and fold their arms across their chests. There are several other things you need to do to fully optimize these sitemaps.

  1. Submit the sitemap to Google

    You can do this via Google Search Console. It’s also a good idea to test your sitemap and view the results before you submit it. Remember to check for errors that might prevent your landing pages from being indexed. Again, remember that by submitting your sitemap, you’re telling Google which pages you consider to be high quality and worthy of being indexed.

  2. List high-quality pages on your sitemap

    Google’s rankings aren’t dependent on keyword stuffing and other techniques that are now considered blackhat SEO. As we all know, it’s all about quality over quantity.

    A key ranking factor is overall website quality. So, if your sitemap directs search engine bots to a bunch of low-quality pages, search engines may interpret the site as not index-worthy. Ideally, the pages you include in the sitemap should have:

    • Multimedia elements (images and videos).
    • Ways to encourage audience engagement: reviews, comments, a ‘talk-to-us’ page, etc.
    • Plenty of unique, high-quality content formatted for web reading.

  3. Keep your file size to a minimum

    Search engines like Google and Bing have since increased the size of acceptable sitemap files; from 10MB to 50MB. But it’s advisable to keep sitemaps as lean as possible. The smaller your sitemap is, the less strain you put your server through.

  4. Include only search-engine-friendly versions of URLs in the sitemap

    For ecommerce websites, having multiple pages that are only slightly different (product pages, in particular) is common. For pages like these, make use of the “link rel=canonical” tag, so Google understands which page is the “main” one for crawling and indexing.

  5. Be honest about Modification Times

    Don’t make the mistake of trying to trick search engines into considering a page for re-indexing by constantly updating your modification time without any reason. Update it only when you’ve made substantial changes to the page. Doing otherwise is simply risky SEO. One possible consequence is Google ignoring your date stamps when it notices that they are updated constantly with no real value.

  6. Building multiple sitemaps

    Remember that you are limited to 50,000 URLs per sitemap.

    This number may seem atrociously huge for some site creators, but it’s not as crazy if you’re running a very large ecommerce website. Such sites very likely have more than 50,000 URLs, and need to create additional sitemaps to accommodate them all for indexing.

The Takeaway

But then again, when all is said and done, what truly matters is that both plugins generate a valid XML sitemap. And because Google XML Sitemaps is focused on that single task, you can use it in tandem with Yoast SEO. (Just be sure to switch off the latter’s sitemap feature.)

In essence, you use whatever would work for you — whether that’s Google XML sitemaps or Yoast SEO.

Jump to:
XML tag definitions
Entity escaping
Using Sitemap index files
Other Sitemap formats
Sitemap file location
Validating your Sitemap
Extending the Sitemaps protocol
Informing search engine crawlers

This document describes the XML schema for the Sitemap protocol.

The Sitemap protocol format consists of XML tags. All data values in a Sitemap must be entity-escaped. The file itself must be UTF-8 encoded.

The Sitemap must:

  • Begin with an opening <urlset> tag and end with a closing </urlset> tag.
  • Specify the namespace (protocol standard) within the <urlset> tag.
  • Include a <url> entry for each URL, as a parent XML tag.
  • Include a <loc> child entry for each <url> parent tag.

All other tags are optional. Support for these optional tags may vary among search engines. Refer to each search engine's documentation for details.

Also, all URLs in a Sitemap must be from a single host, such as www.example.com or store.example.com. For further details, refer to the Sitemap file location section.

Sample XML Sitemap

The following example shows a Sitemap that contains just one URL and uses all optional tags. The optional tags are in italics.

Also see our example with multiple URLs.
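
The sample itself is not reproduced here. As a minimal sketch of what such a file could look like, the Python below builds a Sitemap with one URL and all four child tags using the standard library. The function name `build_single_url_sitemap` and the values passed to it are illustrative assumptions, not part of the protocol.

```python
import xml.etree.ElementTree as ET

# Namespace required on the <urlset> root by the Sitemap protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_single_url_sitemap(loc, lastmod, changefreq, priority):
    """Return a Sitemap document (as text) containing exactly one <url>."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    url = ET.SubElement(urlset, "url")
    ET.SubElement(url, "loc").text = loc                # required
    ET.SubElement(url, "lastmod").text = lastmod        # optional
    ET.SubElement(url, "changefreq").text = changefreq  # optional
    ET.SubElement(url, "priority").text = priority      # optional
    # ElementTree entity-escapes text values (e.g. & becomes &amp;).
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Illustrative values only.
sitemap_xml = build_single_url_sitemap(
    "http://www.example.com/", "2005-01-01", "monthly", "0.8"
)
print(sitemap_xml)
```

The returned text should be written to disk as UTF-8 to satisfy the encoding requirement stated earlier.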

XML tag definitions

The available XML tags are described below.

Attribute Description
<urlset> required

Encapsulates the file and references the current protocol standard.

<url> required

Parent tag for each URL entry. The remaining tags are children of this tag.

<loc> required

URL of the page. This URL must begin with the protocol (such as http) and end with a trailing slash, if your web server requires it. This value must be less than 2,048 characters.

<lastmod> optional

The date of last modification of the file. This date should be in W3C Datetime format. This format allows you to omit the time portion, if desired, and use YYYY-MM-DD.

Note that this tag is separate from the If-Modified-Since (304) header the server can return, and search engines may use the information from both sources differently.

<changefreq> optional

How frequently the page is likely to change. This value provides general information to search engines and may not correlate exactly to how often they crawl the page. Valid values are:

  • always
  • hourly
  • daily
  • weekly
  • monthly
  • yearly
  • never

The value 'always' should be used to describe documents that change each time they are accessed. The value 'never' should be used to describe archived URLs.

Please note that the value of this tag is considered a hint and not a command. Even though search engine crawlers may consider this information when making decisions, they may crawl pages marked 'hourly' less frequently than that, and they may crawl pages marked 'yearly' more frequently than that. Crawlers may periodically crawl pages marked 'never' so that they can handle unexpected changes to those pages.

<priority> optional

The priority of this URL relative to other URLs on your site. Valid values range from 0.0 to 1.0. This value does not affect how your pages are compared to pages on other sites—it only lets the search engines know which pages you deem most important for the crawlers.

The default priority of a page is 0.5.

Please note that the priority you assign to a page is not likely to influence the position of your URLs in a search engine's result pages. Search engines may use this information when selecting between URLs on the same site, so you can use this tag to increase the likelihood that your most important pages are present in a search index.

Also, please note that assigning a high priority to all of the URLs on your site is not likely to help you. Since the priority is relative, it is only used to select between URLs on your site.
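
The constraints in these tag definitions can be checked before a Sitemap is published. The following is a rough sketch, not an official validator: `check_url_entry` is a hypothetical helper, it assumes `http` and `https` are the only protocols in use, and it only accepts `lastmod` values that start with a full YYYY-MM-DD date even though W3C Datetime allows other forms.

```python
from datetime import date

# The seven values listed for <changefreq>.
VALID_CHANGEFREQ = {
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
}


def check_url_entry(loc, lastmod=None, changefreq=None, priority=None):
    """Return a list of problems found in one <url> entry (empty if none)."""
    problems = []
    # Assumption: only http and https URLs are expected.
    if not loc.startswith(("http://", "https://")):
        problems.append("loc must begin with the protocol")
    if len(loc) >= 2048:
        problems.append("loc must be less than 2,048 characters")
    if lastmod is not None:
        try:
            # Simplification: validate only the leading YYYY-MM-DD part.
            date.fromisoformat(lastmod[:10])
        except ValueError:
            problems.append("lastmod should be a W3C Datetime value")
    if changefreq is not None and changefreq not in VALID_CHANGEFREQ:
        problems.append("changefreq is not one of the valid values")
    if priority is not None and not 0.0 <= priority <= 1.0:
        problems.append("priority must be between 0.0 and 1.0")
    return problems


print(check_url_entry("http://www.example.com/", "2005-01-01", "monthly", 0.8))
print(check_url_entry("www.example.com/", changefreq="sometimes", priority=1.5))
```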

Entity escaping

Your Sitemap file must be UTF-8 encoded (you can generally do this when you save the file). As with all XML files, any data values (including URLs) must use entity escape codes for the characters listed in the table below.

Character        Symbol   Escape Code
Ampersand        &        &amp;
Single Quote     '        &apos;
Double Quote     "        &quot;
Greater Than     >        &gt;
Less Than        <        &lt;

In addition, all URLs (including the URL of your Sitemap) must be URL-escaped and encoded for readability by the web server on which they are located. However, if you are using any sort of script, tool, or log file to generate your URLs (anything except typing them in by hand), this is usually already done for you. Please check to make sure that your URLs follow the RFC-3986 standard for URIs, the RFC-3987 standard for IRIs, and the XML standard.

Below is an example of a URL that uses a non-ASCII character (ü), as well as a character that requires entity escaping (&):

Below is that same URL, ISO-8859-1 encoded (for hosting on a server that uses that encoding) and URL escaped:

Below is that same URL, UTF-8 encoded (for hosting on a server that uses that encoding) and URL escaped:

Below is that same URL, but also entity escaped:
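
The example URLs are missing from this copy. As an illustration of the same sequence of steps, the sketch below starts from an assumed URL containing ü and &, URL-escapes it under both encodings, then entity-escapes the UTF-8 result. The URL and the helper name `url_escape` are assumptions made for the example.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape

# Assumed example URL with a non-ASCII character and an ampersand.
raw_url = "http://www.example.com/ümlat.php&q=name"


def url_escape(url, encoding):
    """Percent-encode non-ASCII characters, leaving URL delimiters intact."""
    return quote(url, safe=":/?&=", encoding=encoding)


latin1_url = url_escape(raw_url, "iso-8859-1")  # for an ISO-8859-1 server
utf8_url = url_escape(raw_url, "utf-8")         # for a UTF-8 server
entity_escaped = escape(utf8_url)               # ready to place inside <loc>

print(latin1_url)
print(utf8_url)
print(entity_escaped)
```

URL escaping comes first and entity escaping second, so the ampersand in the final value becomes `&amp;` while the percent-encoded bytes are left alone.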

Sample XML Sitemap

The following example shows a Sitemap in XML format. The Sitemap in the example contains a small number of URLs, each using a different set of optional parameters.
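
Since the sample is not included here, the sketch below embeds an assumed three-URL Sitemap, each entry carrying a different set of optional tags, and reads it back with Python's standard library. The URLs and the `read_entries` helper are illustrative. Filling in 0.5 for a missing priority follows the default stated in the tag definitions.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Assumed sample: three URLs with different optional tags.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2005-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>http://www.example.com/catalog?item=12&amp;desc=vacation_hawaii</loc>
    <changefreq>weekly</changefreq>
  </url>
  <url>
    <loc>http://www.example.com/about</loc>
  </url>
</urlset>"""


def read_entries(xml_bytes):
    """Return one dict per <url>, keyed by child tag name."""
    root = ET.fromstring(xml_bytes)
    entries = []
    for url in root.findall("sm:url", NS):
        # Strip the "{namespace}" prefix ElementTree adds to tag names.
        entry = {child.tag.split("}", 1)[1]: child.text for child in url}
        entry.setdefault("priority", "0.5")  # protocol default
        entries.append(entry)
    return entries


entries = read_entries(SAMPLE.encode("utf-8"))
for entry in entries:
    print(entry)
```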

Using Sitemap index files (to group multiple sitemap files)

You can provide multiple Sitemap files, but each Sitemap file that you provide must have no more than 50,000 URLs and must be no larger than 50MB (52,428,800 bytes). If you would like, you may compress your Sitemap files using gzip to reduce your bandwidth requirement; however the sitemap file once uncompressed must be no larger than 50MB. If you want to list more than 50,000 URLs, you must create multiple Sitemap files.

If you do provide multiple Sitemaps, you should then list each Sitemap file in a Sitemap index file. Sitemap index files may not list more than 50,000 Sitemaps and must be no larger than 50MB (52,428,800 bytes) and can be compressed. You can have more than one Sitemap index file. The XML format of a Sitemap index file is very similar to the XML format of a Sitemap file.

The Sitemap index file must:

  • Begin with an opening <sitemapindex> tag and end with a closing </sitemapindex> tag.
  • Include a <sitemap> entry for each Sitemap as a parent XML tag.
  • Include a <loc> child entry for each <sitemap> parent tag.

The optional <lastmod> tag is also available for Sitemap index files.

Note: A Sitemap index file can only specify Sitemaps that are found on the same site as the Sitemap index file. For example, http://www.yoursite.com/sitemap_index.xml can include Sitemaps on http://www.yoursite.com but not on http://www.example.com or http://yourhost.yoursite.com. As with Sitemaps, your Sitemap index file must be UTF-8 encoded.

Sample XML Sitemap Index

The following example shows a Sitemap index that lists two Sitemaps:

Note: Sitemap URLs, like all values in your XML files, must be entity escaped.
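
One way to apply the 50,000-URL limit is to split the URL list into chunks and list the resulting files in an index. The sketch below assumes hypothetical file names (`sitemap1.xml.gz` and so on) and helper names. It covers only the URL-count limit, so the 50MB size limit would need a separate check.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000


def split_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into chunks small enough for one Sitemap each."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_locs):
    """Return a Sitemap index document listing the given Sitemap URLs."""
    root = ET.Element("sitemapindex", {"xmlns": SITEMAP_NS})
    for loc in sitemap_locs:
        sitemap = ET.SubElement(root, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc  # entity-escaped on output
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Illustrative data: 120,000 page URLs need three Sitemap files.
all_urls = [f"http://www.example.com/page{i}" for i in range(120_000)]
chunks = split_urls(all_urls)
locs = [
    f"http://www.example.com/sitemap{n}.xml.gz"
    for n in range(1, len(chunks) + 1)
]
index_xml = build_sitemap_index(locs)
print(len(chunks))
```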

Sitemap Index XML Tag Definitions

Attribute Description
<sitemapindex> required Encapsulates information about all of the Sitemaps in the file.
<sitemap> required Encapsulates information about an individual Sitemap.
<loc> required

Identifies the location of the Sitemap.

This location can be a Sitemap, an Atom file, RSS file or a simple text file.

<lastmod> optional

Identifies the time that the corresponding Sitemap file was modified. It does not correspond to the time that any of the pages listed in that Sitemap were changed. The value for the lastmod tag should be in W3C Datetime format.

By providing the last modification timestamp, you enable search engine crawlers to retrieve only a subset of the Sitemaps in the index i.e. a crawler may only retrieve Sitemaps that were modified since a certain date. This incremental Sitemap fetching mechanism allows for the rapid discovery of new URLs on very large sites.

Other Sitemap formats

The Sitemap protocol enables you to provide details about your pages to search engines, and we encourage its use since you can provide additional information about site pages beyond just the URLs. However, in addition to the XML protocol, we support RSS feeds and text files, which provide more limited information.

Syndication feed

You can provide an RSS (Really Simple Syndication) 2.0 or Atom 0.3 or 1.0 feed. Generally, you would use this format only if your site already has a syndication feed. Note that this method may not let search engines know about all the URLs in your site, since the feed may only provide information on recent URLs. Search engines can still use that information to find other pages on your site during their normal crawling, by following links inside the pages listed in the feed. Make sure that the feed is located in the highest-level directory you want search engines to crawl. Search engines extract the information from the feed as follows:

  • <link> field - indicates the URL
  • modified date field (the <pubDate> field for RSS feeds and the <updated> date for Atom feeds) - indicates when each URL was last modified. Use of the modified date field is optional.

Text file

You can provide a simple text file that contains one URL per line. The text file must follow these guidelines:

  • The text file must have one URL per line. The URLs cannot contain embedded new lines.
  • You must fully specify URLs, including the http.
  • Each text file can contain a maximum of 50,000 URLs and must be no larger than 50MB (52,428,800 bytes). If your site includes more than 50,000 URLs, you can separate the list into multiple text files and add each one separately.
  • The text file must use UTF-8 encoding. You can specify this when you save the file (for instance, in Notepad, this is listed in the Encoding menu of the Save As dialog box).
  • The text file should contain no information other than the list of URLs.
  • The text file should contain no header or footer information.
  • If you would like, you may compress your Sitemap text file using gzip to reduce your bandwidth requirement.
  • You can name the text file anything you wish. Please check to make sure that your URLs follow the RFC-3986 standard for URIs and the RFC-3987 standard for IRIs.
  • You should upload the text file to the highest-level directory you want search engines to crawl and make sure that you don't list URLs in the text file that are located in a higher-level directory.

Sample text file entries are shown below.
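
The sample entries are not reproduced in this copy. As a sketch of the guidelines above, the hypothetical helper `build_text_sitemap` below checks a list of assumed URLs and returns the UTF-8 bytes of a text Sitemap. It assumes `http` and `https` are the only schemes in use.

```python
MAX_URLS = 50_000
MAX_BYTES = 52_428_800  # 50MB


def build_text_sitemap(urls):
    """Return UTF-8 bytes for a text Sitemap: one URL per line, nothing else."""
    if len(urls) > MAX_URLS:
        raise ValueError("a text Sitemap can hold at most 50,000 URLs")
    for url in urls:
        if "\n" in url or "\r" in url:
            raise ValueError("URLs cannot contain embedded new lines")
        # Assumption: only http and https URLs are expected.
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"URL must be fully specified: {url}")
    data = ("\n".join(urls) + "\n").encode("utf-8")
    if len(data) > MAX_BYTES:
        raise ValueError("a text Sitemap must be no larger than 50MB")
    return data


# Illustrative URLs only.
text_sitemap = build_text_sitemap([
    "http://www.example.com/catalog?item=1",
    "http://www.example.com/catalog?item=11",
])
print(text_sitemap.decode("utf-8"))
```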

Sitemap file location

The location of a Sitemap file determines the set of URLs that can be included in that Sitemap. A Sitemap file located at http://example.com/catalog/sitemap.xml can include any URLs starting with http://example.com/catalog/ but can not include URLs starting with http://example.com/images/.

If you have the permission to change http://example.org/path/sitemap.xml, it is assumed that you also have permission to provide information for URLs with the prefix http://example.org/path/. Examples of URLs considered valid in http://example.com/catalog/sitemap.xml include:

URLs not considered valid in http://example.com/catalog/sitemap.xml include:

Note that this means that all URLs listed in the Sitemap must use the same protocol (http, in this example) and reside on the same host as the Sitemap. For instance, if the Sitemap is located at http://www.example.com/sitemap.xml, it can't include URLs from http://subdomain.example.com.

URLs that are not considered valid are dropped from further consideration. It is strongly recommended that you place your Sitemap at the root directory of your web server. For example, if your web server is at example.com, then your Sitemap index file would be at http://example.com/sitemap.xml. In certain cases, you may need to produce different Sitemaps for different paths (e.g., if security permissions in your organization compartmentalize write access to different directories).

If you submit a Sitemap using a path with a port number, you must include that port number as part of the path in each URL listed in the Sitemap file. For instance, if your Sitemap is located at http://www.example.com:100/sitemap.xml, then each URL listed in the Sitemap must begin with http://www.example.com:100.
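
The location rules in this section (same protocol, same host and port, and a path under the Sitemap's directory) can be expressed as a small check. This is a simplified sketch: `in_sitemap_scope` is a hypothetical helper, the page URLs are made up, and host names are compared exactly as written.

```python
from urllib.parse import urlsplit


def in_sitemap_scope(sitemap_url, page_url):
    """Return True if page_url may be listed in the Sitemap at sitemap_url."""
    sitemap, page = urlsplit(sitemap_url), urlsplit(page_url)
    # Directory that contains the Sitemap file, with a trailing slash.
    base_path = sitemap.path.rsplit("/", 1)[0] + "/"
    return (
        sitemap.scheme == page.scheme       # same protocol
        and sitemap.netloc == page.netloc   # same host and port
        and page.path.startswith(base_path)
    )


catalog_sitemap = "http://example.com/catalog/sitemap.xml"
print(in_sitemap_scope(catalog_sitemap, "http://example.com/catalog/show?item=23"))
print(in_sitemap_scope(catalog_sitemap, "http://example.com/images/photo.jpg"))
print(in_sitemap_scope(catalog_sitemap, "https://example.com/catalog/page1.php"))
```

Because `netloc` includes any port number, a Sitemap at http://www.example.com:100/sitemap.xml only matches URLs that also begin with http://www.example.com:100.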

Sitemaps & Cross Submits

To submit Sitemaps for multiple hosts from a single host, you need to 'prove' ownership of the host(s) for which URLs are being submitted in a Sitemap. Here's an example. Let's say that you want to submit Sitemaps for 3 hosts:

Moreover, you want to place all three Sitemaps on a single host: www.sitemaphost.com. So the Sitemap URLs will be:

By default, this will result in a 'cross submission' error since you are trying to submit URLs for www.host1.com through a Sitemap that is hosted on www.sitemaphost.com (and same for the other two hosts). One way to avoid the error is to prove that you own (i.e. have the authority to modify files) www.host1.com. You can do this by modifying the robots.txt file on www.host1.com to point to the Sitemap on www.sitemaphost.com.

In this example, the robots.txt file at http://www.host1.com/robots.txt would contain the line 'Sitemap: http://www.sitemaphost.com/sitemap-host1.xml'. By modifying the robots.txt file on www.host1.com and having it point to the Sitemap on www.sitemaphost.com, you have implicitly proven that you own www.host1.com. In other words, whoever controls the robots.txt file on www.host1.com trusts the Sitemap at http://www.sitemaphost.com/sitemap-host1.xml to contain URLs for www.host1.com. The same process can be repeated for the other two hosts.

Now you can submit the Sitemaps on www.sitemaphost.com.

When a particular host's robots.txt, say http://www.host1.com/robots.txt, points to a Sitemap or a Sitemap index on another host, it is expected that for each of the target Sitemaps, such as http://www.sitemaphost.com/sitemap-host1.xml, all the URLs belong to the host pointing to it. This is because, as noted earlier, a Sitemap is expected to have URLs from a single host only.
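
To illustrate the trust relationship described above, the sketch below checks whether a host's robots.txt points to a given Sitemap. It uses `urllib.robotparser` from Python's standard library (`site_maps()` requires Python 3.8 or later). The robots.txt text is an assumed example, and `host_trusts_sitemap` is a hypothetical helper, not how any search engine is known to implement the check.

```python
from urllib.robotparser import RobotFileParser


def host_trusts_sitemap(robots_txt, sitemap_url):
    """Return True if the robots.txt content lists sitemap_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap lines are present.
    return sitemap_url in (parser.site_maps() or [])


# Assumed content of http://www.host1.com/robots.txt
host1_robots = (
    "User-agent: *\n"
    "Disallow:\n"
    "Sitemap: http://www.sitemaphost.com/sitemap-host1.xml\n"
)

print(host_trusts_sitemap(
    host1_robots, "http://www.sitemaphost.com/sitemap-host1.xml"
))
```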

Validating your Sitemap

The following XML schemas define the elements and attributes that can appear in your Sitemap file. You can download these schemas from the links below:

For Sitemaps: http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd
For Sitemap index files: http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd

There are a number of tools available to help you validate the structure of your Sitemap based on this schema. You can find a list of XML-related tools at each of the following locations:
http://www.w3.org/XML/Schema#Tools
http://www.xml.com/pub/a/2000/12/13/schematools.html

In order to validate your Sitemap or Sitemap index file against a schema, the XML file will need additional headers as shown below.

Sitemap:

Sitemap index file:
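
The headers themselves are missing from this copy. The sketch below shows one plausible form, assuming the standard XML Schema instance attributes (`xmlns:xsi` and `xsi:schemaLocation`) pointing at the schema URLs listed above. The helper name `root_with_schema` is hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def root_with_schema(tag, xsd_name):
    """Create a root element that declares where its schema can be found."""
    return ET.Element(tag, {
        "xmlns": SITEMAP_NS,
        "xmlns:xsi": XSI_NS,
        # schemaLocation pairs the namespace with the URL of its .xsd file.
        "xsi:schemaLocation": f"{SITEMAP_NS} {SITEMAP_NS}/{xsd_name}",
    })


urlset_header = ET.tostring(
    root_with_schema("urlset", "sitemap.xsd"), encoding="unicode"
)
index_header = ET.tostring(
    root_with_schema("sitemapindex", "siteindex.xsd"), encoding="unicode"
)
print(urlset_header)
print(index_header)
```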

Extending the Sitemaps protocol

You can extend the Sitemaps protocol using your own namespace. Simply specify this namespace in the root element. For example:

Informing search engine crawlers

Once you have created the Sitemap file and placed it on your webserver, you need to inform the search engines that support this protocol of its location. You can do this by:

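  • submitting it to them via the search engine's submission interface
  • specifying the location in your site's robots.txt file
  • sending an HTTP request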

The search engines can then retrieve your Sitemap and make the URLs available to their crawlers.

Submitting your Sitemap via the search engine's submission interface

To submit your Sitemap directly to a search engine, which will enable you to receive status information and any processing errors, refer to each search engine's documentation.

Specifying the Sitemap location in your robots.txt file

You can specify the location of the Sitemap using a robots.txt file. To do this, simply add the following line including the full URL to the sitemap:

This directive is independent of the user-agent line, so it doesn't matter where you place it in your file. If you have a Sitemap index file, you can include the location of just that file. You don't need to list each individual Sitemap listed in the index file.

You can specify more than one Sitemap file per robots.txt file.
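
As a small sketch of this step, the helper below appends `Sitemap:` lines to existing robots.txt content, using the same `Sitemap: <full URL>` form quoted in the cross-submission example earlier. The helper name `add_sitemap_lines` and the URLs are assumptions.

```python
def add_sitemap_lines(robots_txt, sitemap_urls):
    """Append a 'Sitemap: <url>' line for each URL not already listed."""
    lines = robots_txt.rstrip("\n").splitlines()
    existing = {
        line.split(":", 1)[1].strip()
        for line in lines
        if line.lower().startswith("sitemap:")
    }
    for url in sitemap_urls:
        if url not in existing:
            lines.append(f"Sitemap: {url}")
    return "\n".join(lines) + "\n"


# Illustrative robots.txt content and Sitemap index URL.
original = "User-agent: *\nDisallow: /private/\n"
updated = add_sitemap_lines(
    original, ["http://www.example.com/sitemap_index.xml"]
)
print(updated)
```

Running the helper again with the same URL leaves the file unchanged, which makes it safe to call from a scheduled job.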

Submitting your Sitemap via an HTTP request

To submit your Sitemap using an HTTP request (replace <searchengine_URL> with the URL provided by the search engine), issue your request to the following URL:

For example, if your Sitemap is located at http://www.example.com/sitemap.gz, your URL will become:

URL encode everything after the /ping?sitemap=:

You can issue the HTTP request using wget, curl, or another mechanism of your choosing. A successful request will return an HTTP 200 response code; if you receive a different response, you should resubmit your request. The HTTP 200 response code only indicates that the search engine has received your Sitemap, not that the Sitemap itself or the URLs contained in it were valid. An easy way to do this is to set up an automated job to generate and submit Sitemaps on a regular basis.
Note: If you are providing a Sitemap index file, you only need to issue one HTTP request that includes the location of the Sitemap index file; you do not need to issue individual requests for each Sitemap listed in the index.
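
The request URLs are not shown in this copy. Based on the `/ping?sitemap=` path mentioned above, the sketch below assembles such a URL with the Sitemap location URL-encoded. The search engine address is a placeholder, and whether a given search engine still offers such an endpoint is something to confirm in its documentation.

```python
from urllib.parse import quote


def build_ping_url(search_engine_url, sitemap_url):
    """Return the ping URL with everything after 'sitemap=' URL-encoded."""
    encoded = quote(sitemap_url, safe="")
    return f"{search_engine_url.rstrip('/')}/ping?sitemap={encoded}"


# Placeholder search engine address; replace with the one provided to you.
ping_url = build_ping_url(
    "http://searchengine.example", "http://www.example.com/sitemap.gz"
)
print(ping_url)
```

The resulting URL can then be requested with wget, curl, or `urllib.request.urlopen`, checking for the HTTP 200 response described above.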

Excluding content

The Sitemaps protocol enables you to let search engines know what content you would like indexed. To tell search engines the content you don't want indexed, use a robots.txt file or robots meta tag. See robotstxt.org for more information on how to exclude content from search engines.

Last Updated: Monday, November 21, 2016