XML Sitemaps: Why URL Sequencing Matters Even if Google Says It Doesn’t
There are a great many things that SEOs debate – do this, don’t do that, this makes a difference but that doesn’t.
No one knows the secrets of Google’s algorithms except Google (and sometimes I wonder if the algorithm is too complex even for some at Google to know how it works).
That said, there are some things that Google flat-out says don’t matter whether we do them or not. Does that mean we shouldn’t do them? No, it doesn’t.
In a perfect world, our websites and Google will perform exactly how they should and in our favor. In reality, any number of things can go sideways when the search engines crawl a site.
If it doesn’t harm a website to implement something that may make it better for search engines to crawl and understand – and it’s easy enough to do – then why not do it?
XML sitemap priority tags, change frequency tags and URL ordering are among those debated SEO tactics. Let’s discuss.
XML Sitemap Basics
An XML sitemap is a file that webmasters create and put on their site to tell search engines like Google and Bing about the pages, images and videos that are on the site.
The sitemap works like a map, helping ensure more thorough crawling and indexing. However, an XML sitemap does not guarantee that search engines will index or crawl all pages, nor will a sitemap impact your rankings.
(Check out our primer on XML sitemaps for more information.)
There are optional tags that some people like to include in their sitemap, such as the priority tag and the change frequency tag. Google’s official stance is that it ignores both.
Then there is the practice of URL sequencing (ordering the URLs in your sitemap by your own priority). Google says it ignores this, too.
But should we ignore it? Let’s talk next about these three practices.
Priority Tags
The priority tag tells Google how important a page is relative to the other pages on your site. Values run from 1.0 (the highest) to 0.0 (the lowest) in steps of 0.1: 1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1 and 0.0. For instance, 1.0 would likely be the homepage.
Using this tag doesn’t mean that Google will crawl the pages you consider most important more often than the rest. It is up to the algorithm to determine what is most relevant, and its view may not match your priorities. You can get a sense of which pages are crawled most often by looking at your server logs.
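As a rough illustration of the server log check, here is a minimal Python sketch that counts requests per URL from user agents containing "Googlebot". It assumes the logs are in the common "combined" access log format; the sample lines and the `googlebot_hits` name are made up for this example. User agent strings can be spoofed, so treat the counts as an approximation.

```python
import re
from collections import Counter

# Pulls the request path and user agent out of a combined-format log line.
# Assumption: your server writes the "combined" access log format.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_hits(lines):
    """Count requests per path where the user agent mentions Googlebot."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "Googlebot" in match.group("agent"):
            hits[match.group("path")] += 1
    return hits

# Made-up sample lines for illustration only.
GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
sample = [
    f'66.249.66.1 - - [10/Jan/2024:10:00:00 +0000] "GET /products/ HTTP/1.1" 200 5120 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [10/Jan/2024:10:05:00 +0000] "GET /blog/post-1 HTTP/1.1" 200 2048 "-" "{GOOGLEBOT}"',
    f'203.0.113.7 - - [10/Jan/2024:10:06:00 +0000] "GET /products/ HTTP/1.1" 200 5120 "-" "{BROWSER}"',
    f'66.249.66.1 - - [10/Jan/2024:11:00:00 +0000] "GET /products/ HTTP/1.1" 200 5120 "-" "{GOOGLEBOT}"',
]

hits = googlebot_hits(sample)
for path, count in hits.most_common():
    print(path, count)
```

Comparing these counts with the pages you consider most important shows where the crawler’s attention and your priorities diverge.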
In general, when using the priority tag you’d assign values like this:
- 1.0 – 0.8 = Category pages, homepage, top landing pages
- 0.7 – 0.4 = Blog articles, secondary category pages, subcategory pages
- 0.3 – 0.0 = Less important pages, such as outdated content or utility-type pages
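To show what those tiers look like in a sitemap entry, here is a small Python sketch using the standard library’s `xml.etree.ElementTree`. The page-type names and the exact value chosen within each tier are assumptions for illustration; 0.5 is used as the fallback because it is the default priority in the sitemap protocol.

```python
import xml.etree.ElementTree as ET

# Hypothetical page types mapped to values inside the tiers listed above.
PRIORITY_BY_TYPE = {
    "homepage": "1.0",
    "category": "0.8",
    "blog": "0.6",
    "utility": "0.2",
}

def url_entry(loc, page_type):
    """Build a <url> element with <loc> and <priority> children."""
    url = ET.Element("url")
    ET.SubElement(url, "loc").text = loc
    # Fall back to 0.5, the protocol's default priority, for unknown types.
    ET.SubElement(url, "priority").text = PRIORITY_BY_TYPE.get(page_type, "0.5")
    return url

entry = url_entry("https://www.example.com/", "homepage")
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```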
Change Frequency Tags
The change frequency tag is meant to tell Google how often pages are updated on your website. In theory, Google would view it and make a decision on whether to crawl that page again based on when it was last updated.
Again, Google’s official stance is that it ignores the change frequency tag. And, Google’s John Mueller has clarified that it is better to “specify the time stamp directly so that we can look into our internal systems and say we haven’t crawled since this date therefore we should crawl again.”
This tag is certainly irrelevant if it’s not accurate. As Mueller puts it, “we see a lot of sites they give us this information in the sitemap, they said it changes daily or weekly, and we look in our database and it hasn’t changed in a month or years.”
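Specifying the time stamp directly means using the sitemap’s `<lastmod>` tag, which takes a W3C datetime value. Here is a minimal Python sketch of writing one; the `lastmod_entry` name and the example URL are illustrative, and it assumes you pass a timezone-aware datetime taken from your CMS or file system.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

def lastmod_entry(loc, modified):
    """Build a <url> element whose <lastmod> is a W3C datetime in UTC.

    Assumption: `modified` is timezone-aware. A naive datetime would be
    interpreted as local time by astimezone().
    """
    url = ET.Element("url")
    ET.SubElement(url, "loc").text = loc
    stamp = modified.astimezone(timezone.utc).isoformat(timespec="seconds")
    ET.SubElement(url, "lastmod").text = stamp
    return url

entry = lastmod_entry(
    "https://www.example.com/blog/post-1",
    datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),
)
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

The value is only useful if it reflects a real content change, so it should come from the actual modification time rather than the time the sitemap was generated.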
So, Should We Use the Tags?
The fact that Google’s XML sitemap documentation lists these tags as “optional” is confusing, especially when Google says it ignores them. Does Googlebot really ignore them every time? It’s hard to know. In our view, it is better to include them, especially last-modified dates (<lastmod>).
I believe there is something else you can do with your sitemap to indicate how you want the search engines to crawl. Google says it ignores this method as well, but stick with me.
We’ve seen some success with indexation through URL sequencing, which I’ll talk about next.
URL Sequencing
URL sequencing is the practice of ordering the URLs in your sitemap according to priority, but not in the way the priority tag works. Because Google ignores priority tags, it treats all pages as “equal,” so sequence should not matter, at least for priority. But pages are not equal in real life, especially if your crawl budget is limited.
We strongly recommend that you sequence the URLs in your XML sitemap in a way that addresses both the concepts of priority and change frequency without using the tags. Here, the goal is to get key pages indexed faster.
And by the way, only include URLs that match a canonical tag used on your site or that are otherwise important pages!
For example, this is a sequence emphasizing most recently modified pages:
- One-day-old recently changed entries (new redirect targets, new or revised pages) [500 entries per XML page]
- One-week-old entries as above, just a bit older [500 entries per XML page]
- Remaining pages that return a 200 status code (sorted descending by impressions) [1000 entries per XML page]
- Any other redirecting pages (30x codes) [5k per page]
- Images and videos [500 per page]
- 404 pages [10k per page]
- The rest [10k per page]
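The sequence above can be sketched in Python as a bucketing and chunking step. This is an illustration under stated assumptions, not a definitive implementation: the `Entry` fields, the bucket names and the first-match rules are invented for the example, and you would feed in data from your own crawl, analytics and CMS.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    url: str
    status: int = 200              # HTTP status code from your own crawl
    days_since_change: int = 9999  # age of the last change, in days
    impressions: int = 0           # e.g. from your search analytics export
    kind: str = "page"             # "page" or "media" (image/video)

# (bucket name, entries per sitemap file), in the order listed above.
BUCKETS = [
    ("changed_1d", 500),
    ("changed_7d", 500),
    ("other_200", 1000),
    ("redirects", 5000),
    ("media", 500),
    ("not_found", 10000),
    ("rest", 10000),
]

def bucket_for(e):
    """Assign an entry to the first bucket whose rule it matches."""
    is_live_page = e.status == 200 and e.kind == "page"
    if is_live_page and e.days_since_change <= 1:
        return "changed_1d"
    if is_live_page and e.days_since_change <= 7:
        return "changed_7d"
    if is_live_page:
        return "other_200"
    if 300 <= e.status < 400:
        return "redirects"
    if e.kind == "media" and e.status == 200:
        return "media"
    if e.status == 404:
        return "not_found"
    return "rest"

def sequence(entries):
    """Return sitemap files (lists of URLs) in the order they should be listed."""
    grouped = {name: [] for name, _ in BUCKETS}
    for e in entries:
        grouped[bucket_for(e)].append(e)
    # Newest changes first in the recency buckets, most impressions first after.
    grouped["changed_1d"].sort(key=lambda e: e.days_since_change)
    grouped["changed_7d"].sort(key=lambda e: e.days_since_change)
    grouped["other_200"].sort(key=lambda e: e.impressions, reverse=True)
    files = []
    for name, size in BUCKETS:
        urls = [e.url for e in grouped[name]]
        files.extend(urls[i:i + size] for i in range(0, len(urls), size))
    return files

sample = [
    Entry("/a", days_since_change=30, impressions=10),
    Entry("/b", days_since_change=0),
    Entry("/c", status=301),
    Entry("/d", days_since_change=3),
    Entry("/e", status=404),
    Entry("/f", days_since_change=30, impressions=99),
    Entry("/g.jpg", kind="media"),
]
files = sequence(sample)
for urls in files:
    print(urls)
```

Each inner list would become one XML sitemap file, referenced from a sitemap index in the same order.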
Essentially, you are giving the search engines a roadmap of the pages you believe need to be crawled, based on newness. You want all pages to get into the index, and the working assumption is that pages spidered last week are already there while new pages are not. Use our sequencing directives above.
Why bother doing this if Google has explicitly said it ignores priority and sequence?
Given unlimited crawl budget, few website redirects and no errors, all files get crawled. Priority and sequence do not matter. Google is right if those assumptions hold.
But they don’t hold. There is a crawl budget, and it is eaten away by redirects and 404s. If your sitemap lists anything other than pages (images, videos, hreflang entries, etc.), the time estimates are wrong. Throw in any significant errors and the remaining entries in the XML sitemap are ignored.
While priority and sequence do not matter in an ideal world, they do matter in a crawl budget world. In my experience, the sequence in which the URLs are presented to the search engine is all that matters.
We have found that sequencing URLs this way increases the number of pages spidered and decreases the “abandoned due to error” issues.
Yes, it is up to the bots to decide how to handle XML sitemaps. However, implementing something that could potentially help search engines crawl and index your website content – if easy enough – is never a bad idea.