Google: Don’t Use Crawlers to Build Sitemaps. Automate Them.

Google’s John Mueller replied to a Reddit thread with advice on how best to build an XML sitemap, including a warning against crawler-based sitemap generators.

When updating your website’s sitemap, you should “automate it,” says Google’s John Mueller. He added that you should not use services that crawl your site, as “Google already does that.”

Mueller was responding to a question by a Reddit user who wanted to know the best strategy to keep their XML sitemap up to date on a large site with thousands of articles, with many new ones added weekly.

The Reddit user said that they had already tried paid solutions and an open-source Node.js library to crawl the website and generate the XML sitemap.

I tried some paid solution and even open source node.js library to crawl the website and generate the xml sitemap. But the process takes forever. So I’m assuming it’s not the best way to tackle this problem.

Mueller responded, saying that sitemap files should be generated on your backend from your own database, so that you can ping them the moment something changes and include an exact last-modification date.

Automate it on your backend (generate the files based on your local database). That way you can ping sitemap files immediately when something changes, and you have an exact last-modification date.

He continued, “Don’t crawl your own site, Google already does that.”
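In practice, that automation can be a small backend script that reads URLs and modification dates straight from the database and rewrites the sitemap file whenever content changes. The sketch below is one minimal way to do it, assuming a hypothetical SQLite database with an `articles(slug, updated_at)` table and a placeholder domain; it illustrates the idea Mueller describes rather than any particular site’s implementation.

```python
# Minimal sketch: build sitemap.xml from the database instead of crawling the site.
# Assumes a hypothetical SQLite table `articles(slug TEXT, updated_at TEXT)`;
# adapt the query, database path, and base URL to your own schema.
import sqlite3
from xml.sax.saxutils import escape

BASE_URL = "https://www.example.com"  # placeholder domain
DB_PATH = "site.db"                   # placeholder database file


def build_sitemap(db_path: str = DB_PATH, out_path: str = "sitemap.xml") -> None:
    # Pull every article URL and its last-modification date from the database.
    rows = sqlite3.connect(db_path).execute(
        "SELECT slug, updated_at FROM articles ORDER BY updated_at DESC"
    ).fetchall()

    entries = []
    for slug, updated_at in rows:
        entries.append(
            "  <url>\n"
            f"    <loc>{escape(BASE_URL + '/' + slug)}</loc>\n"
            f"    <lastmod>{escape(updated_at)}</lastmod>\n"  # exact date, straight from the database
            "  </url>"
        )

    # Write the sitemap in the standard sitemaps.org format.
    xml = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        + "\n".join(entries)
        + "\n</urlset>\n"
    )
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(xml)


if __name__ == "__main__":
    build_sitemap()
```

Hooked into a publish or update event (or a short cron interval), a script like this keeps the sitemap current without crawling anything and records the exact last-modification date for every URL, after which you can notify search engines that the file has changed.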

Why Use Automated XML Sitemaps

An XML sitemap is a plain XML file that lists the pages on your site, along with each page’s last-modified date and other useful metadata.

You can then declare your sitemap in your robots.txt file and submit it directly to search engines so they can discover every page you want crawled.
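Declaring it in robots.txt only takes a single line; the URL below is a placeholder:

```
Sitemap: https://www.example.com/sitemap.xml
```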

A sitemap looks like the following:

Example XML Sitemap. © The Search Review
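For reference, a minimal sitemap containing a single URL looks roughly like this (the domain and date are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/example-article/</loc>
    <lastmod>2019-06-04</lastmod>
  </url>
</urlset>
```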

According to Google’s Gary Illyes, XML sitemaps are the second most important source of URLs for Googlebot to crawl.

The most important is Google’s own crawling, carried out by the aptly named Googlebot.

Googlebot works by following links on websites, so if you have orphaned pages or a very large site with deep pages (pages only found after traveling through many other pages), Googlebot may not crawl them.

There is little sense in crawling your own website with an automated tool: a crawler cannot discover orphaned pages either, and it cannot supply accurate last-modified times the way your database can.

These tools are only doing what Google is already doing.
