I woke up this morning to see the following message from Google in my e-mail inbox:
I wasn’t surprised by this, of course. In fact, I’ve been expecting it. Google has been warning us for a long time that they’re removing support for noindex in robots.txt. So if you’ve been relying on it to keep crawlers from indexing parts of your site, you should switch to one of the other methods for now. But I think this is a terrible decision. The “noindex” directive absolutely belongs in robots.txt, for the reasons outlined below.
At the end, I’ll also give you some guidelines on how to ensure that your pages remain “noindexed” now that the robots.txt rule is no longer supported by Google.
Noindex is For Robots. So Where Should It Belong?
The core of my beef with Google’s decision to remove support for noindex from robots.txt is that it makes no thematic sense. Sure, we can use .htaccess to specify noindex. Yes, we can use the robots meta tag in the page HTML.
But the “noindex” directive is meant for robots. And what’s the point of having a robots.txt file if you can’t use it to give instructions to robots? Google’s excuse is that it’s not a supported directive. Well then… maybe it should be, right? And since so many sites use it and advocate its use, there’s no harm in letting it continue!
If anything, the exclusion of noindex from the robots.txt standard is an oversight. If we can specify “disallow”, then why not noindex as well? It makes no sense!
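To make the gap concrete, here is a minimal sketch using Python's standard-library robots.txt parser. It is only one implementation, not Google's crawler, and the robots.txt content and example.com URLs are made up for illustration. It shows a parser that honors "Disallow" while silently ignoring a "Noindex" line:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tmp/
Noindex: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# False: /tmp/ is covered by the Disallow rule.
print(parser.can_fetch("*", "https://example.com/tmp/file.html"))

# True: the Noindex line is not a rule this parser knows, so it is skipped.
print(parser.can_fetch("*", "https://example.com/private/page.html"))
```

Note that `can_fetch` only answers "may this URL be crawled?". It says nothing about indexing, which is exactly the distinction at issue here.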
Other Alternatives are Too Programmatic
One of the benefits of using robots.txt is that you don’t need to touch your site. The file sits quietly in the root directory and doesn’t interfere with your site’s functionality. You don’t risk crashing stuff. And there are plenty of online tools that can check robots.txt to ensure that your web page is accessible.
Meta Tags in the HTML
By contrast, the two other “recommended” ways to specify the noindex directive both require you to intervene in your site and make changes to it. The first is the robots “meta” tag in the HTML. You can add it either with a piece of code you write yourself, or with a plugin.
The problem is that it’s not easy to target a wide swathe of pages that match a certain URL. You have to come up with a creative solution to make that happen. If you’re using WordPress, that means using a hook to check the page URL and add the noindex tag dynamically.
That extra check runs on every page load, which can affect things like site speed. I don’t like the idea of programmatically adding a meta robots tag to pages.
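The URL check described above can be sketched roughly as follows. This is not WordPress code. It is a language-neutral illustration written in Python, and the pattern list and the `robots_meta_tag` helper are hypothetical names chosen for the example:

```python
from fnmatch import fnmatchcase

# Hypothetical URL patterns that should stay out of the index.
NOINDEX_PATTERNS = ["/tag/*", "/author/*", "/search/*"]


def robots_meta_tag(path: str, patterns=NOINDEX_PATTERNS) -> str:
    """Return a noindex meta tag if the path matches any pattern, else ''."""
    if any(fnmatchcase(path, pattern) for pattern in patterns):
        return '<meta name="robots" content="noindex">'
    return ""


# A template would emit the returned string inside <head> for each request.
print(robots_meta_tag("/tag/hosting/"))
print(robots_meta_tag("/blog/my-post/"))
```

Even in this small form, the downside is visible: the matching logic has to run for every page that gets rendered, whereas a robots.txt rule is a single static line.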
The other solution is even worse. The recommendation is that you use the .htaccess file to add a “noindex” response header (X-Robots-Tag). This really rubs me the wrong way. The .htaccess file is a very technical piece of work. It’s incredibly easy to break your site if you get even a single character wrong. It’s a complex interaction of rules, some written by you and some added by plugins.
Under normal circumstances, I never touch the .htaccess file if I can help it. And when I do, I change it with the utmost care, painfully aware that I risk crashing my site. The notion of using .htaccess for something as trivial as a robots header really annoys me.
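For readers who want to see what the header approach amounts to, here is a minimal sketch of the same idea as Python WSGI middleware. It is not .htaccess syntax and not a drop-in for a WordPress site. The `noindex_middleware` name and the pattern list are assumptions made for this example. The only real-world detail it relies on is the `X-Robots-Tag: noindex` response header:

```python
from fnmatch import fnmatchcase

# Hypothetical URL patterns whose responses should carry the header.
NOINDEX_PATTERNS = ["/tag/*", "/author/*"]


def noindex_middleware(app, patterns=NOINDEX_PATTERNS):
    """Wrap a WSGI app so matching paths get an X-Robots-Tag: noindex header."""

    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")

        def patched_start_response(status, headers, exc_info=None):
            # Append the header only when the request path matches a pattern.
            if any(fnmatchcase(path, pattern) for pattern in patterns):
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return wrapped


def demo_app(environ, start_response):
    """Tiny stand-in application used only to exercise the middleware."""
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<html></html>"]


application = noindex_middleware(demo_app)
```

The point stands either way: whether it lives in .htaccess or in application code, the header has to be wired into the request path, which is a lot more invasive than one line in a text file.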
But I Have no Choice – For Now
As of now, I have no choice but to use one of the above two methods to make noindex work. However, I will not be removing it from my robots.txt file. The web is more than just Google, and perhaps other search engines will find it useful. In the meantime, I hope Google realizes one day that not everyone wants to rework their site or .htaccess file merely to include a “noindex” response header.
That directive is meant for robots, and the right place for it is robots.txt. End of story.
I’m a NameHero team member, and an expert on WordPress and web hosting. I’ve been in this industry since 2008. I’ve also developed apps on Android and have written extensive tutorials on managing Linux servers. You can contact me on my website WP-Tweaks.com!