You write and maintain your website for human visitors. And yet when you peer into your analytics logs, you see that the vast majority of hits to your server come from bots. These don't show up in client-side measurement tools like Google Analytics, but they do appear in the server logs. Most traffic these days comes from automated bots. You might be tempted to block them all, but that would be disastrous. What you need is a policy that distinguishes which bots to let through and which bots to block.
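If you want a quick sanity check of how much of your traffic is bot-like, you can tally user agents straight from your server's access log. The sketch below is illustrative only: the log path and keyword list are assumptions, so adjust them to your own server and log format.

```python
import re
from collections import Counter

# Assumed Nginx/Apache "combined" log format, where the user agent is the
# last quoted field on each line. Adjust LOG_PATH and BOT_HINTS to your setup.
LOG_PATH = "/var/log/nginx/access.log"
BOT_HINTS = ("bot", "crawl", "spider", "slurp", "curl", "python-requests")

counts = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        quoted = re.findall(r'"([^"]*)"', line)
        user_agent = quoted[-1].lower() if quoted else ""
        if any(hint in user_agent for hint in BOT_HINTS):
            counts["bot-like"] += 1
        else:
            counts["other"] += 1

total = sum(counts.values()) or 1
for label, hits in counts.most_common():
    print(f"{label}: {hits} hits ({hits / total:.0%})")
```

Even this crude keyword match usually shows a surprisingly large bot share, and it misses scrapers that fake a browser user agent entirely.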
Not All Bots Are Bad – Search Engine Bots Need to be Let Through
Probably the most important bots that visit your site are those linked to the search engines. You don’t want to block Google from crawling your site! Annoying as it may seem, search engine bots will bombard your site with requests for static and dynamic content, and there’s nothing you can do about it. If you block them, you’re toast.
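One related tip: not everything claiming to be Googlebot actually is. Google's documented way to verify a crawler is a reverse DNS lookup on the IP, a check that the hostname belongs to googlebot.com or google.com, and then a forward lookup to confirm. Here's a minimal sketch of that check; the sample IP is just an illustration.

```python
import socket

def is_verified_googlebot(ip: str) -> bool:
    """Reverse-resolve the IP, check the hostname, then forward-confirm it."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)            # reverse DNS lookup
        if not hostname.endswith((".googlebot.com", ".google.com")):
            return False
        forward_ips = socket.gethostbyname_ex(hostname)[2]   # forward confirmation
        return ip in forward_ips
    except OSError:
        # Lookup failed -- treat the visitor as unverified
        return False

# Example only: an address in Googlebot's usual range
print(is_verified_googlebot("66.249.66.1"))
```

Anything that fails this kind of check while calling itself Googlebot is fair game for blocking.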
Blocking Other Crawlers Might be Inconvenient
The second category of bots are those that might be useful. They're usually tools you use to analyze your site automatically. For example, let's say you want to use Screaming Frog to crawl your site for a complete SEO audit. It uses a crawler that needs access to your site's contents. Another example is an SEO tool like SEMRush, whose bots crawl your site to generate backlink reports, keyword optimization strategies, and more.
Another example is if you want an online tool to crawl your site to extract the critical CSS so that you can inline it. The tool won’t be able to carry out its function if you block bots.
However, these tools are used rarely enough that you can consider disabling your bot firewall just long enough for them to complete their task, and then re-enabling it.
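You could even script that toggle around a crawl. The sketch below is a hypothetical example using Cloudflare's REST API: the /bot_management endpoint and the "fight_mode" field are assumptions based on Cloudflare's public API documentation, so verify them against the current docs before relying on this.

```python
import os
import requests

# Assumed zone-level Bot Management endpoint and "fight_mode" field -- check
# Cloudflare's API docs for your plan before using this in earnest.
API = "https://api.cloudflare.com/client/v4"
ZONE_ID = os.environ["CF_ZONE_ID"]                      # placeholder: your zone ID
HEADERS = {"Authorization": f"Bearer {os.environ['CF_API_TOKEN']}"}

def set_bot_fight_mode(enabled: bool) -> None:
    """Turn Bot Fight Mode on or off for the zone (sketch, not a drop-in tool)."""
    resp = requests.put(
        f"{API}/zones/{ZONE_ID}/bot_management",
        headers=HEADERS,
        json={"fight_mode": enabled},
        timeout=30,
    )
    resp.raise_for_status()

set_bot_fight_mode(False)   # pause bot blocking before the SEO crawl
# ... run Screaming Frog, the critical-CSS extractor, etc. here ...
set_bot_fight_mode(True)    # switch it back on when the crawl finishes
```

The important part is simply remembering to turn the protection back on afterwards, whether you do it by hand or with a script like this.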
All Other Bots can be Safely Blocked
If a bot isn't crawling your site to index it for search, and isn't part of an online tool you're using yourself, you can safely block it without consequence. Chances are it's trying to scrape your content, or to hack or spam your site.
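If you're not behind a CDN, one crude fallback is to refuse such bots at the application level by user agent. The sketch below uses Flask and a hypothetical deny list purely as an illustration; it is not the Cloudflare approach described next, which is more robust because scrapers can fake their user agent entirely.

```python
from flask import Flask, abort, request

app = Flask(__name__)

# Illustrative user-agent fragments only -- not a definitive list of bad bots.
BLOCKED_UA_FRAGMENTS = ("mj12bot", "dotbot", "python-requests", "scrapy")

@app.before_request
def block_unwanted_bots():
    """Return 403 before any route runs if the user agent looks like a scraper."""
    user_agent = (request.user_agent.string or "").lower()
    if any(fragment in user_agent for fragment in BLOCKED_UA_FRAGMENTS):
        abort(403)

@app.route("/")
def home():
    return "Hello, human visitor!"
```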
Using Cloudflare to Block Bots
Cloudflare has a very useful tool to block bots. You can find it in the "Firewall" section under "Bots". It has a whitelist of "verified bots" that includes all the important search engines, as well as popular tools like Ahrefs, so that your site doesn't block important stuff.
If you're using their free plan, Cloudflare will use JavaScript detections to check if a visitor is a bot and, if so, will block them when you enable the option. As a "Pro" user, you can go one step further and block automated bots. Here's a screenshot of the breakdown of the visitors to my site:
You can see that over 50% of all visits to my site are from “verified bots”. But around seven thousand visits are either “Automated” or “Likely Automated”. I’m comfortable blocking the “Definitely Automated” visitors.
When Cloudflare blocks a bot, it logs a report in the firewall. Here’s a screenshot of all the traffic that Cloudflare has blocked on my behalf:
The "Firewall: Managed" rules in this case are almost entirely automated bots. In addition to this, Cloudflare also blocks a whole bunch of traffic based on its WAF rules, but since those requests always come from automated bots in the first place, they get blocked immediately, before Cloudflare even has a chance to evaluate their behavior and requests.
Blocking bots is a great way to keep your site's resources free for real human visitors, particularly when those bots are trying to access non-existent PHP files in an attempt to probe your application for weaknesses. Doing this with Cloudflare is easy, and even those on the free tier can benefit greatly. I see no reason why most people shouldn't turn it on.
I’m a NameHero team member, and an expert on WordPress and web hosting. I’ve been in this industry since 2008. I’ve also developed apps on Android and have written extensive tutorials on managing Linux servers. You can contact me on my website WP-Tweaks.com!