Have you ever come across a website that’s almost an exact duplicate of yours? It can be a disorienting experience! It seems incredible that a third party would take your content and claim it as its own without attribution. They even copy the images! These people are called “scrapers”, and site scraping can be a huge problem. Here I’ll show you a few ways to discourage site scraping and make scrapers look elsewhere.
Site Scraping Can Threaten your Search Rankings
Scrapers wouldn’t matter so much if search engines like Google were smart enough to identify and delist them. You would think that Google – which spends so much time lecturing us about how great its algorithms are at detecting spam – would easily be able to identify scraped content. However, I read with incredulity as John Mueller from the Google search team coolly claimed that copied content can sometimes outrank the original content!
It’s a slap in the face of all content creators. IMHO, this is just Google being lazy and refusing to make changes to their algorithm to penalize scrapers. They feel that if their algorithm determines that copied content is more relevant, then they should reward the site that does the copying.
However, complaining won’t help. Here, I’ll show you some ways you can make it harder for people to copy your stuff.
Turn on Bot Protection from Cloudflare
I’ve written before about the benefits of subscribing to Cloudflare’s Pro plan. It’s certainly not for everyone, but one of the hidden benefits is that it allows you to block automated bots that Cloudflare detects using its machine learning algorithms. Here’s a screenshot of bot traffic on my site WP-Tweaks:
You can see that in the past 24 hours, around 15% of my traffic was automated, non-verified bots. The “verified” bots are legitimate search engine crawlers, including Google, Bing, Yandex, etc. Cloudflare maintains a list of currently verified bots. In the “Settings” page for bot management, you can choose to allow or block automated bots apart from the verified ones. This will cut down on content scraping dramatically. Scrapers are lazy. They tend to automate their tasks instead of manually copying and pasting content.
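If your plan also includes custom WAF rules, you can express a similar policy yourself. The snippet below is an illustrative expression in Cloudflare’s Rules language, not a copy of my own configuration. The `cf.client.bot` field is Cloudflare’s built-in flag that is true for verified bots, while the user-agent strings are just examples of common scraping tools:

```
(http.user_agent contains "python-requests" or http.user_agent contains "Scrapy")
and not cf.client.bot
```

You would pair an expression like this with a “Managed Challenge” or “Block” action in the dashboard, so verified crawlers pass through untouched while obvious automation gets stopped.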
Implement Rate Limiting
Another technique for stopping scrapers is to implement rate limiting per IP address. A legitimate visitor is not going to request dozens (or even five) pages within the space of a few seconds. I’ve written before on NameHero about the challenges of rate limiting, but you might want to consider it if you’re worried about content scrapers.
If you set it up properly, rate limiting can be an amazing tool to keep bad traffic from constantly hitting your site. It should stop most content scrapers in their tracks.
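To make the idea concrete, here’s a minimal sketch in Python of the sliding-window approach a rate limiter typically uses. The function name, the limit, and the window size are all illustrative values I’ve made up for this example, not recommendations – and in practice you’d usually do this at the server or CDN layer rather than in application code:

```python
import time
from collections import defaultdict, deque

# Illustrative limits: at most 10 requests per IP in any 5-second window.
MAX_REQUESTS = 10
WINDOW_SECONDS = 5.0

# Maps each client IP to a queue of recent request timestamps.
_hits = defaultdict(deque)

def allow_request(ip, now=None):
    """Return True if this IP is under the limit, False if it should be blocked."""
    now = time.monotonic() if now is None else now
    window = _hits[ip]
    # Discard timestamps that have fallen out of the sliding window.
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) >= MAX_REQUESTS:
        return False  # over the limit - block or challenge this request
    window.append(now)
    return True
```

A normal reader clicking through a page every few seconds never hits the limit, while a bot firing off dozens of requests per second gets cut off almost immediately.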
Implement a Login System
This obviously won’t work for everyone. And it doesn’t help with search engines, which can’t log in to index your site. But if you have truly valuable content that you simply can’t allow any unauthorized person to see, the best thing to do is to make people log in after verifying who they are. Even then, a competitor might try to create an account, so it’s up to you to decide which criteria you want to use for allowing subscribers in.
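As a toy illustration of the gatekeeping logic, here’s a sketch in Python. The `SESSIONS` store, the token, and the function names are all hypothetical stand-ins for whatever membership plugin or authentication system you actually use:

```python
# Hypothetical session store: maps a session token to an authenticated user.
SESSIONS = {"token-abc": "alice"}

def serve_article(token, article_html):
    """Serve the article body only to authenticated sessions.

    Returns a (status_code, body) pair: 401 for anonymous visitors,
    200 with the content for logged-in users.
    """
    user = SESSIONS.get(token)
    if user is None:
        return (401, "Please log in to view this content.")
    return (200, article_html)
```

The point is simply that the content never leaves the server unless the request carries a valid session – a scraper without credentials only ever sees the login prompt.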
In the End, You Can’t Stop a Determined Scraper
Since you ultimately have to show your content to someone, there’s no way to stop a determined person from copying your stuff. You can just make it harder. And content scrapers are overwhelmingly lazy – otherwise, why would they copy content? Make it hard for them, and you should solve most of your scraping problems!
I’m a NameHero team member, and an expert on WordPress and web hosting. I’ve been in this industry since 2008. I’ve also developed apps on Android and have written extensive tutorials on managing Linux servers. You can contact me on my website WP-Tweaks.com!