I’ve seen my share of site scrapers, and it’s safe to say that the overwhelming majority pose no threat to you. Google’s documentation indicates that it prefers original content over duplicate content. However, there are also cases where Google not only fails to penalize sites that plagiarize but actively ranks them higher than the original site! According to an interview with the Google Search Central team, this can happen if Google doesn’t trust the original website.
DMCA Requests are Not a Solution
Of course, you could always send a DMCA takedown request, but this isn’t for everyone. First, you need to find the site that copied your content. Second, if that site doesn’t comply with the DMCA notice, you have no choice but to engage in a laborious legal process that might or might not be worth it.
In any case, there’s an asymmetry of effort here. On the one hand, copying content is straightforward. An automated script can scrape your entire site in minutes and publish a copy. When penalized, the scrapers can switch to another domain and repeat. On the other hand, it takes a lot more effort from you to track down and confront the site owners or hosting providers with DMCA notices.
It’s far better to make copy/pasting and scraping as hard as possible for third parties. These methods won’t stop those dedicated to scraping your site, but if you introduce enough friction, they might stop viewing your site as a soft target and move on to easier pickings.
Using WordPress Plugins to Make Scraping Harder
If you want to close as many avenues as possible for scrapers to plagiarize your site, I suggest you give the WordPress plugin Smart Disable Right-Click on Website a try. When I gave it a whirl on my test website, it closed off most of the common routes that scrapers use to copy content. It disallows the following:
Text Selection
The most common reason for selecting text on a website is to copy it. If your site deals with many code snippets (like mine on WP-Tweaks), it’s a better user experience to have a code block that easily allows the user to copy the code without forcing them to select it.
Similarly, if you need the user to be able to copy specific content, try to devise a way for them to copy it without selecting text. Once you do that, you don’t pay much of a price for disallowing text selection.
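For a sense of what this looks like in practice, here’s a minimal sketch of the two ideas above: turning off text selection with CSS while leaving code blocks selectable, and copying a code block through the Clipboard API so nobody has to select it. This is not the plugin’s actual implementation. The names `buildNoSelectCss`, `copyCode`, and `ClipboardLike` are made up for illustration, and the default selectors are an assumption about which elements you’d want to exempt.

```typescript
// Hypothetical helper: returns CSS that turns off text selection for the
// whole page but keeps it on for the selectors in `allow`.
function buildNoSelectCss(allow: string[] = ["pre", "code"]): string {
  const off = "body { -webkit-user-select: none; user-select: none; }";
  if (allow.length === 0) return off;
  const on = `${allow.join(", ")} { -webkit-user-select: text; user-select: text; }`;
  return `${off}\n${on}`;
}

// Minimal shape of the browser's navigator.clipboard, so the helper below
// can also be called with a stand-in object.
interface ClipboardLike {
  writeText(text: string): Promise<void>;
}

// Copies a code block's text without the visitor having to select it.
function copyCode(
  block: { textContent: string | null },
  clipboard: ClipboardLike
): Promise<void> {
  return clipboard.writeText(block.textContent ?? "");
}

// In a browser, roughly (copyButton and preElement are placeholders):
//   const style = document.createElement("style");
//   style.textContent = buildNoSelectCss();
//   document.head.append(style);
//   copyButton.addEventListener("click", () => copyCode(preElement, navigator.clipboard));
```

Keep in mind that `navigator.clipboard` is only available on pages served over HTTPS (or localhost), so the copy button needs a secure context to work.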
Right-Clicks
Most people will never need to right-click on a website. The most common use case is copying text they have selected. But even with text selection disabled, the right-click menu can still let someone select all the text on the page. The Smart Disable Right-Click on Website plugin prevents this:
That’s a big vector for plagiarism gone!
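Blocking the right-click menu usually comes down to cancelling the browser’s `contextmenu` event. Here’s a minimal sketch of that technique, not the plugin’s own code. The `disableContextMenu` function and the two small interfaces are hypothetical, defined only so the example is self-contained.

```typescript
// Minimal event shape; a real MouseEvent satisfies it.
interface CancelableEvent {
  preventDefault(): void;
}

// Hypothetical target type: anything that can register a contextmenu listener.
interface ContextMenuTarget {
  addEventListener(type: "contextmenu", listener: (event: CancelableEvent) => void): void;
}

// Cancels the default action of every contextmenu event on the target,
// which stops the browser's right-click menu from opening.
function disableContextMenu(target: ContextMenuTarget): void {
  target.addEventListener("contextmenu", (event) => event.preventDefault());
}

// In a browser: disableContextMenu(document);
```

Registering the listener on `document` covers the whole page. Passing a single element instead would limit the block to that element, which is handy if you only want to protect the article body.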
Disabling Developer Tools
The plugin also prevents people from opening the developer tools by disabling the following two shortcuts:
- F12
- Ctrl + Shift + I
With it enabled, anyone who tries to use the above two shortcuts will see this:
This prevents people from accessing the source code directly. Speaking of source code, there’s one more shortcut to deal with.
Disabling Ctrl+U to “View Source”
Another way users can get their hands on the source code is by pressing “Ctrl+U”. If they try this with the plugin enabled, here’s what they’ll see:
So that’s yet another route blocked.
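The shortcut blocking from the last two sections (F12, Ctrl+Shift+I, and Ctrl+U) can be sketched as a single `keydown` listener. This is an illustration of the general technique under my own assumptions, not the plugin’s source. `isBlockedShortcut`, `blockShortcuts`, and the interfaces are hypothetical names, and the sketch ignores macOS shortcuts, which use different modifier keys.

```typescript
// Minimal event shape so the logic can be exercised outside a browser.
// A real KeyboardEvent satisfies this interface.
interface ShortcutEvent {
  key: string;
  ctrlKey: boolean;
  shiftKey: boolean;
  preventDefault(): void;
}

// Hypothetical target type: anything that can register a keydown listener.
interface KeydownTarget {
  addEventListener(type: "keydown", listener: (event: ShortcutEvent) => void): void;
}

// True for the three shortcuts named in the article.
function isBlockedShortcut(event: ShortcutEvent): boolean {
  const key = event.key.toLowerCase(); // Shift reports "I" instead of "i"
  if (key === "f12") return true; // F12
  if (event.ctrlKey && event.shiftKey && key === "i") return true; // Ctrl+Shift+I
  if (event.ctrlKey && key === "u") return true; // Ctrl+U
  return false;
}

// Cancels the default action for blocked shortcuts and leaves all other
// key presses alone.
function blockShortcuts(target: KeydownTarget): void {
  target.addEventListener("keydown", (event) => {
    if (isBlockedShortcut(event)) event.preventDefault();
  });
}

// In a browser: blockShortcuts(document);
```

A listener like this only sees key presses that reach the page. The developer tools and the page source can still be opened from the browser’s own menu, which page scripts can’t intercept.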
This Isn’t Foolproof, of Course
Not that any of this will stop a dedicated scraper. There are a few ways to get your HTML and your content if someone is willing to put in 10 minutes of thought. But plagiarizers are (almost by definition) a lazy bunch, and if you make them think a little bit or put up a small hurdle, chances are that they’ll turn around and look for something else.
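To make that limit concrete: restrictions like these have to run in the visitor’s browser, so anything that requests the page over HTTP without executing its JavaScript or CSS still receives the full HTML. The sketch below assumes the plugin works that way (I haven’t confirmed its internals). It uses a deliberately crude, hypothetical `htmlToText` helper to show that the text is still sitting in the raw markup.

```typescript
// Crude, illustrative tag stripper. It takes raw HTML exactly as an HTTP
// client would receive it and returns the visible text. A real tool would
// use a proper HTML parser instead of regular expressions.
function htmlToText(html: string): string {
  return html
    .replace(/<(script|style)[\s\S]*?<\/\1>/gi, " ") // drop scripts and styles
    .replace(/<[^>]+>/g, " ") // drop the remaining tags
    .replace(/\s+/g, " ") // collapse whitespace
    .trim();
}

// Sample page with selection and right-click "protection" in place.
const samplePage =
  "<html><head><style>body{user-select:none}</style>" +
  "<script>document.oncontextmenu=()=>false</script></head>" +
  "<body><p>Hello <b>world</b></p></body></html>";

console.log(htmlToText(samplePage)); // prints "Hello world"
```

The protective script and stylesheet are just more text in the response; they never run, so they never get a chance to block anything.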
If you think you’re being harmed or risk being harmed by people who copy your content, give this excellent WordPress plugin a try!
I’m a NameHero team member, and an expert on WordPress and web hosting. I’ve been in this industry since 2008. I’ve also developed apps on Android and have written extensive tutorials on managing Linux servers. You can contact me on my website WP-Tweaks.com!