Google recently added a “Doesn’t match regex” filter to the search console, and I immediately found a use for it! A few months back, I wrote about how to practice content hygiene on your website. One of the biggest difficulties I had when building a list of pages for my website, WP-Tweaks, was that the search console lists not just full posts and pages, but also archives, in-line anchors, and nofollow links. This clutters up the display and makes it hard to see which actual pages are performing poorly.
For example, if you sort your pages in ascending order of clicks or impressions, you might see something like this:
Since archive pages and inside links are visited very rarely, the majority of your results will be like the ones above. Previously, I had no choice but to export the entire list into a spreadsheet, and then painstakingly clean it up using filters and exclusion pattern matching. The whole process was very annoying, not to mention that the list went out of date quickly as new data became available in the search console, and I had to repeat the process all over again.
Google Added Negative Regex Matching
Google already let you write regular expressions to match the URLs you want, but it critically lacked negative matching: you couldn’t build a list that excluded URLs containing certain patterns. I couldn’t write a single pattern to match every valid page. I only knew which patterns I didn’t want. So for my purposes, the built-in Google tools weren’t helpful at all.
However, recently Google added this feature. We can now create pattern matching regexes and instruct Google to list items that don’t match them. Here’s a screenshot:
With this, I could create a clean list of poorly performing pages in just a few seconds, all updated with the latest Google Search Console data. It’s really streamlined the way I practice my content hygiene.
Here’s how it works.
Creating the Negative Regex
Regular expressions can be notoriously difficult to write. They also look super ugly, and often bear no resemblance to the pattern you’re trying to match. However, they’re also very powerful. I always end up re-learning regex each time I use it, only to forget it a little while later.
For my purposes, here’s the Regex I use in Google Search Console to clean my list of useless pages:
Let’s break this down a bit.
This regex matches all URLs that have ONE OF the following inside them:
Looking at the regex, you see I’ve enclosed each pattern in parentheses (), separated by “|”, the OR operator. Once you understand it, the rule is quite clear. One thing to keep in mind is that characters like dots (.), carets (^), and backslashes (\) have special meanings in a regex, so to match them literally you need to escape them with a backslash (\). So if you want to match a dot in your URL, you need to use (\.) instead of (.).
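To make the OR and escaping rules concrete, here is a minimal Python sketch that assembles a pattern of the same shape. The fragments in `unwanted_fragments` are hypothetical placeholders, not the ones from my own regex, and Python's `re` module is only standing in for the search console's regex engine, so treat this as an illustration rather than an exact replica.

```python
import re

# Hypothetical URL fragments to exclude. These are illustrative
# placeholders, not the patterns from the regex discussed above.
unwanted_fragments = ["/tag/", "/page/", ".xml", "?s="]

# re.escape puts a backslash in front of special characters, so
# ".xml" matches a literal dot rather than "any character".
escaped = [re.escape(fragment) for fragment in unwanted_fragments]

# Wrap each alternative in parentheses and join them with "|" (OR).
pattern = "|".join(f"({part})" for part in escaped)

# Print the assembled pattern so it can be copied elsewhere.
print(pattern)
```

If the dot were left unescaped, a URL such as `/faxml-guide/` would also be caught, because an unescaped dot matches any single character.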
Modify the above regex with your own patterns, select “Doesn’t match regex” from the dropdown list, and paste your regex into the search console. Of course, if you do want to list only the pages that match your patterns, select the matching option instead!
Click “Apply”, and you’re done! The search console should now show only those pages that don’t match the pattern in your Regex. Now you can easily check which pages are underperforming, and take steps to correct the situation. Happy cleaning!
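If you'd like to sanity-check a pattern before pasting it in, the filter's behavior can be approximated offline. The sketch below is a rough emulation under a few assumptions: the exclusion pattern, the sample URLs, and the click counts are all made up, the helper `doesnt_match` is a hypothetical name, and Python's `re.search` is used on the assumption that the console matches a pattern anywhere in the URL.

```python
import re

# Hypothetical exclusion pattern; substitute your own.
EXCLUDE = re.compile(r"(/tag/)|(/page/)|(#)")

# Made-up sample rows of (URL, clicks). Not real Search Console data.
rows = [
    ("https://example.com/first-post/", 12),
    ("https://example.com/tag/news/", 0),
    ("https://example.com/second-post/", 3),
    ("https://example.com/first-post/#comments", 1),
    ("https://example.com/page/2/", 0),
]


def doesnt_match(rows, pattern):
    """Keep only rows whose URL contains no match for the pattern."""
    return [row for row in rows if pattern.search(row[0]) is None]


# Sort ascending by clicks so the weakest real pages come first.
clean = sorted(doesnt_match(rows, EXCLUDE), key=lambda row: row[1])

for url, clicks in clean:
    print(clicks, url)
```

With this sample data, only the two real posts survive the filter, and the one with fewer clicks is listed first.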
I’m a NameHero team member, and an expert on WordPress and web hosting. I’ve been in this industry since 2008. I’ve also developed apps on Android and have written extensive tutorials on managing Linux servers. You can contact me on my website WP-Tweaks.com!