Regular Expressions (RegEx) in Google Search Console


A few days ago, I noticed a Google Search Console regular expression query for user questions popping up on Twitter:

RegEx for questions in Google Search Console!
^who|^what|^where|^why|^when|^how|^is|^are|^does|^do|^can

Basically, the RegEx pattern above lets you see the questions users type into Google, along with the clicks, impressions and click-through rate (CTR) they bring to your website. Considering the most recent Google core update, it’s time to focus your SEO efforts on frequently asked questions (FAQ), how-to articles and topic-specific questions.

Google Search Console supports regular expressions (RegEx) in filters, using the RE2 syntax. This article covers the regular expressions you can use in Google Search Console and shows how to use RegEx filters to analyse traffic data and gain insight into your organic search performance to feed into future strategies.


What are the benefits of using RegEx in Google Search Console?

  • Time saving
  • Deeper insights, such as misspelled queries
  • A better understanding of search data to support business and sales decisions
  • Less need to rely on Google Analytics for query analysis

Getting Started with RegEx in Google Search Console

Google Search Console uses RE2 syntax, which does not support every regular expression feature available in other programming languages. Lookaheads, lookbehinds and backreferences, for example, are not supported.
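To see this in practice, you can test patterns offline with Go’s standard regexp package, which implements RE2 syntax. This is a minimal sketch, assuming Go’s behavior is close enough to Search Console’s RE2 engine for checking syntax; the sample patterns are illustrative. Lookaheads and backreferences fail to compile:

```go
package main

import (
	"fmt"
	"regexp"
)

// isValidRE2 reports whether pattern compiles with Go's regexp package,
// which implements RE2 syntax.
func isValidRE2(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	patterns := []string{
		`^(who|what|how)\s`,      // alternation, anchors and \s are supported
		`javascript(?=tutorial)`, // lookahead: rejected by RE2
		`(java)script \1`,        // backreference: rejected by RE2
	}
	for _, p := range patterns {
		fmt.Printf("%-26s valid RE2: %v\n", p, isValidRE2(p))
	}
}
```

If a pattern fails to compile here, Search Console will most likely reject it too.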

Added in April 2021, Search Console’s RegEx filtering lets you match multiple terms in a query or page URL, so only the data you are interested in is returned. Your analysis can be as broad or as granular as you need.

Filtering by RegEx is available for the Query and Page filters in the Performance report. To use it, click + New, select either Query or Page, and choose Custom (RegEx) from the dropdown.

I am going to take “JavaScript” as an example of a keyword. It’s pretty hard to rank for this keyword, so I’ll aim for questions and long-tail keywords.

Add your regular expression and then filter your report.

Character Limits

Google Search Console limits a regular expression to 4,096 characters. This is usually enough, since you can combine filters, or run several filters and merge the results in a spreadsheet.

With regular expressions, you can make your pattern more condensed to save characters.

For example, this:

example.com/aaa|example.com/bbb

equals:

example.com/(aaa|bbb)
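A quick way to confirm the two forms are equivalent is to run both against a few URLs. The sketch below uses Go’s regexp package (RE2 syntax) as a stand-in for Search Console; the sample URLs are hypothetical:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	longForm    = regexp.MustCompile(`example.com/aaa|example.com/bbb`)
	compactForm = regexp.MustCompile(`example.com/(aaa|bbb)`)
)

// agree reports whether both patterns give the same answer for url.
func agree(url string) bool {
	return longForm.MatchString(url) == compactForm.MatchString(url)
}

func main() {
	urls := []string{
		"https://example.com/aaa",
		"https://example.com/bbb/page",
		"https://example.com/ccc",
	}
	for _, u := range urls {
		fmt.Printf("%-30s match=%v same=%v\n", u, compactForm.MatchString(u), agree(u))
	}
	// String returns the source pattern, so this is the number of characters saved.
	fmt.Println("characters saved:", len(longForm.String())-len(compactForm.String()))
}
```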

Match all Pages/Queries that contain a word

To filter pages or queries that contain a word, wrap the word in .* on both sides.

This matches anything before and after your string. Here I match anything containing the word javascript.

.*javascript.*
  • .* matches any sequence of characters, including none.

This is where I can see what users are searching for AND landing on my website. The next logical step would be to check if I have the requested/searched content and, if not, add it.
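Because Search Console matches patterns partially by default, the surrounding .* does not change which queries are returned; it just makes the intent explicit. A small Go sketch with hypothetical queries, assuming that Go’s unanchored MatchString (Go’s regexp uses RE2 syntax) mirrors Search Console’s partial matching:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	wrapped = regexp.MustCompile(`.*javascript.*`)
	bare    = regexp.MustCompile(`javascript`)
)

func main() {
	// Hypothetical sample queries.
	queries := []string{"learn javascript", "javascript array methods", "python tutorial"}
	for _, q := range queries {
		// MatchString is unanchored, so both patterns can match anywhere in q.
		fmt.Printf("%-26s wrapped=%v bare=%v\n", q, wrapped.MatchString(q), bare.MatchString(q))
	}
}
```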

Match Specific Pages

To match specific pages, write your property URL followed by a group () listing the paths you want.

^https://getbutterfly.com/(javascript|learn-javascript|javascript-tutorials)/$
  • () capture group to group elements together
  • | means OR
  • ^ starts with
  • $ ends with
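Here is a sketch that checks which URLs the pattern above accepts, using Go’s RE2-based regexp package as a stand-in for Search Console. The URLs are hypothetical examples:

```go
package main

import (
	"fmt"
	"regexp"
)

// The pattern from the article, unchanged.
var specificPages = regexp.MustCompile(`^https://getbutterfly.com/(javascript|learn-javascript|javascript-tutorials)/$`)

func main() {
	urls := []string{
		"https://getbutterfly.com/javascript/",        // listed page
		"https://getbutterfly.com/learn-javascript/",  // listed page
		"https://getbutterfly.com/javascript/arrays/", // deeper path: $ rejects it
		"https://getbutterfly.com/javascript",         // no trailing slash
	}
	for _, u := range urls {
		fmt.Printf("%-46s %v\n", u, specificPages.MatchString(u))
	}
}
```

The unescaped dot in getbutterfly.com matches any character; writing getbutterfly\.com is slightly stricter.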

Negative Filtering with RegEx

To exclude results instead, select Custom (RegEx) and switch the filter from Matches regex to Doesn’t match regex.
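RE2 has no negative lookahead, so a pattern such as ^(?!brand) is rejected; exclusion has to come from the filter option itself, which simply inverts the match. A sketch of that behavior in Go, whose regexp package uses RE2 syntax; the excludeMatching helper and the sample queries are hypothetical:

```go
package main

import (
	"fmt"
	"regexp"
)

// excludeMatching keeps only the queries that do NOT match pattern,
// which is what the "Doesn't match regex" option does.
func excludeMatching(pattern string, queries []string) []string {
	re := regexp.MustCompile(pattern)
	var kept []string
	for _, q := range queries {
		if !re.MatchString(q) {
			kept = append(kept, q)
		}
	}
	return kept
}

func main() {
	// Hypothetical queries; "getbutterfly" stands in for a brand name.
	queries := []string{"getbutterfly seo", "javascript tutorial", "getbutterfly plugins"}
	fmt.Println(excludeMatching(`getbutterfly`, queries))
}
```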

Match Query / URL Length with RegEx

Match short queries or URLs of 10 characters or fewer.

^[\w\W\s\S]{1,10}$

In my case, I got some useful one-word queries which I already started optimizing for.

  • [] matches any one character from the set inside the brackets
  • ^ starts with
  • $ ends with
  • \w matches an ASCII letter, digit, or underscore. It is the same as [A-Za-z0-9_];
  • \s matches whitespace;
  • \W matches anything that is not an ASCII letter, digit, or underscore;
  • \S matches anything that is not whitespace; together, [\w\W\s\S] matches any single character.
  • {1,10} repeats the preceding item 1 to 10 times.
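The sketch below checks the length pattern in Go (its regexp package implements RE2 syntax) and compares it with the shorter ^.{1,10}$, which behaves the same for single-line values such as queries and URLs. Go counts characters as Unicode code points here; the sample queries are hypothetical:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// The article's pattern: [\w\W\s\S] matches any single character.
	shortQuery = regexp.MustCompile(`^[\w\W\s\S]{1,10}$`)
	// A simpler equivalent for single-line values (. matches anything except \n).
	shortQuerySimple = regexp.MustCompile(`^.{1,10}$`)
)

func main() {
	queries := []string{"seo", "javascript", "javascript tips"}
	for _, q := range queries {
		fmt.Printf("%-16s %v %v\n", q, shortQuery.MatchString(q), shortQuerySimple.MatchString(q))
	}
}
```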

Find long-tail queries with Regular Expressions

The RegEx below matches any query of X characters or more (70 in this case, but you can lower the number until you are happy with the results). In my case, 70 returned some weird results not worth optimizing for, but going down to 50 gave me some very useful ones.

^[\w\W\s\S]{70,}$

Another solution is to count the number of white spaces to identify the number of words.

(\w+\s){7,}\w+
  • ^ starts with
  • $ ends with
  • [\w\W\s\S] any character
  • {70,} 70 times or more
  • (\w+\s) a word (one or more word characters) followed by a whitespace character
  • {7,} 7 times or more
  • \w+ one more word at the end, so the query needs at least 8 words
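Both long-tail patterns can be checked with Go’s RE2-based regexp package. In this sketch (hypothetical queries), the character pattern needs 70 or more characters, while the word pattern needs seven word-plus-space pairs and a final word, so at least 8 words:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	longByChars = regexp.MustCompile(`^[\w\W\s\S]{70,}$`) // 70 characters or more
	longByWords = regexp.MustCompile(`(\w+\s){7,}\w+`)    // 8 words or more
)

func main() {
	queries := []string{
		"how to learn javascript fast",
		"how to add a class to an element in javascript",
	}
	for _, q := range queries {
		fmt.Printf("%-48s chars=%v words=%v\n", q, longByChars.MatchString(q), longByWords.MatchString(q))
	}
}
```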

Find Very Long URLs

Use this regular expression to filter page URLs that are 100 characters or longer.

^[\w\W\s\S]{100,}$

Show a Specific URL Path

Sometimes, you just want to match a specific path.

For a URL structure such as /<category>/<subcategory>/<feature>, use:
.*/blog/.*/javascript$

That could match

  • /blog/courses/javascript
  • /blog/how-tos/javascript
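One thing to watch: .* can match across slashes, so .*/blog/.*/javascript$ also matches deeper paths. To allow exactly one subcategory, you could use [^/]+ (any characters except a slash) instead. A Go sketch comparing the two (Go’s regexp implements RE2 syntax; the paths are hypothetical):

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	anyDepth   = regexp.MustCompile(`.*/blog/.*/javascript$`)   // article's pattern
	oneSegment = regexp.MustCompile(`/blog/[^/]+/javascript$`) // exactly one subcategory
)

func main() {
	paths := []string{
		"/blog/courses/javascript",
		"/blog/how-tos/javascript",
		"/blog/courses/2021/javascript",
	}
	for _, p := range paths {
		fmt.Printf("%-32s anyDepth=%v oneSegment=%v\n", p, anyDepth.MatchString(p), oneSegment.MatchString(p))
	}
}
```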

Ends With a Trailing Slash

Show pages that end with a trailing slash (or, with Doesn’t match regex, pages that don’t). As I am using WordPress, this is not necessary for me, since WordPress automatically appends a trailing slash to my URLs.

.*\/$

Partial match by default (using anchors)

Partial matching means the regular expression can match anywhere in the query or URL, as you may have noticed in the examples above.

The good news is that you can override this with two RE2 anchors:

  • ^: Matches at the beginning
  • $: Matches at the end

Let’s get more inventive and look for search queries that start with “JavaScript”.

Without “^”, “JavaScript” can show up anywhere in the query.

With “^”, every query starts with “JavaScript”.

On the flip side, “$” returns only the queries that end with “JavaScript”.
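The three cases above can be reproduced with Go’s regexp package, which implements RE2 syntax and matches partially (unanchored) by default. A minimal sketch with hypothetical queries:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	anywhere = regexp.MustCompile(`javascript`)  // partial match
	atStart  = regexp.MustCompile(`^javascript`) // must start the query
	atEnd    = regexp.MustCompile(`javascript$`) // must end the query
)

func main() {
	for _, q := range []string{"javascript tutorial", "learn javascript", "is javascript hard"} {
		fmt.Printf("%-20s anywhere=%v start=%v end=%v\n",
			q, anywhere.MatchString(q), atStart.MatchString(q), atEnd.MatchString(q))
	}
}
```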

Show HTTP/HTTPS/Subdomain Variations

Although it is recommended to verify your site in Google Search Console both as a Domain property and as individual URL-prefix properties, you might want a quick way to check your Domain property for indexed subdomains or HTTP/HTTPS variations.

https?\:\/\/.*example\.com\/?$

This is a quick way to identify subdomains you might not know are indexed, such as development subdomains or old, misconfigured DNS records.

  • https? matches http or https
  • \/?$ an optional trailing slash at the very end, so only the homepage of each host matches
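A quick sketch to check which URLs this pattern catches, using Go’s RE2-based regexp package and hypothetical URLs. Note that .*example\.com also matches a different domain such as notexample.com, so review the results before acting on them:

```go
package main

import (
	"fmt"
	"regexp"
)

// The article's pattern; escaping ":" and "/" is allowed but not required in RE2.
var hostVariations = regexp.MustCompile(`https?\:\/\/.*example\.com\/?$`)

func main() {
	urls := []string{
		"http://example.com/",
		"https://dev.example.com",
		"https://www.example.com/blog/", // inner page: not matched
		"https://example.org/",
		"https://notexample.com/", // different domain, still matched
	}
	for _, u := range urls {
		fmt.Printf("%-32s %v\n", u, hostVariations.MatchString(u))
	}
}
```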

Compare Regular Expressions

You might want to compare pages or queries based on RegExes.

You can use the compare filter with regular expressions too.

As an example, you could compare two similar keywords, such as “JavaScript” and “TypeScript”. You will get a chart showing the clicks, impressions, CTR and average position for each keyword.

Understand User Intent

Google has long tried to understand the intent behind each search. Grouping your queries by intent in the same way helps you produce or improve content for those searches.

Search intent is usually divided into:

  • Informational
  • Navigational
  • Commercial
  • Transactional

You can paste the following expressions into GSC to find queries based on user intent.

Informational

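Example pattern (see above for how to use it in Google Search Console):

^(who|what|where|when|why|how)[" "]

The [" "] at the end is a character class that matches a space (or a literal double quote), so the question word has to be followed by more of the query. An extended version:

^(who|what|where|when|why|how|was|did|do|is|are|aren't|won't|does|if|can|could|should|would|were|weren't|shouldn't|couldn't|cannot|can't|didn't|did not|doesn't|wouldn't)[" "]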

Navigational

Example: .*brand.*

Note: If your company is called “getButterfly”, replace “brand” with “getButterfly” to perform the search. 

Another potential use for this brand search is to check whether you rank for queries that include a competitor’s name. Pages targeting “versus” and “alternative to” searches are popular, so this is an excellent expression to check those rankings.

Commercial

Example: .*(best|top|vs|review).*

Transactional

Example: .*(buy|cheap|price|purchase|order).*

Or, if you work in real estate or the property market, you can use:

.*(sell|seller|vendor|value|valuation|appraise|appraisal).*
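To see how these patterns sort queries, here is a hypothetical Go sketch that runs them in order (informational, then commercial, then transactional) and returns the first match. Go’s regexp package implements RE2 syntax; the classify function, the check order and the sample queries are illustrative assumptions, since Search Console applies one filter at a time:

```go
package main

import (
	"fmt"
	"regexp"
)

// Patterns adapted from the examples above; checked in this order.
var intents = []struct {
	name string
	re   *regexp.Regexp
}{
	{"informational", regexp.MustCompile(`^(who|what|where|when|why|how)[" "]`)},
	{"commercial", regexp.MustCompile(`.*(best|top|vs|review).*`)},
	{"transactional", regexp.MustCompile(`.*(buy|cheap|price|purchase|order).*`)},
}

// classify returns the first intent whose pattern matches query, or "other".
func classify(query string) string {
	for _, intent := range intents {
		if intent.re.MatchString(query) {
			return intent.name
		}
	}
	return "other"
}

func main() {
	// Hypothetical queries.
	for _, q := range []string{"how to learn javascript", "best javascript courses", "buy javascript book", "css border radius"} {
		fmt.Printf("%-26s %s\n", q, classify(q))
	}
}
```

Partial matching causes false positives: “css border radius” is classed as transactional because “border” contains “order”. Adding word boundaries (\b) around the terms is one way to tighten this.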

Case Insensitive Queries

Want to make queries case-insensitive? Add (?i) at the start of the expression.

(?i)^(who|what|where|when|why|how)[" "]
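A short Go sketch (Go’s regexp implements RE2 syntax, including the (?i) flag) showing the difference on hypothetical queries:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	caseSensitive   = regexp.MustCompile(`^(who|what|where|when|why|how)[" "]`)
	caseInsensitive = regexp.MustCompile(`(?i)^(who|what|where|when|why|how)[" "]`)
)

func main() {
	for _, q := range []string{"How to learn JavaScript", "how to learn javascript"} {
		fmt.Printf("%-24s sensitive=%v insensitive=%v\n",
			q, caseSensitive.MatchString(q), caseInsensitive.MatchString(q))
	}
}
```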

Match Branded Terms

People’s searches often contain spelling mistakes. Regular expressions let you capture these when evaluating brand searches.

Let’s take some possible misspellings of LinkedIn as an example:

  • lnkedin, linkeidn, linkden, linedin, linkein, likedin, linkin, linkedin, linkd, amazon linkedn

You could start with a long RegEx:

.*lnked*in.*|linke*idn.*|linkd*en.*|lined*in.*|linke*in.*|liked*in.*|link*in.*|linked*in.*|.*linkedn.*|.*linkd.*

Or use a more compact pattern:

.*l(i|n){1,2}(k|e).*n.*

A looser pattern also works:

.*l.*(i|n).*(n|k).*(e|d|i).*n.*

Testing this on my own, non-LinkedIn website returned some interesting user queries. I might write some articles targeting those keywords.
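It’s worth checking a pattern against the misspellings it is meant to catch. This Go sketch (RE2 syntax via Go’s regexp package) runs the compact pattern .*l(i|n){1,2}(k|e).*n.* over the LinkedIn misspellings listed above. It misses “linkd”, because that spelling has no “n” after the “k”:

```go
package main

import (
	"fmt"
	"regexp"
)

// The compact misspelling pattern from above.
var brandTypos = regexp.MustCompile(`.*l(i|n){1,2}(k|e).*n.*`)

// unmatched returns the terms the pattern fails to match.
func unmatched(terms []string) []string {
	var missed []string
	for _, t := range terms {
		if !brandTypos.MatchString(t) {
			missed = append(missed, t)
		}
	}
	return missed
}

func main() {
	misspellings := []string{"lnkedin", "linkeidn", "linkden", "linedin", "linkein",
		"likedin", "linkin", "linkedin", "linkd", "amazon linkedn"}
	fmt.Println("not matched:", unmatched(misspellings))
}
```

If that spelling matters to you, you could add it as an extra alternative, such as |linkd.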

Compare Brand VS Non-Brand Traffic


Check for Potential Content Injections


Content injection is a hack that adds pages or content to your website, usually targeting specific spam keywords. Here is an idea of how you can check for common ones on your site.

Use this RegEx as a Custom (RegEx) Page filter to check for matches.

.*viagra.*|.*cialis.*|.*levitra.*|.*drugs.*|.*porn.*|.*www.*www.*

It shows zero matches for me :)

Check WordPress Admin URLs

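Pretty straightforward: check for WordPress admin pages that appear to be indexed. Note that this also catches other wp- paths, such as /wp-content/ uploads.

.*wp-.*

Outside a character class, the dash does not need escaping, so this is equivalent: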

.*wp\-.*

How to use Custom (RegEx) filter in Google Search Console (use cases)

The full list of supported syntax is in the RE2 syntax reference. To debug your RegEx, use a regular expression tool such as regex101 (its Golang flavor follows RE2 syntax) to make sure only the URLs or queries you want data for are included.

Match a list of queries or URLs

Do you want to filter your search queries for any that contain keyword1 OR keyword2 OR keyword3? Use the pipe character “|” as an OR operator in your RegEx filter:

Example: keyword1|keyword2|keyword3

If you want to do the same thing but match only those search terms exactly, add ^ and $ to indicate the beginning and end of each string, respectively.

Example: ^keyword1$|^keyword2$|^keyword3$
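A Go sketch (RE2 syntax via the regexp package; placeholder keywords) comparing the partial and exact versions, plus the grouped form ^(keyword1|keyword2|keyword3)$, which is equivalent to the exact version and shorter:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	containsAny  = regexp.MustCompile(`keyword1|keyword2|keyword3`)
	exactAny     = regexp.MustCompile(`^keyword1$|^keyword2$|^keyword3$`)
	exactGrouped = regexp.MustCompile(`^(keyword1|keyword2|keyword3)$`)
)

func main() {
	for _, q := range []string{"keyword1", "keyword2 tips", "keyword4"} {
		fmt.Printf("%-14s contains=%v exact=%v grouped=%v\n",
			q, containsAny.MatchString(q), exactAny.MatchString(q), exactGrouped.MatchString(q))
	}
}
```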

Find longtail keyword questions with RegEx

Example: ([^" "]*\s){7,}?

This expression shows you all the queries with 8 or more words.

To find shorter queries, change the number “7”. For example, to get queries of 5 or more words, change “7” into a “4”. Basically, put the number of words you want minus 1 into the expression.
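The “minus 1” rule works because each repetition of ([^" "]*\s) has to end with a whitespace character, and a query with n words separated by single spaces contains n-1 spaces. A Go sketch that builds the pattern for any word count (the minWords helper is hypothetical; Go’s regexp implements RE2 syntax):

```go
package main

import (
	"fmt"
	"regexp"
)

// minWords builds the article's pattern for queries with at least n words:
// the group must repeat n-1 times, once per space between words.
// Assumes words are separated by single spaces.
func minWords(n int) *regexp.Regexp {
	return regexp.MustCompile(fmt.Sprintf(`([^" "]*\s){%d,}?`, n-1))
}

func main() {
	eight := minWords(8) // same as ([^" "]*\s){7,}?
	fmt.Println(eight.String())
	for _, q := range []string{
		"one two three four five six seven eight",
		"one two three four five six seven",
	} {
		fmt.Printf("%-42s %v\n", q, eight.MatchString(q))
	}
}
```

The trailing ? only makes the repetition lazy; it does not change whether a query matches.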

Find pages ending in a specific slug

Example: word$

Replace “word” with the slug you are looking for in the URLs. The dollar sign ($) makes the expression match only at the end of the URL. If your URLs end with a trailing slash, use word/$ instead.

Find “after purchase” queries

RegEx can help you find which post-purchase queries your site currently ranks for. This is useful for spotting potential problems with your products and creating content around these searches.

Example:

\b(clean|broken|wash off|shattered|polish|problem|treat|doesn't work|replace|doesn't start|scratch|repair|manual|fix|protect|renew|coverage|warranty)[" "]
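Two details of this pattern are worth testing: \b stops a term from matching inside a longer word (fix in prefix), and the trailing [" "] requires a space after the term, so a query that ends with the term (screen repair) is not matched. A Go sketch using hypothetical queries; the relaxed variant, which accepts either whitespace or the end of the query, is an assumed tweak and not part of the original pattern:

```go
package main

import (
	"fmt"
	"regexp"
)

// Terms from the article's post-purchase pattern.
const afterPurchaseTerms = `(clean|broken|wash off|shattered|polish|problem|treat|doesn't work|replace|doesn't start|scratch|repair|manual|fix|protect|renew|coverage|warranty)`

var (
	// Article's version: the term must be followed by a space (or a double quote).
	strict = regexp.MustCompile(`\b` + afterPurchaseTerms + `[" "]`)
	// Hypothetical variant that also accepts the term at the end of the query.
	relaxed = regexp.MustCompile(`\b` + afterPurchaseTerms + `(\s|$)`)
)

func main() {
	for _, q := range []string{"how to fix a cracked screen", "iphone screen repair", "prefix settings"} {
		fmt.Printf("%-30s strict=%v relaxed=%v\n", q, strict.MatchString(q), relaxed.MatchString(q))
	}
}
```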

Filter Product Model Names

Another use case for RegEx is to filter organic searches for product model names, which shows the search demand for specific models. For example, say you own an online electronics store and want to see how well your site captures searches for Samsung TV models. Here are some of the real model numbers of Samsung TVs:

Q80A
Q70A
Q60A
Q60T
Q70T
Q80T

This is a challenge because filtering for “Q” alone would pull in anything with the letter Q in it. Also, the digits in the middle of the model numbers are all different. Before RegEx, there was no practical way to filter for these model numbers in Google Search Console. With RegEx, you can create a rule like this:

(?i)Q[0-9]0(A|T)

This short RegEx captures all of these model numbers. Let’s break down how it works:

  • (?i): This makes the match case-insensitive, meaning the RegEx would match “q90T”, “Q90t”, or “q90t”
  • Q: This matches the “Q” at the start of the model number.
  • [0-9]0: The [0-9] matches any single digit from 0 to 9. The “0” outside the brackets is there because the numeric part of every model number ends in zero. So this matches anything from “00” to “90” in the model number.
  • (A|T): This matches model numbers ending in A or T.
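A Go sketch (RE2 syntax via Go’s regexp package) that runs the model-number pattern against hypothetical queries:

```go
package main

import (
	"fmt"
	"regexp"
)

// Q, one digit, a literal 0, then A or T; (?i) ignores case.
var samsungModels = regexp.MustCompile(`(?i)Q[0-9]0(A|T)`)

func main() {
	for _, q := range []string{"samsung q80a review", "Q60T", "q90t price", "qled tv", "q85a"} {
		fmt.Printf("%-20s %v\n", q, samsungModels.MatchString(q))
	}
}
```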

Filter URLs by File Extension

You can also filter URLs with RegEx. This could be helpful in many ways. For example, you may want to see organic impressions, clicks, and queries for any non-HTML assets you have ranking in Google Search such as: .docx, .pdf, .rtf, .xls. This is especially useful since non-HTML assets cannot have the Universal Analytics code loaded and would typically not show up in your Google Analytics report.

Filtering for this is very easy to do with the following RegEx:

\.docx|\.pdf|\.rtf|\.xls

Note that the backslash “\” escapes special characters. The dot “.” is a metacharacter that matches any single character, so the backslash makes it match a literal dot instead.
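This Go sketch (Go’s regexp implements RE2 syntax; the URLs are hypothetical) shows why the escaping matters, and that partial matching also picks up .xlsx. Anchoring with \.(docx|pdf|rtf|xls)$ is an optional, stricter variant:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	fileTypes      = regexp.MustCompile(`\.docx|\.pdf|\.rtf|\.xls`) // article's pattern
	fileTypesAtEnd = regexp.MustCompile(`\.(docx|pdf|rtf|xls)$`)    // stricter: extension must end the URL
	unescaped      = regexp.MustCompile(`.pdf`)                     // "." matches any character
)

func main() {
	urls := []string{
		"https://example.com/files/report.pdf",
		"https://example.com/data.xlsx",
		"https://example.com/apdf-guide/",
	}
	for _, u := range urls {
		fmt.Printf("%-38s escaped=%v atEnd=%v unescaped=%v\n",
			u, fileTypes.MatchString(u), fileTypesAtEnd.MatchString(u), unescaped.MatchString(u))
	}
}
```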

Find Longtail Keywords and Questions

We’re going to use this regular expression:

([^" "]*\s){7,}?

Enter this under Query > Custom (RegEx) and it will show you all the keywords with 8 or more words in them.

If you want keywords with 10 words or more change the 7 to a 9.

If you want to find keywords with 4 words or more, change the 7 to 3.

Additional RegExes

Some additional regular expressions for Google Search Console:

# Matches a URL slug (lowercase letters and digits separated by hyphens)
^[a-z0-9]+(?:-[a-z0-9]+)*$

# All urls within /page
(http|https):\/\/www.example.com\/page\/.*

# All urls between a certain slug and ending
(http|https):\/\/www.example.com\/slug\/[^\/]+\/page

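# Matches all queries containing a word that includes "foo"
\b(\w*foo\w*)\b

# Matches all queries containing a phrase that includes "foo bar"
\b(\w*foo\sbar\w*)\b

# Matches the exact query "hello world"
^hello\sworld$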

# Matches all queries containing "blue shoe" or "blue shoes"
(\W|^)blue\s{0,3}shoe(s){0,1}(\W|$)

# Matches all queries that contain "getButterfly" or "getButterfly SEO"
^.*(getbutterfly|getbutterfly seo).*$

# Match Word or Phrase in a List
(?i)(\W|^)(foo|bar|foo\sbar)(\W|$)
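Two of the patterns above are easy to misread, so here is a Go sketch (RE2 syntax via the regexp package; sample queries are hypothetical) checking what they match. The (\W|^) and (\W|$) groups act like word boundaries, so “blue shoelaces” and “foobar” are not matched:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	blueShoe = regexp.MustCompile(`(\W|^)blue\s{0,3}shoe(s){0,1}(\W|$)`)
	wordList = regexp.MustCompile(`(?i)(\W|^)(foo|bar|foo\sbar)(\W|$)`)
)

func main() {
	for _, q := range []string{"cheap blue shoes", "blueshoe", "blue shoelaces"} {
		fmt.Printf("%-18s blueShoe=%v\n", q, blueShoe.MatchString(q))
	}
	for _, q := range []string{"Foo Bar prices", "foobar"} {
		fmt.Printf("%-18s wordList=%v\n", q, wordList.MatchString(q))
	}
}
```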
