Google Search Console is a service offered by Google that helps you monitor, maintain, and troubleshoot a site’s presence in Google Search results. Users can view Google Search traffic data for the site and answer questions like:
- How often does your site appear in Google Search?
- Which search queries show your site on the search result page?
- How often do searchers click through to your site for those queries?
- and more
In its latest update, Google Search Console added support for Regular Expressions. The feature is designed to make report analysis more efficient by letting you use Regular Expressions in the filters for Queries and Pages.
Regular Expressions
A Regular Expression (also called regex, regexp, or rational expression) is a sequence of characters that defines a search pattern.
These patterns are usually used by string-searching algorithms to find strings or to validate data.
Some of the common use cases are:
- Checking whether a string contains a given pattern
- Extraction of substrings from structured strings
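The two use cases above can be illustrated with a minimal Python sketch. The time format, the URL shape, and the function names `is_valid_time` and `extract_section` are assumptions chosen for illustration; they are not part of Search Console.

```python
import re

# Validation: does the whole string look like a 24-hour time such as "09:30"?
TIME_PATTERN = re.compile(r"([01][0-9]|2[0-3]):[0-5][0-9]")

# Extraction: capture the first path segment of a URL (hypothetical URL shape).
SECTION_PATTERN = re.compile(r"^https?://[^/]+/([^/]+)/")


def is_valid_time(text: str) -> bool:
    """Return True only if the entire string matches the time pattern."""
    return TIME_PATTERN.fullmatch(text) is not None


def extract_section(url: str):
    """Return the first path segment of the URL, or None if there is none."""
    match = SECTION_PATTERN.search(url)
    return match.group(1) if match else None


print(is_valid_time("09:30"))
print(is_valid_time("25:00"))
print(extract_section("https://example.com/blog/post-1"))
```

The first pattern answers a yes/no question about a string, while the second uses a capturing group to pull one piece out of a structured string.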
Regex for GSC
Earlier, Google Search Console did not support complex cases, such as queries containing one of several optional strings. Users could filter Queries and Page URLs by only three patterns: contains the string, does not contain the string, and exactly matches the string.
To use the new Regular Expression filter, start by creating a Page filter or a Query filter and select Custom (regex) from the drop-down menu.
Example use case
Let’s take one of our clients, Dominos, as an example to understand what a regex can do for us and how to build one.
Users search for Dominos-related queries in various ways, such as “dominos near me”, “dominos online”, “dominos pizza”, etc. Sometimes, though, users type a different spelling of the brand than the one Dominos intended. A few examples can be seen in the following image.
Even though such mistyped queries are extremely common, we cannot humanly anticipate every possible permutation of such spellings. At the same time, as end users of Search Console, we do not want to drop Queries just because we do not recognize them as “Branded Queries” (permutations of the spelling of the brand). These Queries may contain critical data for an analytical process.
Yet for such permutations (“dominoz” / “dominoj”) it is very difficult to write a universal Regular Expression, especially for someone who does not write them regularly. To make the end user’s life slightly easier, many tools are available online to test and generate Regular Expressions.
Most of these online tools start by asking you for a list of possible spelling permutations, so the selection of those permutations becomes critical.
You can start by exporting the table of Query data, either as a CSV from the Search Console front end or through the Google Search Console API.
Exporting via the front end limits the data to the first 1,000 Queries, but this can be treated as a starting point.
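For the API route, a request body for the Search Analytics query method might look like the sketch below. The helper name `build_query_request` and the dates are hypothetical, and the field names and the 25,000 row limit reflect our understanding of the API, so check them against Google's reference before relying on them. The sketch only builds the body; authenticating and sending it is left out.

```python
def build_query_request(start_date, end_date, contains, row_limit=25000):
    """Build a Search Analytics request body for queries containing a string.

    Assumption: field names follow the Search Console API's
    searchanalytics.query method; verify against the official reference.
    """
    return {
        "startDate": start_date,
        "endDate": end_date,
        "dimensions": ["query"],
        "dimensionFilterGroups": [
            {
                "filters": [
                    {
                        "dimension": "query",
                        "operator": "contains",
                        "expression": contains,
                    }
                ]
            }
        ],
        # Larger than the 1,000 rows available from the front-end export.
        "rowLimit": row_limit,
    }


# Hypothetical date range and the first few letters of the brand.
body = build_query_request("2021-01-01", "2021-03-31", "dom")
print(body)
```

With the google-api-python-client library, a body like this would typically be passed to `service.searchanalytics().query(siteUrl=..., body=body).execute()`, where `service` is an authorized Search Console client.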
Following these steps should help you with the process:
1. Open Search Console and Select your property
2. Apply a filter – Queries containing “dom” (First few letters of your brand)
3. Click on Export and Select “Download CSV”
4. Open that CSV in an Excel spreadsheet
5. You will need to extract the words containing a specific set of characters (in our case “dom”) from the list of queries by using the following formula in Excel.
=TRIM(LEFT(SUBSTITUTE(MID(text, FIND("dom", text), LEN(text)), " ", REPT(" ", LEN(text))), LEN(text)))
text: The text string or cell value that you want to extract words from (A2 in this case).
char: The character or text that you want to extract; it is written as "dom" in the formula above, and in your case it would be the first few characters of your brand name.
6. Extract unique rows from the resulting column to finally get the list of all permutations of the brand words.
7. Now, enter the list of words in the regex generator.
8. Finally, the generated regex can be used to make a Custom(regex) filter in Google Search Console.
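If you prefer a script to a spreadsheet, steps 5 to 7 can be approximated in Python. This is a minimal sketch, not what an online regex generator does: it joins the variants into a plain alternation rather than producing a compact pattern. The function names and sample queries are made up for illustration.

```python
import re


def extract_brand_variants(queries, prefix="dom"):
    """Mimic the Excel formula: take the text from the first occurrence of
    `prefix` up to the next whitespace, and keep each result only once."""
    pattern = re.compile(re.escape(prefix) + r"\S*")
    variants = []
    for query in queries:
        match = pattern.search(query)
        if match and match.group(0) not in variants:
            variants.append(match.group(0))
    return variants


def build_alternation(variants):
    """Join the variants into one regex of the form (a|b|c)."""
    return "(" + "|".join(re.escape(v) for v in variants) + ")"


# Sample queries, invented for illustration.
queries = [
    "dominos near me",
    "dominoz pizza",
    "order dominoj online",
    "dominos online",
]
variants = extract_brand_variants(queries)
print(variants)
print(build_alternation(variants))
```

Search Console uses RE2 syntax, which differs from Python's `re` module in places, but a simple alternation like this one should behave the same in both.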
Similarly, a user may want to filter for Queries that are questions. The regex to filter question queries is: ^(who|what|where|why|how)[" "]
Google Search Console matches partially by default. This means a Regular Expression can match anywhere in the target string.
If you want to find Queries that begin or end with a trigger word such as “who/what/where/why/how”, place ^ at the start of the regex to match at the beginning of the string, or $ at the end of the regex to match at the end of the string.
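The effect of partial matching and of the anchors can be checked locally before pasting a pattern into Search Console. The sketch below uses Python's `re.search`, which also matches anywhere in the string, as a stand-in for the Search Console filter. The sample queries and the "near me" pattern are invented for illustration.

```python
import re

# Anchored with ^: the trigger word must start the query.
ANCHORED = re.compile(r'^(who|what|where|why|how)[" "]')
# Same pattern without ^: the trigger word may appear anywhere.
UNANCHORED = re.compile(r'(who|what|where|why|how)[" "]')
# Anchored with $: the query must end with "near me".
ENDS_NEAR_ME = re.compile(r"near me$")


def is_match(pattern, query):
    """Partial match, like the Search Console regex filter."""
    return pattern.search(query) is not None


for query in ["how to order pizza", "show me pizza", "dominos near me"]:
    print(
        query,
        is_match(ANCHORED, query),
        is_match(UNANCHORED, query),
        is_match(ENDS_NEAR_ME, query),
    )
```

Note how "show me pizza" matches the unanchored pattern because "show" contains "how" followed by a space, which is exactly the kind of false positive the ^ anchor prevents.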
A few examples of regular expressions:
- OR operator:
  - x(y|z) – matches a string that has x followed by y or z (and captures y or z)
  - x[yz] – matches a string that has x followed by y or z (without capturing y or z)
- Bracket expressions []:
  - [a-fA-F0-9] – case-insensitively matches a string that represents a single hexadecimal digit
  - [0-9]{1,3}% – matches a string that has one to three digits from 0 to 9 before a % sign
  - [^a-zA-Z] – matches any single character that is not a letter from a to z or from A to Z. Inside the brackets, ^ is used as a negation of the expression
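A quick way to see what these patterns do is to run them against a few sample strings. The sketch below uses Python's `re` module as a stand-in for the Search Console filter. The sample texts and helper names are invented for illustration.

```python
import re

# Each entry: (pattern, sample text). Sample texts are made up.
EXAMPLES = [
    (r"x(y|z)", "xz"),
    (r"x[yz]", "xz"),
    (r"[a-fA-F0-9]", "F"),
    (r"[0-9]{1,3}%", "save 50% today"),
    (r"[^a-zA-Z]", "abc1"),
    (r"[^a-zA-Z]", "abc"),
]


def first_match(pattern, text):
    """Return the first matched substring, or None when nothing matches."""
    match = re.search(pattern, text)
    return match.group(0) if match else None


def captured_groups(pattern, text):
    """Return the tuple of captured groups, empty if there are none."""
    match = re.search(pattern, text)
    return match.groups() if match else ()


for pattern, text in EXAMPLES:
    print(repr(pattern), repr(text), first_match(pattern, text), captured_groups(pattern, text))
```

For filtering purposes only the match itself matters. Capturing becomes relevant when the same pattern is reused in a tool that extracts or replaces text.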
A Regular Expression with invalid syntax will not return any matches.
For assistance in creating the Regular Expression filters, Google suggests live testing tools that can be found here.
Conclusion
To conclude, this is a great update for SEO executives, hobbyists, and developers who want to view or analyze Google Search Console data in a specific way.
Using this mechanism, we can now separate, say, Brand Queries from Non-Brand Queries, or compare the metrics of Queries phrased as questions with those that are not. At the URL level, regex can help you identify clusters of URLs and analyze them separately.
Creating a regex is fairly easy, but because it can become confusing at various turns, we suggest taking things slowly and reading the documentation made available by Google.