In this post, I’m going to introduce you to eight powerful RegEx concepts that you can use in Google Analytics to create better advanced segments and filters. These were chosen based on things that we at SwellPath use most to get actionable data that drives decisions. At the end of this, you should be ready to harness the power of RegEx to get that high-value site data from your GA profiles.
“Everybody stand back! I know regular expressions!”
Last week, I had the pleasure of joining the Portland Google Analytics User Group as a “Subject Matter Expert,” teaching people about using advanced segments and filters in GA. It was a great event, with a great crowd ranging from absolute beginners to experienced users. I brought up using regular expressions at the event and wanted to provide a post for those who are still getting their feet wet.
Using RegEx in your advanced filters and segments gives you incredible power over your data. Using the basic method, you could set up an advanced filter to look at visitors landing on blog posts from January, February, and March.
Using RegEx, you can make that same filter using this.
Just from that super watered-down example, you can see that using RegEx has the potential to greatly simplify and power up your GA segments and filters. So without further ado, here are eight RegEx concepts that will revolutionize how you use advanced filters and segments.
#1. Pipe Dreams
One of the simplest concepts to understand in RegEx is the pipe. It looks like this: |
It’s essentially the word “or” and lets you tell Google Analytics that you want results matching this or that.
Better example: Mario|Luigi
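If you want to try the pipe outside of GA, here is a minimal sketch using Python's built-in re module as a stand-in for GA's regex engine (that substitution is my assumption, and the two engines may differ in small ways). The keyword list is made up for illustration.

```python
import re

# "Mario|Luigi" means: match "Mario" OR "Luigi", anywhere in the string.
pattern = re.compile(r"Mario|Luigi")

# Hypothetical keyword data, invented for this example.
keywords = ["Super Mario Bros", "Luigi's Mansion", "Princess Peach"]

# re.search looks for the pattern anywhere in each keyword.
matches = [k for k in keywords if pattern.search(k)]
print(matches)  # ['Super Mario Bros', "Luigi's Mansion"]
```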
#2. Collect Carrot, ???, Profit
Don’t get the joke in the title? Don’t worry. It’s not just you. The best kind of jokes require an explanation (#ProbablyNotAccurate).
In RegEx, we can specify exactly where we want something to appear or make certain things completely optional. By adding a caret to the beginning of a regular expression, you’re specifying that you only want results that start with that. So, using the following in the keywords report would only return searches that start with “swell”: ^swell
You can also do the other half of that: specify that you want your string to end with something. The following would return any keywords ending with “rad”: rad$
- SwellPath is rad
- SEO for Lauren Conrad
You can also use them both to specify you want an exact match: ^swellpath$
The last concept here is the question mark. This makes the preceding character optional. It’s very useful for spelling variations that have extra letters. Using the following, you could get an exact match on either “colors” or “colours”: ^colou?rs$
Sure beats this: ^colors$|^colours$
So in this section, we covered the caret “^”, the question mark “?”, and the dollar sign “$”.
GET IT?!
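Here is a rough sketch of the caret, dollar sign, and question mark at work, again using Python's re module as an assumed stand-in for GA. The keyword list and the matching helper are hypothetical, and I'm ignoring case on the assumption that GA does the same by default.

```python
import re

def matching(pattern, items):
    """Return the items the pattern matches, ignoring case."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [item for item in items if rx.search(item)]

# Hypothetical keyword data, invented for this example.
keywords = [
    "SwellPath is rad",
    "SEO for Lauren Conrad",
    "radio ads",
    "swellpath",
    "colors",
    "colours",
    "watercolors",
]

starts = matching(r"^swell", keywords)      # starts with "swell"
ends = matching(r"rad$", keywords)          # ends with "rad"
exact = matching(r"^swellpath$", keywords)  # exactly "swellpath"
colors = matching(r"^colou?rs$", keywords)  # the "u" is optional

print(starts)  # ['SwellPath is rad', 'swellpath']
print(ends)    # ['SwellPath is rad', 'SEO for Lauren Conrad']
print(exact)   # ['swellpath']
print(colors)  # ['colors', 'colours']
```

Notice that “radio ads” contains “rad” but doesn't end with it, so rad$ skips it.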
#3. Father, Mother, Sister, Brother
You can also use RegEx to define a “family” of items. You can combine parentheses with the pipe to use the “or” functionality within a larger expression. For example, the following works fine if it’s all you’re looking for: father|mother|sister|brother
But what if you want something more specific while still catching all the family member variations? You can use parentheses.
Visiting my (father|mother|sister|brother)
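To see why the parentheses matter, here is a small sketch comparing the grouped expression with the same expression minus the parentheses. Python's re module is my stand-in for GA here, and the phrases are invented.

```python
import re

# With parentheses, the "or" only applies to the family members.
grouped = re.compile(r"Visiting my (father|mother|sister|brother)")

# Without them, the pipe splits the WHOLE expression into four options:
# "Visiting my father" OR "mother" OR "sister" OR "brother".
ungrouped = re.compile(r"Visiting my father|mother|sister|brother")

# Hypothetical phrases, invented for this example.
phrases = ["Visiting my sister", "Visiting my dentist", "My brother's blog"]

grouped_hits = [p for p in phrases if grouped.search(p)]
ungrouped_hits = [p for p in phrases if ungrouped.search(p)]

print(grouped_hits)    # ['Visiting my sister']
print(ungrouped_hits)  # ['Visiting my sister', "My brother's blog"]
```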
#4. To Infinity and beyond!
RegEx also lets you specify repetition. This is useful if you want to account for bad spelling: scho+l. The + sign specifies that the preceding character can occur one or more times. You can also specify that a character can occur zero times (optional), one time, or more, using the asterisk. To find an excitable text messager, just use OMF*G. You’ll get “OMG”, “OMFG”, or “OMFFFFFFFFFFG”.
Photo credit: iSite Design
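A quick sketch of the plus sign and asterisk from this section, using Python's re module as an assumed stand-in for GA. I've added ^ and $ so each pattern has to match the whole word, and the word lists are made up.

```python
import re

plus = re.compile(r"^scho+l$")  # one or more "o"
star = re.compile(r"^OMF*G$")   # zero or more "F"

# Hypothetical inputs, invented for this example.
spellings = ["schol", "school", "schoool", "schl"]
texts = ["OMG", "OMFG", "OMFFFFFFFFFFG", "OMFGG"]

plus_hits = [s for s in spellings if plus.search(s)]
star_hits = [t for t in texts if star.search(t)]

print(plus_hits)  # ['schol', 'school', 'schoool']
print(star_hits)  # ['OMG', 'OMFG', 'OMFFFFFFFFFFG']
```

“schl” misses because + needs at least one “o”, while “OMG” still matches because * is happy with zero “F”s.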
#5. Surprise Me
Having a “wildcard” character is always useful. The period, dot, or full stop is your wildcard in RegEx and stands for any single character. Using it in an expression allows anything to occur in that one position. So, d.g would round up “dog”, “dig”, “dug”, or “d4g”. You can also pair the period with the asterisk. Using .* means you’ll take any run of characters, including none at all. Not that useful on its own within Google Analytics, but it’s handy in other applications.
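Here is a small sketch of the difference between a lone dot and the dot paired with an asterisk. As before, Python's re module is an assumed stand-in for GA, and the word list is invented.

```python
import re

dot = re.compile(r"^d.g$")        # exactly one character between d and g
dot_star = re.compile(r"^d.*g$")  # any number of characters, even zero

# Hypothetical inputs, invented for this example.
words = ["dog", "dig", "dug", "d4g", "dg", "doug"]

one_char = [w for w in words if dot.search(w)]
any_length = [w for w in words if dot_star.search(w)]

print(one_char)    # ['dog', 'dig', 'dug', 'd4g']
print(any_length)  # ['dog', 'dig', 'dug', 'd4g', 'dg', 'doug']
```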
#6. Keeping it Classy
One of my favorite concepts in RegEx is character classes. Using square brackets, we can define a whole class of things that apply to the space of one character. What does that mean? Say you want to find pages in Google Analytics that start with a number. ^0|^1|^2|^3|^4|^5|^6|^7|^8|^9 is pretty lame. Using a character class, we can use ^[0-9]. The hyphen there means “through” and can be used for number or letter ranges. More commonly, you’ll specify a few letters using a character class: [kc]haos
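To make character classes concrete, here is a minimal sketch with Python's re module standing in for GA (my assumption). The titles and words are hypothetical.

```python
import re

starts_with_digit = re.compile(r"^[0-9]")  # any one digit, 0 through 9
chaos = re.compile(r"^[kc]haos$")          # "k" or "c", then "haos"

# Hypothetical inputs, invented for this example.
titles = ["10 RegEx tips", "7 ways to segment", "Top 5 filters"]
words = ["chaos", "khaos", "thaos"]

digit_hits = [t for t in titles if starts_with_digit.search(t)]
chaos_hits = [w for w in words if chaos.search(w)]

print(digit_hits)  # ['10 RegEx tips', '7 ways to segment']
print(chaos_hits)  # ['chaos', 'khaos']
```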
#7. History Repeats
We covered some repetition with the plus sign and the asterisk. Neither of those has an upper limit. There are a lot of situations when you need something more controlled. Curly braces let you specify a set number of repetitions. You can set lower and upper limits using a first and a second number between the braces. A{2,4} means you’re looking for “AA”, “AAA”, or “AAAA”. You can also omit the comma and the second number to specify an exact number of repetitions. (Keep the comma but drop the second number, as in A{2,}, and you get “two or more”.) Pretty co{2}l.
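Here is a sketch of the three curly-brace forms side by side: a range, an exact count, and an open-ended minimum. Python's re module is my stand-in for GA, and the inputs are made up.

```python
import re

a_range = re.compile(r"^A{2,4}$")    # two to four "A"s
exact = re.compile(r"^co{2}l$")      # exactly two "o"s
at_least = re.compile(r"^co{2,}l$")  # two or more "o"s

# Hypothetical inputs, invented for this example.
a_strings = ["A", "AA", "AAA", "AAAA", "AAAAA"]
cools = ["col", "cool", "coool"]

range_hits = [s for s in a_strings if a_range.search(s)]
exact_hits = [c for c in cools if exact.search(c)]
at_least_hits = [c for c in cools if at_least.search(c)]

print(range_hits)     # ['AA', 'AAA', 'AAAA']
print(exact_hits)     # ['cool']
print(at_least_hits)  # ['cool', 'coool']
```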
#8. Don’t Even Think About It
You’ve probably noticed by now that a good number of characters have a special meaning in RegEx. To use them literally, you need to tell them not to be special anymore. Characters that have special meaning are the square bracket [, the backslash \, the caret ^, the dollar sign $, the period ., the pipe |, the question mark ?, the asterisk *, the plus sign +, and the round bracket (.
You can precede any of these special characters with a backslash to cancel out their special meaning. If you’re looking for pages in Google Analytics that contain a query string parameter, use \? in your expression.
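To see what the backslash buys you, here is a sketch comparing escaped and unescaped versions of the same pattern. Python's re module is an assumed stand-in for GA, and the page paths are hypothetical.

```python
import re

# Hypothetical page paths, invented for this example.
pages = ["/blog/regex-post", "/search?q=regex", "/contact"]

unescaped = re.compile(r"/?")  # "an optional slash", which matches everything
escaped = re.compile(r"\?")    # a literal question mark

unescaped_hits = [p for p in pages if unescaped.search(p)]
escaped_hits = [p for p in pages if escaped.search(p)]

print(unescaped_hits)  # ['/blog/regex-post', '/search?q=regex', '/contact']
print(escaped_hits)    # ['/search?q=regex']

# The same idea with a dollar sign and a period.
price = "I need about $3.50"
plain = bool(re.search(r"$3.50", price))     # $ means "end of string" here
literal = bool(re.search(r"\$3\.50", price))  # $ and . taken literally

print(plain)    # False
print(literal)  # True
```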
Quick Reference and Putting it All Together
- This or that
This|that
- Start with, optional, end with
^start, options?, End$
- Families or groups
(mother|father|sister|brother|direwolf)
- One or more, zero or more
The+, OMF*G
- Wildcard
d.g
- Character Classes
We’re number [1-9]
- Controlled repetition
503 806 [0-9]{4}
- Escape Special Characters
I need about \$3\.50
Just to provide an example of how powerful this could be, I put together a RegEx to capture all the variations of my name plus the term “SEO”.
^Mi(ke|ch[ae]{2}l).?arne+s[eo]n.*s.?e.?o.?$
The following list represents a fraction of what this RegEx can grab:
- Mike Arnesen SEO
- Mike Arnesen S.E.O.
- Michael Arnesen SEO
- Mike Arnesen SEO.
- Mike-Arnesen SEO.
- Mike-Arnesen SEO
- Mike-Arnesen SE.O
- Mike-Arnesen S.E.O.
- Mike-Arnesen S.E.O
- Mike Arneson SEO.
- Mike Arneson SEO
- Mike Arneson SE.O
- Mike Arneson S.E.O.
- Mike Arneson S.E.O
- Mike Arnesen sucks at SEO
- Mike Arnesen SE.O
- Mike Arnesen S.E.O
- Mike Arnesen loves SEO
- Mike Arnesen knows SEO
- Mike Arnesen is terrible at SEO
- Mike Arnesen is speaking at SMX East about Google Authorship and SEO
- Mike Arnesen is great at SEO
- Mike Arnesen has never actually done SEO
- Micheal-Arnesen SEO.
- Micheal-Arnesen SEO
- Micheal-Arnesen SE.O
- Micheal-Arnesen S.E.O.
- Micheal-Arnesen S.E.O
- Micheal-Arneesen S.E.O
- Micheal-Arneeesen SEO
- Micheal-Arneeesen SE.O
- Micheal-Arneeesen S.E.O.
- Micheal-Arneeeeesen SEO.
- Michael-Arnesen SEO.
- Michael-Arnesen SEO
- Michael-Arnesen SE.O
- Michael-Arnesen S.E.O.
- Michael-Arnesen S.E.O
- Michael Arneson SEO.
- Michael Arneson SEO
- Michael Arneson SE.O
- Michael Arneson S.E.O.
- Michael Arneson S.E.O
- Michael Arnesen SEO.
- Michael Arnesen SE.O
- Michael Arnesen S.E.O.
- Michael Arnesen S.E.O
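If you'd like to check the expression yourself, here is a rough sketch that runs it against a handful of entries from the list above plus a few strings that should fail. I'm using Python's re module as a stand-in for GA and turning on case-insensitive matching, on the assumption that GA ignores case by default (the expression has a lowercase “arne” but the list is capitalized). The non-matching strings are my own inventions.

```python
import re

# The expression from this post, matched without regard to case.
name_seo = re.compile(
    r"^Mi(ke|ch[ae]{2}l).?arne+s[eo]n.*s.?e.?o.?$",
    re.IGNORECASE,
)

# A few entries taken from the list above.
should_match = [
    "Mike Arnesen SEO",
    "Mike-Arnesen S.E.O.",
    "Micheal-Arneeeeesen SEO.",
    "Michael Arneson SE.O",
    "Mike Arnesen is great at SEO",
]

# Hypothetical near misses, invented for this example.
should_not_match = [
    "Mike Arnesen PPC",       # no "SEO" at the end
    "Michel Arnesen SEO",     # only one vowel between "ch" and "l"
    "Mike Arnesen SEO tips",  # "SEO" is not at the end
]

matched = [s for s in should_match if name_seo.search(s)]
rejected = [s for s in should_not_match if not name_seo.search(s)]

print(len(matched), "of", len(should_match), "matched")
print(len(rejected), "of", len(should_not_match), "rejected")
```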
If reading this got you interested in learning more about RegEx, there is an excellent site that will teach you everything you’ll ever need to know about RegEx. Best of luck in digging deeper into your data. If you found this post useful, let me know in the comments. Happy RegExing!