Handy Regex Examples for Google Analytics

Regex, or Regular Expressions, can be a seriously powerful tool when creating custom segments, reports and filters. They allow you to go beyond the already powerful click and select functionality that comes as standard with Google Analytics.

There are plenty of resources available out there to help with this, but they can get quite technical. I hope to bring together the regex patterns that I find the most useful, and hopefully you will too.

If you have any use cases that you are having trouble with, please add them to the comments below and I will help you to find the best way to use regex to solve them.

I will be keeping this blog post up to date so please do bookmark and check back every once in a while.

Useful Regex

The Wildcard

If you want to see traffic to a specific section of your site using your url structure, then this regex is the one for you. This technique can be applied to anything where you want to match a word or phrase within a string (in this case a keyword in a url).

  • In your filter, ensure that you select Page for the URL, Regex for the condition and enter (.*)2013(.*) as the pattern

In this example, the filter will return any page that has “2013” in the url. You could use this for anything where you are trying to match a specific word within a string of letters and numbers.

The regex (.*) is essentially a wildcard that, when placed before and/or after a keyword, allows you to say to Google Analytics: please give me everything where this word appears.
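
If you want to try the wildcard before putting it into a filter, here is a minimal sketch in JavaScript. The page paths are made up for illustration, and JavaScript's regex engine is not the one Google Analytics uses, although a pattern this simple should behave the same way in both.

```javascript
// Hypothetical page paths, for illustration only.
const pages = [
  "/news/2013/05/budget-review",
  "/archive/2012/summary",
  "/events/awards-2013",
];

// The pattern from the filter above: anything, then "2013", then anything.
const wildcard = /(.*)2013(.*)/;

// Keep only the pages whose path contains "2013" somewhere.
const matches = pages.filter((page) => wildcard.test(page));

console.log(matches);
// → [ '/news/2013/05/budget-review', '/events/awards-2013' ]
```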

The OR Regex

There is a regex condition that allows you to find multiple keywords in the page title. Let’s say that you wanted to find all content that relates to Christmas. You could create a filter on a Page Title report that only pulls in data based on a certain set of words.

You do this by using a PIPE:

  • In your filter, ensure that you select Page Title, Regex as the condition and then your keywords – which should look like:
    • christmas|santa|noel|mince pie|holly|mistletoe

The filter will return any pages where the page title contains the keywords listed above. Remember not to add a PIPE at the end of your keyword list as this will result in returning all pages.
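
To see why the trailing PIPE matters, here is a small JavaScript sketch. The page titles are invented, and the "i" flag (ignore case) is my own assumption to keep the example simple; check how your own filter handles upper and lower case.

```javascript
// The keyword list, and the same list with a PIPE left on the end.
const keywords = /christmas|santa|noel|mince pie|holly|mistletoe/i;
const trailingPipe = /christmas|santa|noel|mince pie|holly|mistletoe|/i;

// Hypothetical page titles, for illustration only.
const titles = [
  "Ten Christmas gift ideas",
  "How to bake a mince pie",
  "Summer holiday packing list",
];

// Only the titles that contain one of the keywords.
const wanted = titles.filter((title) => keywords.test(title));

// The empty option after the last PIPE matches anything, so every title passes.
const everything = titles.filter((title) => trailingPipe.test(title));

console.log(wanted.length);     // → 2
console.log(everything.length); // → 3
```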

The give me something specific Regex

Let’s say that you wanted to filter only traffic from Facebook and Twitter in your All Traffic Report in Google Analytics. To do this, you need to use Regex.

Go to the All Traffic report and select Source as your dimension rather than Source/Medium, then click on Advanced Filter. In the filter, you need to add the following to ensure that you only include traffic from Facebook and Twitter:

  • facebook|twitter|^t\.co$

By using the PIPE, you are ensuring that any of the keywords you added are included. The ^t\.co$ ensures that you are only including traffic from Twitter’s redirecting url, t.co. The backslash escapes the dot so that it matches a literal full stop rather than any character.

If you hadn’t encased the keyword with the ^ and $ symbols, you would have seen sources that contained t.co – for example pinterest.com and other domains.

  • ^ means starting here
  • $ means ending here
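
Here is a quick JavaScript sketch of the difference the ^ and $ make. The list of sources is illustrative. The dot is escaped in the anchored version so that it only matches a literal full stop.

```javascript
// Illustrative source names, as they might appear in the Source dimension.
const sources = ["t.co", "pinterest.com", "twitter.com", "facebook.com"];

// Without anchors, the pattern can match anywhere inside the source name.
const unanchored = /t.co/;

// With anchors, the whole source name has to be exactly "t.co".
const anchored = /^t\.co$/;

const loose = sources.filter((source) => unanchored.test(source));
const strict = sources.filter((source) => anchored.test(source));

console.log(loose);  // → [ 't.co', 'pinterest.com' ]
console.log(strict); // → [ 't.co' ]
```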

App referral traffic to your web site and how to track it

Measuring referrals from mobile and tablet apps to web sites is extremely difficult – actually, it’s practically impossible. Over the past few years you will have seen traffic from direct/typed/bookmarked sources increase steadily as app usage has increased. Unfortunately this is not because your web site has become a destination for your chosen content, but because your analytics platforms are unable to attribute traffic to apps.

I am finding traffic from social media apps, specifically Facebook and Twitter, the hardest to track down. There are techniques that I discuss below that can help you to track content that you post yourself, but unfortunately this doesn’t help with organic sharing.

While Facebook Insights for domains does give you some level of overall referrer information, it does not break down the traffic between desktop and mobile.

This post explains why your analytics package is currently unable to track this traffic and tries to find some solutions to help you to make sense of it all.

So how do analytics platforms track referrals?

Most analytics packages use header information sent by your user’s web browser to determine which site the user had been on before visiting your web site. This header information only appears if a user clicks on a link to your site. For example, if I am on web site A and click on a link to web site B, then the header information would show that the referrer was web site A.

Just so we are clear, if I am on web site C, then type web site D’s domain name into my web browser, then there would be no referrer and the traffic source would be Direct.
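
As a rough sketch of that logic in JavaScript (this is not how any particular analytics package is actually written, and the function name and return shape are my own), the decision looks something like this:

```javascript
// Classify a visit from the referrer that the browser sent.
// "referrer" is the referring URL (empty when there is none);
// "ownHost" is the hostname of the site being measured.
function classifyVisit(referrer, ownHost) {
  const direct = { type: "direct", source: "(direct)" };

  // No referrer at all: typed, bookmarked, or opened from an app.
  if (!referrer) return direct;

  let host;
  try {
    host = new URL(referrer).hostname;
  } catch {
    // An unreadable referrer is treated like a missing one.
    return direct;
  }

  // A click from one page of your own site to another.
  if (host === ownHost) return { type: "internal", source: host };

  return { type: "referral", source: host };
}

console.log(classifyVisit("http://site-a.example/some-page", "site-b.example"));
// → { type: 'referral', source: 'site-a.example' }
console.log(classifyVisit("", "site-d.example"));
// → { type: 'direct', source: '(direct)' }
```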

No referrer in the header

This latter example explains why measuring app traffic is extremely difficult: an app is not a web site, it is not viewed in a web browser, and it does not pass on any header information when linking out to a web site.

There are two different techniques that apps can use to open up web content:

  1. Open the content in a native mobile or tablet browser – such as Safari, Chrome, Android Browser, Opera etc
  2. Open the content in an in-app version of the native browser

In both cases, because the app is opening up a new instance of a web browser, whether it’s the native browser app or the in-app version of the browser app, there is no referrer in the headers. So when your analytics package sees the user visiting your web site, it sees that there is no referrer, deems the traffic to have no source and therefore records the visit as direct.

So how can we track app referral traffic?

There are a couple of things that we can do as content creators to get around this:

Google’s UTM Tracking: https://support.google.com/analytics/answer/1033867?hl=en

If you use Google Analytics, you can use the built in UTM tracker to track links that you post to other web sites or platforms. Quite a few URL shortening services and sharing platforms such as Buffer, Bit.ly and Ow.ly allow you to add these dynamically so you don’t need to keep adding the tracking manually.

When a user visits your site using a UTM tracked link, this will override what is contained in the web browser’s header and attribute the source of the traffic to the relevant keyword that you add to the tracking. You need to be very careful here: I have seen instances where these have been set up incorrectly, and links posted to Facebook with Twitter as the source have been shared, which makes the data inconsistent and unreliable.

One thing to note: only use this method for external, inbound links to your site. You should not use tracked links for internal links as they automatically start a new site visit – meaning that you will see a spike in visits whenever someone clicks on them.
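
If you are building tagged links by hand, a small helper can cut down on copy and paste mistakes. This is a minimal sketch: utm_source, utm_medium and utm_campaign are the standard Google Analytics campaign parameters, but the function name, the example URL and the campaign values are made up.

```javascript
// Add Google Analytics campaign parameters to a link before sharing it.
function tagLink(url, source, medium, campaign) {
  const tagged = new URL(url);
  tagged.searchParams.set("utm_source", source);
  tagged.searchParams.set("utm_medium", medium);
  tagged.searchParams.set("utm_campaign", campaign);
  return tagged.toString();
}

// Build one link per platform, so that a Facebook post never carries a Twitter source.
const article = "https://www.example.com/article";
const facebookLink = tagLink(article, "facebook", "social", "spring_launch");
const twitterLink = tagLink(article, "twitter", "social", "spring_launch");

console.log(facebookLink);
// → https://www.example.com/article?utm_source=facebook&utm_medium=social&utm_campaign=spring_launch
```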

Using a similar service for other analytics platforms:

Many other analytics platforms have a similar method for campaign tracking. While these have been primarily used for more traditional marketing campaign tracking, there is no reason why you cannot use this same method for the sharing of content.

Limitations

This would only work for content that you share yourself. There is no way to enforce this for organic shares, for example when a user copies and pastes a url from your web site into a social status update.

Also, as you cannot post content to a specific device or platform, you cannot differentiate easily between a social network’s desktop, mobile web or mobile app experience. So while you may get closer to tracking Facebook traffic, you will still not know the split between desktop, mobile web and mobile app referrals.

 

Mobile Native Browsers vs In-App Browsers

I was surprised to find that Adobe Analytics (Adobe SiteCatalyst or Omniture, depending on how far back you go) doesn’t allow you to report on mobile browsers. I’ve been using Adobe SiteCatalyst for many years and I had always assumed that the ability was there. I am actually a little ashamed that I hadn’t noticed this before.

SiteCatalyst Device Type with Browser Breakdown

The main reason for me to look at this was that a colleague sent me a link about Embedded Mobile Browser Usage on http://www.lukew.com. The blog post is more to do with how developers should be testing on in-app versions of mobile browsers as well as native browsers, mainly because the in-app versions of Safari and the native Android browser have less functionality and lower rendering power. Luke explains that the in-app version of Safari, for example, does not include the Nitro JavaScript engine, certain Safari caches, or the “asynchronous” rendering mode. This means that web sites could appear to load more slowly or contain errors that would not be detected if you only tested on the native versions.

As Facebook and Twitter app usage continues to grow, it’s going to become more and more important to understand the differences between these browser types and how users browse your sites when using these apps. In addition, when users click on your content from apps such as Facebook and Twitter, they appear as either Direct or Typed/Bookmarked depending on which analytics package you use and whether you are using any form of campaign tracking or not.

When I realised that Adobe Analytics does not track mobile device browsers, I turned to our implementation of Google Analytics. Fortunately, GA does track in-app browsers on iOS – so you can see data for Safari and Safari (in-app).

What I have found in my initial dive into this is that users browsing our sites using an in-app version of the native browser consume less content (in terms of pages per visit), have a higher bounce rate and spend about half as long on the site as users of the native version.

In-app browsers vs native mobile browsers

What is going to be hard to work out is whether this is due to a behavioural or a technical cause.

If the site renders slowly, contains certain errors, or is not fully optimised for the in-app experience, you could attribute some of the reduction in usage to technical causes.

In terms of behaviour, you need to consider what users are doing. If I am on the Facebook app on my phone, it’s probably while I am in the middle of doing something else: potentially on the toilet, in a meeting (hopefully my boss is not reading this) or in transit. I therefore have limited time to view.

Also, to move this point further, if I am in my Facebook app, I’m also interested in seeing what other content my friends have posted. So I might just read the article that a friend posted, then click back in the app so I go back to my feed and continue my Facebook journey.

One of the main things that I tell colleagues at my job is that Analytics is great at telling you what happened. Trying to find the reasons behind a person’s behaviour is not generally achievable. You can find commonalities between certain types of users, segment them and try to implement changes to convert users to behave in the same way. But understanding what drives users to act in a particular way is not something that I believe Analytics will answer.

You need to be talking to your users and understanding why they are doing what they are doing – whether it be done through social media, email or surveys. You really need to have all the information to make decisions and I believe that Analytics is only one part of it in this case.

How Google’s secure search is hurting analytics

This has been happening since 2011, when Google introduced the ability to use their search engine over a secure (SSL/https) connection. Since then, the lack of visibility of search keywords has been steadily increasing, culminating in a predicted 70% of search keywords not being reported in September 2013, according to this blog post about Google Query Data Disappearing at an Alarming Rate on the RKG Blog.

This is up from 43% as recently as July 2013. So why is this happening and what impact is it having on analytics?

Google has been moving users to the SSL version of their search result pages to protect the privacy of their users and to ensure that prying eyes cannot listen in on what users are searching for. They seem to have stepped up this tactic ever since the public was made aware of various hacks and leaks to WikiLeaks.

The impact this is having on analytics is that when a user searches for something over a secure connection, the keyword(s) that the user searched for are removed from the referral string. This means that a referral is detected from a search engine but, as the keyword has been removed, it is logged as Not Available (or the equivalent for the relevant analytics package).
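
To make that concrete, here is a simplified JavaScript sketch of pulling the keyword out of a search referrer. The two referrer URLs are illustrative, "q" is the query parameter that Google search uses, and "(not provided)" is the label that Google Analytics shows when the keyword is missing. Real analytics packages are of course more involved than this.

```javascript
// The label Google Analytics uses when the keyword has been stripped.
const MISSING_KEYWORD = "(not provided)";

// Read the search keyword from a referrer, if there still is one.
function searchKeyword(referrer) {
  const keyword = new URL(referrer).searchParams.get("q");
  return keyword ? keyword : MISSING_KEYWORD;
}

// A standard search: the keyword travels in the referrer.
console.log(searchKeyword("http://www.google.com/search?q=regex+examples"));
// → regex examples

// A secure search: the referrer arrives without the keyword.
console.log(searchKeyword("https://www.google.com/"));
// → (not provided)
```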

As mentioned previously, it is predicted that up to 70% of search terms will be affected by this by the end of September 2013. The huge recent increase in these secure searches has come from Internet Explorer, which from IE7 onwards now only supports the secure version of Google’s search results.

Suddenly, the impact of any SEO improvements you are making to your site cannot be tracked effectively using your standard analytics packages.

One potential way of getting insight for SEO purposes could be to create an Advanced Segment that sets the referrer type to Search Engines and where the keyword is unavailable. Once you apply this segment, you can look at your top landing pages and understand how they are performing. You could start to trend this over time and validate any changes this way.
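
If you export your data, the same idea can be sketched in a few lines of JavaScript. The rows, field names and visit counts below are entirely made up for illustration; your own export will look different.

```javascript
// Hypothetical exported rows: one row per landing page, medium and keyword.
const rows = [
  { landingPage: "/", medium: "organic", keyword: "(not provided)", visits: 120 },
  { landingPage: "/pricing", medium: "organic", keyword: "(not provided)", visits: 45 },
  { landingPage: "/", medium: "organic", keyword: "brand name", visits: 30 },
  { landingPage: "/blog/regex", medium: "referral", keyword: "", visits: 60 },
  { landingPage: "/pricing", medium: "organic", keyword: "(not provided)", visits: 15 },
];

// Total the search visits with a hidden keyword for each landing page,
// then sort so that the biggest landing pages come first.
function hiddenKeywordLandingPages(data) {
  const totals = new Map();
  for (const row of data) {
    if (row.medium !== "organic" || row.keyword !== "(not provided)") continue;
    totals.set(row.landingPage, (totals.get(row.landingPage) || 0) + row.visits);
  }
  return [...totals.entries()].sort((a, b) => b[1] - a[1]);
}

console.log(hiddenKeywordLandingPages(rows));
// → [ [ '/', 120 ], [ '/pricing', 60 ] ]
```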

This issue affects all analytics packages, including Google Analytics.

Suddenly, SEO just got a lot harder for everyone. Ironic really as Google is trying to ensure that they eliminate what they term as ‘Search Spam’ from their search results pages – see this video from Matt Cutts – but they are not allowing website owners to see how these changes affect them and how to best optimise their sites through data.

It feels like they are giving with one hand and taking with the other.

What is Cohort Analytics?

Cohort Analytics in this context is the measurement of how often a user returns to a website over a given period of time.  The better you understand how well you retain your users, the better you will understand how best to monetise them – which is what we are all chasing in digital publishing.

This is not the same as the standard vanity stats that you may find in Google Analytics or Adobe’s SiteCatalyst for return visits or visitor retention. Cohort analysis will give you a more detailed understanding.

Also, if done correctly, you can use cohort analysis to measure not only general users to your website but also registered users, logged in users, users that purchase or convert to something.

Cohort analytics is quite a new concept for digital publishing.  In the past, CPMs have been high enough for publishers to only have to worry about unique users, visits and pages. But with the advent of Google Adsense and Facebook Ads, advertisers can now target audiences, so publishers need to focus on other ways to measure and monetise audiences.

How to measure retention

Cohort Analytics is not something that is available out of the box with most standard web analytics tools.  Unfortunately it takes a bit of hacking to get it to work with Google Analytics – even then, it is quite limited as you can only measure over five units of time.  This will all become apparent shortly.

In addition, this explanation will only give you a basic overview for general tracking.

Google Analytics allows you to configure custom variables – of which there are five – and these variables can persist over a number of visits.

See Google’s documentation on custom variables here.

Custom variables should be set per time period that you would wish to track.  In this instance it would track user retention over a rolling 5 month period.

Here is some example code for month one using the first of five custom variables:

*** CODE ***

pageTracker._setCustomVar(
      1,                   // Slot #1 of the five custom variables, used for the first month.
      "Month",             // The name of the custom variable.  Required parameter.
      "January 2011",      // The value: the month being tracked.  Required parameter.
      1                    // Sets the scope to visitor level.
);
pageTracker._trackPageview();

*** END CODE ***

Once this code is in your site, it will need to change each month. The two parameters that will need to change are the first and third: the slot number and the month value, each of which moves on when month 2 begins.

*** CODE ***

pageTracker._setCustomVar(
      2,                   // Slot #2 of the five custom variables, used for the second month.
      "Month",             // The name of the custom variable.  Required parameter.
      "February 2011",     // The value: the month being tracked.  Required parameter.
      1                    // Sets the scope to visitor level.
);
pageTracker._trackPageview();

*** END CODE ***
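
Rather than editing the snippet by hand every month, you could work out the slot and the month label from the date. The sketch below is one possible way of doing it and not part of Google's code: the function name is my own, and wrapping back to slot 1 in the sixth month is my assumption about how a rolling five month window would work.

```javascript
const MONTH_NAMES = [
  "January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December",
];

// Work out the arguments for _setCustomVar for the month that "date" falls in.
// startYear and startMonth (1-12) are the month in which tracking began.
function cohortVarArgs(startYear, startMonth, date) {
  const monthsElapsed =
    (date.getUTCFullYear() - startYear) * 12 +
    (date.getUTCMonth() + 1 - startMonth);
  if (monthsElapsed < 0) {
    throw new RangeError("date is before the start of tracking");
  }

  // Assumption: after five months the slots are reused, starting again at 1.
  const slot = (monthsElapsed % 5) + 1;
  const value = MONTH_NAMES[date.getUTCMonth()] + " " + date.getUTCFullYear();

  return [slot, "Month", value, 1]; // 1 = visitor-level scope
}

const args = cohortVarArgs(2011, 1, new Date(Date.UTC(2011, 1, 10)));
console.log(args); // → [ 2, 'Month', 'February 2011', 1 ]

// On the page itself this would be passed on to the tracker, for example:
// pageTracker._setCustomVar.apply(pageTracker, args);
// pageTracker._trackPageview();
```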

Once this has been implemented correctly and has gathered the relevant data, you will hopefully see some nice results. In addition, if you are clever with your naming and strategy you will be able to measure much more than this.

The results

Cohort Analysis

Hopefully you will see something like the image above after a period of time.  What the above table shows is how many of the users return in the following months from their first visit.

The reason for the Month 1 statistics all being 100% is that all users in Month 1 are new. In Month 2 it shows how many users from Month 1 returned in Month 2. Month 3 highlights how many users returned to the site in Month 3.
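
The percentages in a table like this are simple to work out once you have the user counts for a cohort. Here is a minimal JavaScript sketch, with made-up numbers purely for illustration:

```javascript
// Turn the user counts for one cohort into retention percentages.
// counts[0] is the number of new users in Month 1; each later entry is how
// many of those same users came back in that month.
function retentionRow(counts) {
  const base = counts[0];
  if (!base) throw new RangeError("the cohort needs at least one user in Month 1");
  // Round to one decimal place.
  return counts.map((count) => Math.round((count / base) * 1000) / 10);
}

// Hypothetical cohort: 2,000 new users, 500 back in Month 2, 300 back in Month 3.
const row = retentionRow([2000, 500, 300]);
console.log(row); // → [ 100, 25, 15 ]
```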

By fully understanding how long your users keep coming back to your site, you can really start to focus on some new metrics.

You will be able to work out the Lifetime Value (LTV) of your users which would help you to work out how much you may want to spend on marketing. By understanding this, you can ensure that your marketing stays profitable.
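
There are many ways to calculate LTV, so treat the following as one very simple sketch and not the definitive formula. It assumes that each user who is active in a month is worth the same fixed amount of revenue, and the figures are invented.

```javascript
// Estimate the lifetime value of one acquired user.
// "retention" holds the share of the cohort active in each month (1 = 100%);
// "revenuePerActiveUser" is the assumed revenue from one active user in one month.
function estimateLifetimeValue(retention, revenuePerActiveUser) {
  return retention.reduce(
    (total, share) => total + share * revenuePerActiveUser,
    0
  );
}

// Hypothetical cohort: 100%, 25% and 15% retention, at 2.00 per active user per month.
const ltv = estimateLifetimeValue([1, 0.25, 0.15], 2);
console.log(ltv.toFixed(2)); // → 2.80

// Under these assumptions, marketing only pays for itself while the cost of
// acquiring a user stays below this figure.
```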

You can also start to focus your development attention on lengthening the lifetime value of your users.