
Donald Trump is in Your Google Analytics

In Analytics, The Digital Marketing Blog by Jon Hibbitt

There’s a new type of Google Analytics spam in town and it’s eroding the accuracy of your GA data. It’s being referred to as Language Spam.

Take a look at your Language Report in Google Analytics under Audience > Geo > Language. It should look like this:

You can see that it’s nice and logical, with the different languages broken out and easy to analyse.

Now there’s a new type of spam / fake traffic which ruins our language analysis. It looks like this:

It’s yet another threat to the accuracy of your GA data.

You might also be interested to know that the example above is taken from a site that hasn’t installed the Google Analytics tracking code. This means the data is being pushed into Google Analytics without the code being present on any page: the spammers send their hits straight to Google Analytics, using nothing more than the property ID.
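The mechanism behind this is most likely the legacy Universal Analytics Measurement Protocol (version 1), which lets anyone send a hit to a property ID over plain HTTP. The minimal Python sketch below assumes that protocol; the `ul` parameter carries the user language that GA reports under Language. It only builds the URL and never sends it, and the property ID, client ID and language string are placeholders.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Legacy Universal Analytics Measurement Protocol collection endpoint.
COLLECT_URL = "https://www.google-analytics.com/collect"


def build_pageview_hit(tracking_id: str, client_id: str, language: str) -> str:
    """Build (but do not send) a Measurement Protocol v1 pageview URL.

    No web page is involved: only the property ID is needed, which is why
    spam can appear in a property whose site has no tracking code.
    """
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # property ID, e.g. UA-XXXXXXX-1 (placeholder)
        "cid": client_id,    # anonymous client ID, any value the sender chooses
        "t": "pageview",     # hit type
        "ul": language,      # user language; the spam shows arbitrary text got through
    }
    return COLLECT_URL + "?" + urlencode(params)


def language_of(hit_url: str) -> str:
    """Read back the 'ul' value that GA would report as the Language."""
    return parse_qs(urlsplit(hit_url).query)["ul"][0]


if __name__ == "__main__":
    hit = build_pageview_hit("UA-XXXXXXX-1", "12345.67890", "secret.example.com Vote now!")
    print(language_of(hit))
```

Because the language value is just a query-string parameter, whatever text the sender puts there is what turns up in your Language report.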

Why is this happening?

Spammers want to acquire traffic. The tactics used in this particular campaign are an interesting blend of old-school marketing psychology.

The tactics look like this:

  1. Leave an entry in GA reports pointing to a sub-domain that appears to be owned by Google.
  2. Make people curious: Secret
  3. Imply scarcity: Enter only with this ticket URL
  4. Add a hot topic: Vote for Trump!

Their goal is then to bake these ingredients into a data payload and automate its delivery to as many website owners’ Google Analytics accounts as possible. The aim is to get people to visit the fake Google domain so the spammers can obtain personally identifiable information (PII), deliver viruses, generate ad revenue and so on.
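The “Google” domain in tactic 1 only looks like Google. The Analytics Edge article listed under Resources names it ɢoogle.com, and its URL encodes the first letter as %C9%A2, the UTF-8 bytes of U+0262 LATIN LETTER SMALL CAPITAL G. The Python sketch below uses only the standard library to flag hostnames containing non-ASCII characters and name them; the helper names are hypothetical.

```python
import unicodedata


def suspicious_characters(hostname: str) -> list[str]:
    """Return the Unicode names of any non-ASCII characters in a hostname.

    A plain ASCII hostname returns an empty list; a lookalike such as
    'secret.\u0262oogle.com' returns the name of the disguised letter.
    """
    return [unicodedata.name(ch, "UNKNOWN") for ch in hostname if ord(ch) > 127]


def looks_like_homograph(hostname: str) -> bool:
    """True if the hostname contains any non-ASCII characters."""
    return bool(suspicious_characters(hostname))


if __name__ == "__main__":
    for host in ("secret.google.com", "secret.\u0262oogle.com"):
        print(host, looks_like_homograph(host), suspicious_characters(host))
```

This check is deliberately crude: legitimate internationalised domains also contain non-ASCII characters, so treat a match as a reason to look closer rather than as proof of spam.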

The Language Spam Solution

Thankfully this threat can be taken down swiftly and without too much hassle. Make sure you’re logged into Google Analytics and we’ll walk you through the next steps.

Step 1 – Create a Custom Segment

Create a Custom Segment to test your filter pattern and preview your historic data. Best practice is to always configure a Custom Segment in your Master View where you can safely test the filter logic and see how different filter settings change your data.

  1. Click + Add Segment
  2. Set up the Custom Segment pattern:
     1. Use clear, descriptive labelling
     2. Set ‘Language’ to ‘does not contain’ and the pattern to a full stop: ‘.’. Language codes never contain a full stop, so this setting excludes any entry that includes one.
     3. Click Preview to see the effect
     4. Use the Custom Segment for any historic reporting to view your data without Language Spam.
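The segment condition is simple enough to model in a few lines. The Python sketch below assumes you have exported the Language report as (language, sessions) rows; the sample rows are invented for illustration and are not the data from this post. It applies the same ‘does not contain .’ test and works out what share of sessions the segment removes.

```python
def passes_segment(language: str) -> bool:
    """Mirror the segment: Language 'does not contain' a full stop.

    This is a plain substring test, not a regular expression, so the dot is literal.
    """
    return "." not in language


def removed_share(rows: list[tuple[str, int]]) -> float:
    """Percentage of sessions the segment filters out."""
    total = sum(sessions for _, sessions in rows)
    if total == 0:
        return 0.0
    kept = sum(sessions for language, sessions in rows if passes_segment(language))
    return 100.0 * (total - kept) / total


# Hypothetical export of the Language report: (language, sessions).
SAMPLE_ROWS = [
    ("en-gb", 600),
    ("en-us", 150),
    ("secret.example.com You are invited!", 250),
]

if __name__ == "__main__":
    print([lang for lang, _ in SAMPLE_ROWS if passes_segment(lang)])
    print(f"{removed_share(SAMPLE_ROWS):.2f}% of sessions removed")
```

With these made-up rows, 250 of 1,000 sessions contain a full stop in the Language value, so the segment removes 25% of them.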

Here’s the data with the Custom Segment applied. As you’ll see, we’ve removed 25.64% of Language Spam using this segment:

If the Custom Segment works as expected, save it. I know what you’re thinking: what about the (not set) bucket? We’ll look into (not set) in another blog post.

Step 2 – Add the Filter to your Test View

It’s best practice to run the filter on your Test View for a period before applying it to your Master View.

  1. Select your Test View
  2. Click Admin > Filters
  3. Select + Add Filter
  4. Give the filter a descriptive name, e.g. ‘Exclude Language Spam Referrers’
  5. Select ‘Custom’ as the Filter Type
  6. Click the ‘Exclude’ radio button
  7. Set the Filter Field to ‘Language Settings’ and enter ‘\.’ as the Filter Pattern (that’s an escaped full stop: filter patterns are regular expressions, in which an unescaped ‘.’ matches any character)
  8. Click ‘Save’
  9. Leave the filter in place for a few days to allow sufficient traffic to collect
  10. Compare the Test View Language report entries with your Master View report entries. You should be able to see differences in the data between your reports.
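Unlike the segment’s ‘does not contain’ condition, a filter pattern is treated as a regular expression, which is why the full stop has to be escaped. This Python sketch uses the standard `re` module to compare the two patterns, assuming the exclude filter drops a hit whenever the pattern matches anywhere in the Language Settings value; the sample values are hypothetical.

```python
import re

ESCAPED_DOT = re.compile(r"\.")  # matches a literal full stop only
BARE_DOT = re.compile(r".")      # matches any single character


def excluded_by(pattern: re.Pattern, language: str) -> bool:
    """Model an exclude filter: drop the hit if the pattern matches anywhere."""
    return pattern.search(language) is not None


if __name__ == "__main__":
    for value in ["en-gb", "en-us", "secret.example.com Vote now!"]:
        print(value, excluded_by(ESCAPED_DOT, value), excluded_by(BARE_DOT, value))
```

With the bare dot, every non-empty Language value matches, so the filter would throw away all of your traffic; the escaped version only catches values that contain a literal full stop.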

Step 3 – Apply the Language Spam Filter to your Master View

Once you’ve collected enough data in your Test View and are satisfied that no more Language Spam is being collected, apply the filter to your Master View. Filters are not retrospective, so if you need to see historic data without any Language Spam, you’ll need to apply the Custom Segment you created in Step 1.

Next Steps

You’re not done. There’s more fake traffic creeping into your reports, such as Ghost Spam, Spam Referrers, Event Spam and Campaign Spam. Don’t worry, you’ll be clearing out more fake traffic in a follow-up to this post when we take a look at Ghost Spam – wooooOOOOooo.


Resources

There are a lot of resources covering GA spam, but Mike Sullivan’s guide provides solid, trusted advice and is highly recommended. Get coffee and chocolate sorted before you click this link!

http://help.analyticsedge.com/spam-filter/definitive-guide-to-removing-google-analytics-spam/

http://www.analyticsedge.com/2016/11/heres-a-secret-%C9%A2oogle-com-is-not-google-com/

Google – Custom Segments and Include / Exclude Filters

https://support.google.com/analytics/answer/3124493?hl=en

https://support.google.com/analytics/answer/1034832?hl=en


Comments

  1. Author

    You’re welcome. We’re also seeing Vitaly links in Language Reports, which will require an additional exclude filter to block. Update to come!

  2. Very helpful post. Spam referral traffic has become a real bane for providing accurate reporting, and I suspect that some SEOs may actually claim that the extra traffic from the spam bots is down to their SEO efforts.

  3. Author

    Hi Stuart – yes, this is a big problem client-side too. If you remove the non-customer traffic, annual reports dip, which can reduce the appetite for addressing the issue; that isn’t helpful from an accuracy standpoint. We often find that a small % of referral traffic actually comes from organic search engines. You’ll find this while routinely checking reports for Spam Referrers: dial up the referrals report in GA, set a wide date range and type ‘search’ into the table filter. You should get a list of organic search engines you can add to the organic search default channel group. So SEO can benefit from reviewing fake traffic!

  4. Author

    Hi Robert, it’s interesting to note that using a Property other than UA-XXXXXXX-1 leads to a drastic reduction in fake traffic. I saw this first-hand recently with a client who uses UA-XXXXXXXX-12. The bot programmers know UA-XXXXXXX-1 is the most common GATC Property… Good luck with your implementation.

  5. Hi John,

    Thanks for the post. I’ve seen other suggested regular expressions to get rid of language spam, but none as nice and simple as yours. I can’t quite understand how it works. It says ‘literally any single character’, so how can it avoid excluding all languages, please?

    Thanks, A.

  6. Author

    Hi Alexandra,
    Most of the language spam sent to GA contains a dot ‘.’, but not all of it. The backslash in ‘\.’ escapes the dot, so the pattern matches a literal full stop rather than ‘any single character’; all the regex is matching is any Language dimension value with a dot in it. If you examine the ISO 639 and ISO 3166 formats used by Google for the Language settings in GA, you’ll see there are no dots. Browsers use the ISO format to send the language settings data to GA, but as you’ve realised, there are mechanisms to send non-compliant ISO 639 / ISO 3166 codes into GA. Excluding the ‘.’ is a catch-all pattern match. What we’d like to see is Google applying some type of validation to the Language Setting dimension to ensure only strings which comply with the ISO formats can get through. Any junk could be attributed to (not set). There are some good resources for testing your regex; in the office we like http://rubular.com/. And finally, I can’t take credit for the clever regex. The original post by Mike Sullivan is here:
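The validation suggested in the reply above could look something like the sketch below. It is a hypothetical, simplified Python check, not anything Google has implemented: it accepts a two- or three-letter ISO 639 language code, optionally followed by a two-letter ISO 3166 region or a three-digit numeric region such as the 419 in es-419, and maps everything else to (not set). Real browser language tags can be longer (for example zh-hans-cn), so a production rule would also need to allow script subtags.

```python
import re

# Hypothetical, simplified rule: ISO 639 language code, optionally followed by
# an ISO 3166 alpha-2 region or a three-digit numeric region (e.g. es-419).
LANGUAGE_TAG = re.compile(r"[a-z]{2,3}(?:-(?:[a-z]{2}|[0-9]{3}))?")


def normalise_language(value: str) -> str:
    """Return the lower-cased tag if it looks valid, otherwise '(not set)'."""
    candidate = value.strip().lower()
    if LANGUAGE_TAG.fullmatch(candidate):
        return candidate
    return "(not set)"


if __name__ == "__main__":
    for raw in ["en-us", "EN-GB", "es-419", "fr", "secret.example.com Vote now!"]:
        print(raw, "->", normalise_language(raw))
```

Using `fullmatch` rather than `search` is the key design choice here: the whole value must look like a language tag, so junk with a valid code buried inside it still ends up in (not set).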
