The risks and rewards of data scraping for SEO – A Presentation

In SEO, The Digital Marketing Blog by Jason

A few days ago I was invited by our esteemed legal friends at DMH Stallard, “the business people who happen to be lawyers”, to join a panel discussion and presentation on data scraping.

I joined speakers from Sentor, who introduced the audience to their data scraping monitoring service called Assassin, as well as their customer www.yell.com, who explained how they have overcome and now manage the data scraping issues they face as a business.

I was uncomfortably placed on the side of the “scrapers”, whereas Sentor and Yell were defending the “scrapees”, with Frank Jennings from DMH Stallard adjudicating. It’s clear there are strong arguments for and against scraping.

Before we begin, it’s worth bearing in mind that ultimately, if data and content are accessible online, any person or machine can copy them, manually or automatically, and create a new database. Although this practice would be illegal in the UK, it is a known risk to all data publishers.

Before dealing with the SEO issues relating to data scraping, it’s worth understanding why scrapers are in business. In reality, many businesses which operate online use scrapers to an extent: comparison sites checking prices, publishers acquiring cheap content, property portals aggregating listings from smaller specialist sites, online insurers scraping prices to score better on comparisons for strategic products, and B2B portals building databases. Even search engines like Google can be considered scrapers. DMH Stallard would be best placed to comment on the legality of scraping practices, but I was rapidly forming the impression that it’s not so bad after all; everyone’s apparently at it! On the other hand, for businesses like Yell.com, whose whole business model, brand and value are based on the content of their website, scraping of their prime business asset is considered theft and is comprehensively protected against on an industrial scale.

So how does this impact on SEO? As we know, search engines like original, high-quality content placed on authoritative websites, and the reward for investing in producing your own keyword-rich content is lots of relevant traffic from your favourite search engines. My colleague Helen Trendell explains this in her post on Integrated Search Marketing.

But Google doesn’t like duplicate content, and if the scrapers are depositing your content all over the web, the “scrapee” is put at a potential disadvantage. So what’s the advice to “scrapees”, or websites that are subject to data scraping attacks? I summed it up in my presentation as follows:

Recognise there is an SEO risk / opportunity for any website which contains content

Decide on your approach and either

Report them to Google and their ISP and/or take legal action, but this will cost time and money

OR Deal with it and make it difficult for the scraper, for example:

  • Monitor your web analytics for evidence of scraping “bad bots”
  • Ensure your website T&Cs deal with scraping content
  • Consider an IP lock-out, which restricts any IP address to a maximum number of requests per hour before blocking it or requiring a “captcha”
  • Use “captcha” forms instead of allowing extractable email addresses
  • Block the IP addresses of all of your known competitors
  • Scraping generally relies on repeating patterns in the pages, so randomising how pages are generated makes scraping more difficult
  • Use a Flash layer to display the final data so that it cannot be scraped, whilst making sure you provide for SEO in the design
OR Make it work for you

To gain SEO benefits from scraper activity on your site, the primary opportunity is to use the scrapers as a backlink distribution channel. If you ensure that your content contains URL references back to your site, then when your content is scraped and placed elsewhere on the web, the scraping “bot” will have created a nice backlink for you on their own website. Some further detail can be found by listening to the advice of Scott Allen or by speaking to www.sitevisibility.com about internet marketing strategies.
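
One practical detail behind the backlink idea: relative links in your content only point back to you while the content lives on your own site. A minimal sketch of one way to handle this, assuming your articles are stored as HTML strings and using a hypothetical helper name (`absolutize_links`) and a placeholder domain, is to rewrite every link to an absolute URL before publishing, so that a scraped copy still links to your site:

```python
import re
from urllib.parse import urljoin

# Matches href="..." or href='...'. A regex is adequate for a sketch;
# a real HTML parser would be more robust on messy markup.
_HREF = re.compile(r'href=(["\'])(.*?)\1', re.IGNORECASE)


def absolutize_links(html: str, base_url: str) -> str:
    """Rewrite relative hrefs as absolute URLs based on base_url.

    Hypothetical helper for illustration: links that are already
    absolute are left unchanged by urljoin.
    """
    def _fix(match: re.Match) -> str:
        quote, target = match.group(1), match.group(2)
        return f"href={quote}{urljoin(base_url, target)}{quote}"

    return _HREF.sub(_fix, html)


if __name__ == "__main__":
    article = '<p>Read our <a href="/seo/guide">SEO guide</a>.</p>'
    # www.example.com stands in for your own domain.
    print(absolutize_links(article, "https://www.example.com/blog/post"))
```

Whether a scraper keeps the links intact is outside your control, so this only improves the odds of gaining a backlink.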

OR All of the above

Constantly monitor the situation and develop / refine your approach as part of your online strategy
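
As a rough illustration of what monitoring for “bad bots” might involve, the sketch below counts requests per IP address in a web server access log and reports any address above a threshold. It assumes the client IP is the first whitespace-separated field on each line, as in the common Apache and Nginx log formats; the function name and the threshold are hypothetical and would need tuning for your own traffic.

```python
from collections import Counter


def suspicious_ips(log_lines, threshold):
    """Return {ip: request_count} for IPs with more than `threshold` requests.

    Assumes each non-empty log line starts with the client IP address.
    """
    counts = Counter(
        line.split()[0]          # first field = client IP (assumption)
        for line in log_lines
        if line.strip()          # skip blank lines
    )
    return {ip: n for ip, n in counts.items() if n > threshold}


if __name__ == "__main__":
    # Made-up sample lines using documentation-only IP addresses.
    sample = [
        '203.0.113.7 - - [01/Jan/2024:10:00:00 +0000] "GET /a HTTP/1.1" 200 512',
        '203.0.113.7 - - [01/Jan/2024:10:00:01 +0000] "GET /b HTTP/1.1" 200 512',
        '203.0.113.7 - - [01/Jan/2024:10:00:02 +0000] "GET /c HTTP/1.1" 200 512',
        '198.51.100.4 - - [01/Jan/2024:10:00:03 +0000] "GET /a HTTP/1.1" 200 512',
    ]
    print(suspicious_ips(sample, threshold=2))
```

A high request count alone does not prove scraping (legitimate search engine crawlers are heavy visitors too), so treat the output as a list of candidates to investigate rather than a block list.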

This manual approach would work for most smaller businesses, though when the data is business critical and/or of strategic value, as it is for customers of Assassin, you can outsource the monitoring and blocking of scraping IP addresses to a managed service provider.
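
For those taking the manual route, the IP lock-out idea from the list above could look roughly like the following. This is a minimal in-memory sketch, not how Assassin or any other managed service works: the class name, the limits and the sliding-window approach are all assumptions chosen for illustration.

```python
import time
from collections import defaultdict, deque


class IpLockout:
    """Allow each IP at most `max_requests` per `window_seconds` (sliding window)."""

    def __init__(self, max_requests=100, window_seconds=3600):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of allowed requests

    def allow(self, ip, now=None):
        """Return True if the request is allowed, False if the IP is over its limit."""
        now = time.time() if now is None else now
        hits = self._hits[ip]
        # Forget requests that have fallen out of the time window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # caller could block here or show a captcha instead
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = IpLockout(max_requests=2, window_seconds=3600)
    for _ in range(3):
        print(limiter.allow("203.0.113.7"))
```

When `allow` returns False, the site can either reject the request or fall back to a captcha, as suggested in the list. A real deployment would also need shared storage if the site runs on more than one server.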

Looking into the future, it seems this is a battle against one of the relentless forces of the internet: free content. So whilst scraper mitigation as discussed in this post might buy you time to develop your “freemium” business model, it’s not going to protect you in the longer term against market forces and the insatiable demand for free content, whatever its source.
