The Top Five Most Common Onsite Content Duplication & Canonicalization Issues

In SEO, The Digital Marketing Blog by Kelvin

Sometimes the SEO industry (myself included) gets caught up in what you can do offsite to improve your rankings. It’s tempting to spend a few more hours fine-tuning that killer piece of link bait so it goes popular on Digg, or contemplating the moral implications of buying links, when actually it’s an onsite factor that is hindering your rankings.


[Image: Creative Commons License, credit: Hanataro. Caption: If there is more than one of you, how do we know who is the ‘real deal’?]

We’ve found with our clients that one of the most damaging onsite problems, and one that often gets overlooked, is onsite duplication, which is most often caused by canonicalization issues.

It’s a bit of a mouthful and initially seems complicated, but canonicalization actually isn’t too hard to get your head around. If you can type two different URLs into an address bar and see the same content, there’s a chance the search engines could be doing the same.
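To make the "two URLs, same content" test concrete, here is a minimal Python sketch. It assumes you have already fetched the pages yourself and put them in a dictionary of URL to HTML body; the function name and the sample URLs are made up for illustration, and exact hashing will only catch pages that are byte-for-byte identical.

```python
import hashlib
from collections import defaultdict


def duplicate_groups(pages):
    """Group URLs whose bodies are identical.

    `pages` maps URL -> page body (a str), fetched however you like.
    Returns a list of URL lists; each inner list has 2+ URLs sharing one body.
    """
    groups = defaultdict(list)
    for url, body in pages.items():
        # Hash the body so identical pages land in the same bucket.
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    # Keep only buckets with more than one URL: those are the duplicates.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical sample data: two hostnames serving the same home page.
pages = {
    "http://widgets.com/": "<h1>Widgets</h1>",
    "http://www.widgets.com/": "<h1>Widgets</h1>",
    "http://www.widgets.com/blue/": "<h1>Blue widgets</h1>",
}
print(duplicate_groups(pages))
```

Any group this reports is a candidate for one of the fixes in the list that follows.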

This causes problems for the search engines’ spiders: which page is the ‘right’ page? And if they have the choice between a few of your pages, none of which they are certain is the right one, and your competitor’s single laser-targeted page, which are they going to choose?

The engines have tried to solve canonicalization algorithmically, but it is such a barrier to search performance that it’s one of the first things I check for when I look at a new site.

Here, though, are five of the most common canonicalization issues and the simplest ways to solve them.

Non WWW – do both http://widgets.com and http://www.widgets.com resolve independently?

I wish I had a quid for every time I saw this problem. It mainly creeps in when developers are setting up your server. Fortunately it’s easily solved with a permanent (301) redirect to your preferred variation. You can specify your favoured option in Google’s webmaster tools, but that ignores the other search engines, and you might still split your link equity across what are essentially two different websites.
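As a rough sketch of the redirect logic (in practice this usually lives in your web server configuration rather than in application code), the Python below decides whether a request should be permanently redirected to the preferred host. It assumes the www version is the one you have chosen; `PREFERRED_HOST`, `ALTERNATE_HOSTS` and `host_redirect` are names invented for this example.

```python
from urllib.parse import urlsplit, urlunsplit

PREFERRED_HOST = "www.widgets.com"   # assumption: the www variant is preferred
ALTERNATE_HOSTS = {"widgets.com"}    # variants that should redirect to it


def host_redirect(url):
    """Return (301, target_url) for a non-preferred host, else None.

    Note: any explicit port in the original URL is dropped in the target.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host in ALTERNATE_HOSTS:
        target = urlunsplit(
            (parts.scheme, PREFERRED_HOST, parts.path or "/",
             parts.query, parts.fragment)
        )
        return 301, target
    return None


print(host_redirect("http://widgets.com/blue/?colour=navy"))
print(host_redirect("http://www.widgets.com/blue/"))
```

The important detail is the status code: a 301 tells the engines the move is permanent, so the path and query string are carried over and only the host changes.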

Index.html – whether your site is set up in HTML, ASP or PHP, there’s a real chance both www.domain.com & www.domain.com/index.html are resolving separately. Again, choose your favourite (I prefer the root for simplicity’s sake) and 301 redirect to it. Also check all your internal links to make sure they are linking to the preferred location. The redirect would make sure they ended up in the right place, but it’s just good housekeeping.
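Here is a minimal Python sketch of the index-file rule, assuming the root is the preferred version. The set of index file names is a guess at common defaults, not an exhaustive list, and `strip_index` is a hypothetical helper; the URL it returns is where the 301 would point.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit

# Assumed default document names; adjust to whatever your server uses.
INDEX_FILES = {"index.html", "index.htm", "index.php", "default.asp"}


def strip_index(url):
    """Return the URL with a trailing index file removed, or unchanged."""
    parts = urlsplit(url)
    head, tail = posixpath.split(parts.path)
    if tail.lower() in INDEX_FILES:
        # Keep the folder, drop the file name, and end with a slash.
        path = head if head.endswith("/") else head + "/"
        return urlunsplit(
            (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
        )
    return url


print(strip_index("http://www.domain.com/index.html"))
print(strip_index("http://www.domain.com/about.html"))
```

If the function returns something different from the URL it was given, that request is a candidate for a permanent redirect to the returned address.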

Folder Root – just as having both the root and the index file resolve can be a problem on the home page, you can have the same problem with all your internal folders. Any duplication here would probably be caused by inconsistencies in your internal linking. So first make sure you’re always linking in the same way: choose widgets.com/blue/index.html or widgets.com/blue/ and stick with it. Then set up a permanent redirect to pick up any stray links you might have received from other websites.
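Checking internal links by hand gets tedious, so here is a small Python sketch that scans a page's HTML for links pointing at an index file. It assumes you have settled on the folder form (widgets.com/blue/); the class and function names, and the list of index file names, are made up for the example.

```python
from html.parser import HTMLParser

# Assumed default document names; adjust to match your own site.
INDEX_FILES = {"index.html", "index.htm", "index.php", "default.asp"}


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it is fed."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_to_index_files(html):
    """Return hrefs whose last path segment is an index file."""
    collector = LinkCollector()
    collector.feed(html)
    flagged = []
    for href in collector.hrefs:
        # Ignore any query string or fragment when looking at the file name.
        path = href.split("?")[0].split("#")[0]
        if path.rsplit("/", 1)[-1].lower() in INDEX_FILES:
            flagged.append(href)
    return flagged


sample = (
    '<a href="/blue/">Blue</a>'
    '<a href="/blue/index.html">Blue again</a>'
    '<a href="/red/index.html?ref=nav">Red</a>'
)
print(links_to_index_files(sample))
```

Anything it flags is an internal link to tidy up so it points at your preferred form.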

Trailing Slashes – Google seem to be better able to filter duplication caused by trailing slashes than some of the other kinds we’ve talked about. But if both widgets.com/blue & widgets.com/blue/ are resolving, I would try to rectify it just to be on the safe side.
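One way to sketch the trailing-slash rule in Python, assuming you prefer the version with the slash (the opposite choice works just as well, as long as you are consistent). The helper name is hypothetical, and the "contains a dot" test for spotting file names is a crude heuristic.

```python
from urllib.parse import urlsplit, urlunsplit


def with_trailing_slash(url):
    """Return the URL with a trailing slash added to folder-style paths."""
    parts = urlsplit(url)
    path = parts.path or "/"
    last_segment = path.rsplit("/", 1)[-1]
    # Only touch paths that look like folders: non-empty last segment
    # with no dot in it (so /blue/page.html is left alone).
    if last_segment and "." not in last_segment:
        path += "/"
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


print(with_trailing_slash("http://widgets.com/blue"))
print(with_trailing_slash("http://widgets.com/blue/page.html"))
```

As with the other checks, a result that differs from the requested URL is the target for a permanent redirect.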

https:// – if you’re an ecommerce store or capturing customer data, there’s a high chance you have a secure version of your website hosted on https. If this is being spidered, you could potentially have two separate versions confusing the search engines. There’s no reason the https version should be in the index, so block it from the search engines.
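One possible way to do the blocking is to serve a different robots.txt depending on whether the request came in over http or https. The Python below is only a sketch of that idea under the assumption above that the http version is the one you want indexed: `robots_txt_for` and `is_crawlable` are invented names, and the check uses the standard library's robots.txt parser to confirm the rules behave as intended.

```python
from urllib.robotparser import RobotFileParser


def robots_txt_for(scheme):
    """Return the robots.txt body to serve for the given scheme."""
    if scheme == "https":
        # Block everything on the secure version.
        return "User-agent: *\nDisallow: /\n"
    # An empty Disallow means nothing is blocked on the http version.
    return "User-agent: *\nDisallow:\n"


def is_crawlable(scheme, url):
    """Check a URL against the robots.txt that would be served for `scheme`."""
    parser = RobotFileParser()
    parser.parse(robots_txt_for(scheme).splitlines())
    return parser.can_fetch("*", url)


print(is_crawlable("http", "http://www.widgets.com/blue/"))
print(is_crawlable("https", "https://www.widgets.com/checkout/"))
```

How you detect the scheme and serve the right file depends on your server setup, which is outside the scope of this sketch.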

Canonicalization is an issue the search engines are doing their best to solve, but in every example I’ve seen, sites that solve their onsite duplication issues benefit from getting their house in order.


Comments

  1. This is a great article… these are all common mistakes the average website owner makes (or the person setting up their site). You’ve done a great job of covering them and resolving them!

  2. Pretty good information and I was thinking about https versions. Although I have got the same thing to do on other blogs but it is a nice findings.

  3. Top tips.
    The first one happens so much and it’s so simple.
    Worse is when the version missing www doesn’t even work, so if you just type the url into the browser bar and hit enter the page doesn’t load (though that’s a usability point rather than an seo one…)

  4. Pingback: The Top Five Most Common Onsite Content Duplication … | Cd duplication live today
