Today we are going to talk about a common problem I often find while undertaking technical audits. It may seem harmless at first sight, but solving it can improve your site’s performance in SERPs: URL canonicalisation.
What is URL canonicalisation?
Canonicalisation issues occur when the same content on a site is displayed by two or more different URLs.
According to Matt Cutts, “Canonicalisation is the process of picking the best URL when there are several choices, and it usually refers to home pages.”
But, when does this happen?
I’ve listed below some of the most common issues I’ve found, although there may be many others: any situation where the same content appears under different URLs can cause canonicalisation problems.
- WWW issues
The same page can often be reached both with and without the www subdomain (and sometimes over both http and https), which means every URL on the site exists in multiple versions.
- Dynamic URLs
Here, the same page can be served under URLs ending in long strings of parameters, numbers and letters, creating another canonicalisation issue. This one can usually be solved by disallowing the path that generates those dynamic URLs in your robots.txt file.
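As a sketch, if a site’s internal search generated those dynamic URLs under a path such as /search/ (a hypothetical path; yours will differ), the robots.txt entry could look like this:

```
User-agent: *
# Hypothetical path generating dynamic, duplicate URLs
Disallow: /search/
```

Bear in mind that robots.txt only stops those URLs from being crawled; it doesn’t consolidate link strength the way a 301 redirect does.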
Problems arising from multiple URLs
Canonicalisation issues can create a number of different problems, some more worrying than others. One of the clearest is duplication: having multiple pages with the same content can lead to lower performance in rankings, as search engines favour unique content. It’s important to be aware that canonicalisation issues aren’t the only source of duplicate content, so when you find a duplicated page you should first identify the reason behind the duplication and then apply the correct fix.
Another reason why canonicalised URLs should be avoided is cannibalisation (yes, another weird word). If one of your products appears on two different URLs with the same content, it is very likely that these two pages are competing against each other for a better ranking position, which can result in poor performance for both of them. Not all cases of cannibalisation are caused by duplication, however: two pages can be very different but still compete for the same keywords and rankings.
When multiple URLs are available, any user can access or link to any of them. This means that the pages are competing for link strength, diluting the positive effects that those links would have if they were pointing to one unique page.
Finally, having different URLs for the same page can also lead to reporting issues, making the results for a specific piece of content harder to track, as we need to find all the URLs and combine their results.
As you can see, canonicalisation issues can create several problems for a website, but how can we avoid them? My favourite option is to use a 301 redirect. 301s are better than provisional (302) redirects because they pass page rank, and I prefer them to canonical tags because a canonical tag still allows users to access all the URLs and even link to them.
For these reasons, a 301 redirect is the best option to avoid these issues.
A guide to identifying and fixing canonicalisation issues:
To help you find and fix canonicalised URLs, I’ve created a small guide that should give you all the steps you need to eliminate any canonicalisation issues.
- Crawl your site
Use your usual crawler to identify all the crawlable URLs in your site.
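If you’re curious what this step involves under the hood, here is a minimal sketch of the URL-collection part in Python, using only the standard library. The HTML string stands in for a fetched page, and all the URLs are made up:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag found in an HTML page."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL
                    self.links.append(urljoin(self.base_url, value))

html = '<a href="/about">About</a> <a href="https://www.example.com/about/">About (dup)</a>'
collector = LinkCollector("https://www.example.com/")
collector.feed(html)
print(collector.links)
```

A real crawler repeats this over every fetched page and follows the collected links; dedicated tools also record status codes and redirect chains, which you’ll need in the later steps.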
- Identify canonicalisation issues
To do so, export your list of URLs into an Excel spreadsheet and sort them alphabetically. Explore them manually, trying to find similar URLs with small differences like trailing slashes or double slashes.
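The manual comparison can be partly automated: normalise each URL (lower-case the host, drop www, trailing slashes and double slashes, ignore the protocol) and group URLs that collapse to the same key. A rough sketch, with made-up example URLs:

```python
from collections import defaultdict
from urllib.parse import urlsplit

def canonical_key(url):
    """Reduce a URL to a normalised key so likely duplicates group together."""
    parts = urlsplit(url)
    host = parts.netloc.lower().removeprefix("www.")
    path = parts.path.rstrip("/") or "/"
    while "//" in path:               # collapse accidental double slashes
        path = path.replace("//", "/")
    return host + path                # protocol deliberately ignored

urls = [
    "http://example.com/shoes/",
    "https://www.example.com/shoes",
    "https://example.com//shoes",
    "https://example.com/hats",
]
groups = defaultdict(list)
for url in urls:
    groups[canonical_key(url)].append(url)

# Any key with more than one URL is a candidate canonicalisation issue
duplicates = {key: found for key, found in groups.items() if len(found) > 1}
print(duplicates)
```

This only flags candidates; you still need to check each group by hand, since two URLs that normalise to the same key may legitimately serve different content.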
Remember the examples above and try to identify https and WWW URLs. Do they have any other versions?
For the homepage, copy a piece of content, paste it in Google between quotes and search for instances within the domain:
site:example.com “The piece of content you have copied from the homepage”
This should show you any duplication that your home page may have in your domain. It takes some work to find all the canonicalised URLs, as there is no easy way to automate the process, but I guarantee it’s worth it.
Some crawling tools such as Screaming Frog and DeepCrawl include a section in their analysis listing all the duplicate pages. Take a look at this section to find more duplicate URLs, but bear in mind that not all duplicate pages will have canonicalisation issues.
- Choose the destination page.
Now that you have identified the multiple URLs your site has, it’s time to make a decision: which URL should be the one appearing in search engines? There is no simple guide for this, as every website or business is different and will have different strategies in place. The only thing I can recommend is to be consistent across your entire site and always use the same version for all your pages.
You’ll also need to make sure that the destination pages return a 200 status, to ensure that they are not redirecting somewhere else and that the page is actually found.
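This check can be done in bulk from a crawl export. A sketch of the idea, where the URL/status pairs are made up stand-ins for what your crawler would report:

```python
# Chosen destination URLs mapped to the status code reported by a crawl
# (the URLs and codes here are illustrative, not real data).
crawl_statuses = {
    "https://www.example.com/": 200,
    "https://www.example.com/shoes": 301,  # redirects elsewhere -- bad destination
    "https://www.example.com/hats": 404,   # not found -- bad destination
}

bad_destinations = {url: code for url, code in crawl_statuses.items() if code != 200}
for url, code in sorted(bad_destinations.items()):
    print(f"{url} returns {code}; pick a destination that resolves with 200")
```

For a live check you could fill the dictionary from real responses, e.g. with `requests.head(url, allow_redirects=False).status_code`.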
- Identify links pointing to the other URLs
Once you have chosen the right version of the URL, try to identify all the links pointing to the other versions of that page, both internal and external, and keep track of them.
- Add 301 Redirects from the rest of the URLs to the destination pages.
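As an illustration, on an Apache server these redirects are often added with mod_rewrite in an .htaccess file. The sketch below assumes (hypothetically) that https://www.example.com was chosen as the destination version:

```apache
RewriteEngine On
# Send any request whose host is not www.example.com to the chosen version,
# keeping the path, with a permanent (301) redirect
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]
```

The same result can be achieved with other directives or at server level; what matters is that the response code is 301, not 302.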
- Change all links to point directly to the chosen URL.
Make sure that all the internal links on your website point directly to the chosen URLs. You may think that if the redirect is in place the result should be the same, as both users and bots will end up on the page you have chosen.
This is true, but a 301 redirect doesn’t pass all the page rank to the destination page, only around 90%. That means that if you have a page linked by 100 internal links and add a redirect from it to a new https version, only around 90% of the strength from those links will end up in the new one (Option 1 in the figure below). If those links go directly to the destination page, none of that page rank is lost (Option 2).
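Using the rough 90% figure above (the exact ratio isn’t published, so treat it as illustrative), the two options compare like this:

```python
links = 100                # internal links pointing at the old URL
redirect_retention = 0.9   # rough share of strength a 301 passes on (illustrative)

option_1 = links * redirect_retention  # links -> old URL -> 301 -> new URL
option_2 = links * 1.0                 # links updated to point straight at new URL
print(option_1, option_2)
```

Whatever the true retention rate, the direct links of Option 2 can never lose strength to a redirect hop, which is why updating internal links is worth the effort.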
If possible, try to contact external websites with links pointing to your old pages, inform them about the changes you have made and ask them to update the URLs in their links to the correct version.
- Make sure the redirect and the new links are working.
Review the work you have done, making sure that the only indexed pages are the ones you want to appear in Search Engines and that all the links and redirects are correctly placed.
- Make sure that future link building tactics point to the right version.
The work you have just done will mean nothing if you keep linking to the old versions of the page. Make sure that everyone involved with the website knows the correct version of the URLs, in case they want to share or link to them.
- Keep tracking the results with new crawls
Although you have fixed all the canonicalisation issues, there is no guarantee that new ones won’t appear in the future with new versions of the existing URLs. With periodic crawls and by keeping an eye on your rankings, you should be able to identify any new issues and solve them before they grow into a bigger problem.