The Problem With Joomla’s Canonical URL Links

Joomla 3 introduced canonical URL links as a new feature. There is no setting to turn it off, which wouldn't be a problem if it worked correctly, but unfortunately it doesn't. It has been plagued by problems since it was introduced and has led to some sites taking a massive hit in visitor numbers after upgrading to J3.

I was first alerted to the problem with Joomla's canonical links by Microsoft's SEO Toolkit, which showed me that any page with an item number filter, such as a category listing page, is accessible by multiple URLs that each carry a different canonical link.

The whole point of a canonical link is to tell search engines that you know a page is accessible by multiple URLs and that they should index one of them and ignore the others.

A category listing page can be reached at both of the following URLs, which show the same content (assuming you made a menu link to the category using its name):

/category
and
/category?limitstart=0

Both pages show exactly the same content, so the canonical link on each should point to /category. Instead, each page's canonical link points to its own URL.
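To make the mismatch concrete, the head of each page ends up carrying something like the elements below. This is only an illustration: example.com is a placeholder host, and the exact attribute order Joomla emits may differ.

```html
<!-- Emitted on /category (example.com is a placeholder host) -->
<link href="http://example.com/category" rel="canonical" />

<!-- Emitted on /category?limitstart=0: points at itself, not at /category -->
<link href="http://example.com/category?limitstart=0" rel="canonical" />

<!-- What both pages should emit -->
<link href="http://example.com/category" rel="canonical" />
```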

That might seem unimportant, but it tells search engines that the two pages are in fact different and should be indexed and shown on Google separately. Google has stated in the past that duplicate pages are a ranking factor and can lead to pages being de-indexed or left out of search results. It is less of a problem when both pages are on your own site, but it is really not ideal when the feature is supposed to work the other way.

There are other examples of this, such as the new tagging component, where a tag can be reached at both /id-name and plain /id. Once again each canonical link simply matches the URL that was requested, even though the two pages are duplicates.

Previous problems with canonical urls can be found on the Joomla tracker: Joomla sef wrong canonical urls

I have added this problem to the tracker here: Joomla canonical links broken with some additional details.

The fix for the limitstart pages is an easy one. Open /plugins/system/sef/sef.php and find the lines:

if ($uri !== $link)
 {
 $doc->addHeadLink(htmlspecialchars($link), 'canonical');
 }

and change it to:

if ($uri !== $link)
 {
 // Drop the redundant first-page parameter so /category?limitstart=0 canonicalises to /category
 $link = str_replace('?limitstart=0', '', $link);
 $doc->addHeadLink(htmlspecialchars($link), 'canonical');
 }
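The str_replace above only matches when limitstart=0 comes straight after the question mark, and it would leave a dangling "&" if other parameters followed it. A slightly more defensive variant could parse the query string instead. This is a minimal sketch, not Joomla code: stripZeroLimitstart is a hypothetical helper name, and it assumes the link has no #fragment.

```php
<?php
/**
 * Hypothetical helper (not part of Joomla): removes a limitstart=0
 * parameter from a URL wherever it sits in the query string.
 * Assumes the URL carries no #fragment.
 */
function stripZeroLimitstart($link)
{
    $parts = explode('?', $link, 2);

    // No query string at all, so there is nothing to strip.
    if (count($parts) < 2) {
        return $link;
    }

    // Turn "a=1&b=2" into an associative array.
    parse_str($parts[1], $query);

    // Only the first page (limitstart=0) duplicates the bare URL.
    if (isset($query['limitstart']) && (string) $query['limitstart'] === '0') {
        unset($query['limitstart']);
    }

    // Rebuild the query; drop the "?" entirely if nothing is left.
    $queryString = http_build_query($query);

    return $parts[0] . ($queryString !== '' ? '?' . $queryString : '');
}
```

In the plugin this would stand in for the str_replace line, for example $link = stripZeroLimitstart($link);. Note that http_build_query re-encodes the remaining parameters, so their escaping may come out slightly different from the original link.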

This only patches the limitstart case, though. It is not really a fix, as all the other problems still exist. For the time being, on this site I have disallowed /component/ in robots.txt to stop tag pages being indexed until it is fixed, and used the above method to remove limitstart=0 from the canonical link.
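For reference, the robots.txt rule looks something like this. It is a sketch that assumes the rule should apply to all crawlers; merge it into your existing User-agent block rather than adding a second one.

```
User-agent: *
Disallow: /component/
```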

I guess that's what you get for basing your site on a short-term release. Hopefully it will be fixed soon.