How to Find All Canonical URLs on a Website

Canonical URLs are an important part of SEO if they are implemented correctly. If not, they can become an issue and cause chaos in your site structure.

Today, I’m going to share in detail what canonical URLs are, why they are important, and how to find them on your website to keep them groomed.

What is a canonical URL?

A canonical URL, or a canonical link, is like the "boss" page when you have other pages that are very similar or the same. It is the URL that Google indexes and displays in search results.

What is a canonical tag?

A canonical tag, rel="canonical", is an HTML element used to define the canonical page. It is placed in the HTML of a duplicate page and points to the "main" version.

Example

On your site, you have the two following URLs:

  • www.site.com/shoes-with-roses.html (the main version)
  • www.site.com/shoes-with-roses.html?utm_campaign=sale-2023 (the URL with UTM parameters used in a particular campaign)

As the content of these pages is identical, you need to specify which version is preferred for indexation. To do this, you need to add the rel="canonical" tag to the duplicate page’s HTML, like this:

<link rel="canonical" href="https://www.site.com/shoes-with-roses.html"/>
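If you'd like to check a tag like this programmatically, here is a minimal Python sketch that reads the canonical URL out of a page's HTML using only the standard library. The names CanonicalParser and find_canonical are my own, and the sample page is made up for illustration.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalParser(HTMLParser):
    """Collects the href of every <link rel="canonical"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens, so split before checking.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "canonical" in rel_tokens and href:
            self.canonicals.append(href.strip())


def find_canonical(html: str) -> Optional[str]:
    """Return the first canonical URL declared in the page, or None."""
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonicals[0] if parser.canonicals else None


# Hypothetical duplicate page that points to the "main" version.
page = """
<html><head>
  <title>Shoes with roses</title>
  <link rel="canonical" href="https://www.site.com/shoes-with-roses.html"/>
</head><body>...</body></html>
"""
print(find_canonical(page))  # prints https://www.site.com/shoes-with-roses.html
```

The sketch only looks at the HTML source, so it won't see canonicals that are injected by JavaScript or sent as HTTP headers.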

Why use canonical URLs?

Canonical URLs help prevent duplicate content issues and the ranking and indexing problems that may follow. That's why rel="canonical" is applied to pages with UTM parameters, print versions, pages with additional filters, dynamic URLs, and language versions.

Canonical URLs and CMS

Duplicate content issues are especially common for ecommerce websites. That's why many content management systems automatically canonicalize the "main" page to prevent possible problems.

This is how Shopify, Wix, and WordPress work, for example. The CMSs automatically mark “clean” URLs as canonicals, so the dynamic URL structure that is common for ecommerce websites does not harm SEO.

Other CMSs, including Joomla! and Adobe Commerce (formerly Magento), require plugins to set up canonicals.

Note: On our blog, we have detailed posts on how to do SEO for Shopify, WordPress, Wix, and Magento. Feel free to read them to boost your ecommerce website.

How to find canonical URLs?

To find canonical URLs on your site, you need to conduct a website audit. You can use any site audit tool you are used to but make sure it is capable of finding canonicals. I’ll describe a couple of handy ways below.

Finding canonical URLs with Google Search Console

For starters, you can use Google Search Console. While the tool may seem pretty basic, it can give you a lot of useful insights and field data right from Google.

So, to find canonicals with GSC, go to the Indexing > Pages report and look at the reasons why some of your pages are not indexed. You need the lines stating:

  • Alternate page with proper canonical tag
  • Duplicate without user-selected canonical
  • Duplicate, Google chose different canonical than user
[Screenshot: finding canonicals in Google Search Console]

URLs under the Alternate page with proper canonical tag section are the pages that have specified canonical tags. In this case, Google serves the page that you specified as a canonical one. To see what page it is, inspect the URL under consideration:

[Screenshot: Google accepted your canonical]

The Duplicate without user-selected canonical section contains the list of duplicate URLs without specified canonical tags. In this case, Google chooses what pages to serve on its own.

Inspect the necessary URL to see what page has been indexed by Google:

[Screenshot: Google selected a canonical because you have not specified any]

Still, keep in mind that Google treats rel=”canonical” as a hint, not a directive. That’s why Google can sometimes choose to serve a different canonical page instead of the page you have specified. In Search Console, these pages are listed under the Duplicate, Google chose different canonical than user section:

[Screenshot: Google chose a different canonical instead of the one you specified]

While GSC helps you spot pages with canonical tags, you still need to inspect each URL manually to see which page is treated as the canonical one. That's why I recommend using more sophisticated SEO tools.

Finding canonical URLs with WebSite Auditor

This step requires WebSite Auditor. You can download it now for free.

WebSite Auditor is an SEO audit tool that covers most aspects of a site's SEO health, and it can find pages with canonical tags in bulk.

To find canonical URLs using WebSite Auditor, launch the tool and create a project for your website (in case you haven’t used the tool before).

Then go to the Site Structure > Site Audit section and scroll down to Redirects. Click the Pages with rel=“canonical” line, and here you are:

[Screenshot: finding canonicals with WebSite Auditor]

Unlike GSC, WebSite Auditor lets you see all the pages with canonical tags at a glance. Plus, the tool instantly gives you URLs stated as canonicals for each page in the list.
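To illustrate what such a bulk report boils down to, here is a rough Python sketch that maps every page to its declared canonical. It is not how WebSite Auditor works internally. I assume the pages have already been downloaded into a dictionary of URL to HTML, and the function name audit_canonicals and the status labels are my own.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _CanonicalParser(HTMLParser):
    """Collects href values of <link rel="canonical"> elements."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        rel = (attr_map.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and attr_map.get("href"):
            self.hrefs.append(attr_map["href"].strip())


def audit_canonicals(pages: dict) -> dict:
    """Map each page URL to its declared canonical and a status label.

    `pages` maps a page URL to its already downloaded HTML source.
    """
    report = {}
    for url, html in pages.items():
        parser = _CanonicalParser()
        parser.feed(html)
        if not parser.hrefs:
            report[url] = {"canonical": None, "status": "missing"}
            continue
        # Resolve relative hrefs against the page URL.
        canonical = urljoin(url, parser.hrefs[0])
        status = "self" if canonical == url else "points elsewhere"
        report[url] = {"canonical": canonical, "status": status}
    return report


# Hypothetical crawl result: two versions of one product page and one page without a tag.
main = "https://www.site.com/shoes-with-roses.html"
tagged = '<head><link rel="canonical" href="' + main + '"></head>'
pages = {
    main: tagged,
    main + "?utm_campaign=sale-2023": tagged,
    "https://www.site.com/about.html": "<head><title>About</title></head>",
}

report = audit_canonicals(pages)
for url, row in report.items():
    print(url, "->", row["canonical"], "(" + row["status"] + ")")
```

Pages labeled "missing" are the ones worth reviewing first, since Google will pick a canonical for them on its own.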

Another section to look at in search of issues related to canonical URLs is Encoding and technical factors > Pages with multiple canonical URLs. Here you can find the pages for which multiple canonical URLs are specified. If there are no issues, the section will look like this:

[Screenshot: finding pages with multiple canonical tags with WebSite Auditor]
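The same kind of check can be sketched in a few lines of Python. The example below only looks at the HTML and reports a page when it declares more than one distinct canonical URL. The function name conflicting_canonicals and the sample markup are assumptions of mine.

```python
from html.parser import HTMLParser


class CanonicalCollector(HTMLParser):
    """Collects href values of all <link rel="canonical"> elements."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        rel = (attr_map.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and attr_map.get("href"):
            self.hrefs.append(attr_map["href"].strip())


def conflicting_canonicals(html: str) -> list:
    """Return the distinct canonical URLs if the page declares more than one."""
    collector = CanonicalCollector()
    collector.feed(html)
    # Remove exact repeats while keeping the order in which tags appear.
    distinct = list(dict.fromkeys(collector.hrefs))
    return distinct if len(distinct) > 1 else []


# Hypothetical page where a theme and a plugin each added their own tag.
conflicting_page = """
<head>
  <link rel="canonical" href="https://www.site.com/shoes-with-roses.html">
  <link rel="canonical" href="https://www.site.com/shoes.html">
</head>
"""
print(conflicting_canonicals(conflicting_page))
```

An empty list means the page declares zero or one canonical URL in its HTML. A canonical sent in an HTTP header would need a separate check.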

Note: Remember to audit your website's SEO regularly. If you spot issues early, whether or not they relate to canonical URLs, you will save yourself a lot of work fixing them and getting your site back on track.

Canonical URL best practices

Implementing canonical tags is easy. You just need to add the rel=“canonical” tag to the <head> section of a duplicate page. The href link should point to the “main” page. 

<link rel="canonical" href="https://www.site.com/shoes-with-roses.html"/>

To specify a canonical for non-HTML files such as PDFs, send a rel="canonical" Link header in the HTTP response from your server:

HTTP/1.1 200 OK
Content-Length: 19
...
Link: <https://www.site.com/shoes-with-roses.pdf>; rel="canonical"
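When auditing such files, you may want to read the canonical back from the header value. Below is a simplified Python sketch that pulls the rel="canonical" target out of a Link header string. The function name is my own, and the parser is deliberately basic: it assumes no commas or semicolons inside quoted parameter values.

```python
import re
from typing import Optional

# One link-value: <target> followed by any number of ;param pairs.
_LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')
# The rel parameter, with or without quotes around its value.
_REL_RE = re.compile(r';\s*rel\s*=\s*"?([^";]+)"?', re.IGNORECASE)


def canonical_from_link_header(value: str) -> Optional[str]:
    """Return the target of the first rel="canonical" link, or None."""
    for target, params in _LINK_RE.findall(value):
        match = _REL_RE.search(params)
        if match and "canonical" in match.group(1).lower().split():
            return target.strip()
    return None


header = '<https://www.site.com/shoes-with-roses.pdf>; rel="canonical"'
print(canonical_from_link_header(header))  # prints https://www.site.com/shoes-with-roses.pdf
```

A single Link header may carry several comma-separated links, which is why the sketch loops over all of them instead of stopping at the first one.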

Although canonical tag implementation should not cause problems, make sure you follow the best practices below. This way, you'll avoid problems with indexation and duplicate content.

  • Do not include non-canonical URLs in the sitemap. A canonical tag that points to another page tells Google that the page it is found on should not be indexed, so don't ask Google to index it.
  • Do not use canonicals in case of pagination. Pagination, if set carelessly, may become a reason for duplicate content issues. Still, canonicals are not the way out in this case. Read our pagination SEO guide to make everything right.
  • Do not use the robots.txt file for canonicalization purposes. If the page is blocked by robots.txt, Googlebot won’t see the canonical tag in its HTML, so the page may still get indexed.
  • Do not use the noindex tag on duplicate pages. A noindex tag blocks the page from search entirely. rel="canonical" will be more than enough.
  • Do not specify different URLs as canonical for the same page, whether through the same or different canonicalization techniques. Having multiple canonical URLs for one page is bad practice.
  • Do not use the URL removal tool for canonicalization. It removes all versions of the page from search, both canonical and duplicates.
  • Link to canonical URLs only. Linking to non-canonical pages may result in them being indexed by Google. As a result, duplicate content issues may appear.
  • Self-canonicalize localized pages. This is an optional practice, but international SEO experts suggest doing so to help Google show the relevant site version depending on the user's location. Don't forget to specify hreflang tags properly as well.
[Screenshot: self-canonicals for localized pages]
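The sitemap rule from the list above is easy to check automatically. Here is a small Python sketch that flags sitemap URLs whose canonical points to a different page. It assumes you already have a mapping of page URL to declared canonical (for example, from an audit export). The function names and sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> list:
    """Return every <loc> value listed in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def non_canonical_in_sitemap(sitemap_xml: str, canonical_of: dict) -> list:
    """List sitemap URLs whose declared canonical is a different URL."""
    flagged = []
    for url in sitemap_urls(sitemap_xml):
        canonical = canonical_of.get(url)
        # Pages with no known canonical are skipped rather than flagged.
        if canonical is not None and canonical != url:
            flagged.append(url)
    return flagged


# Hypothetical sitemap that accidentally lists a campaign URL.
sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.site.com/shoes-with-roses.html</loc></url>
  <url><loc>https://www.site.com/shoes-with-roses.html?utm_campaign=sale-2023</loc></url>
</urlset>"""

canonical_of = {
    "https://www.site.com/shoes-with-roses.html": "https://www.site.com/shoes-with-roses.html",
    "https://www.site.com/shoes-with-roses.html?utm_campaign=sale-2023": "https://www.site.com/shoes-with-roses.html",
}

print(non_canonical_in_sitemap(sitemap, canonical_of))
```

Every URL the sketch prints is a candidate for removal from the sitemap.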

Common mistakes in canonical URLs

Much has been said about the importance of implementing canonical tags properly, but mistakes still happen. Review the common flaws below so you can avoid them.

Not using rel=“canonical” at all

The main mistake related to canonical URLs is, paradoxically, the absence of the rel="canonical" tag.

People often forget to place canonical tags on pages with UTM parameters or print versions, thinking that Google will choose the better version itself. After all, if Google can choose a different version despite the stated canonical, why bother?

The truth is that Google may also refrain from choosing anything at all and index every URL it discovers. This only results in duplicate content issues.

Although duplicate content itself does not trigger manual penalties from Google, it may decrease the overall site authority. Besides, duplicate content slows down indexation, as crawl budget is wasted on each duplicate URL.

Using canonical tag for pages with 4XX status code

Broken links do not bring you clicks, traffic, or any payoff for your SEO effort. The same is true of canonical URLs.

When a duplicate page with a canonical tag returns 404 Not Found, Google cannot crawl it, see the tag, or learn which canonical URL you have stated. This works pretty much like blocking a page with robots.txt or using a noindex tag.

If the canonical URL itself is broken, the link equity you intended to consolidate has nowhere to go.

Specifying a redirected canonical link

Specifying a canonical link that redirects to some other page will only confuse Google and may result in indexing issues and link juice loss.

Using 301 redirects is only needed when you want to permanently replace a page with a new, more relevant one with better content. These will not be duplicate pages but two different ones. 

Because a redirect makes the old page unavailable, specify the destination page as the canonical everywhere the old one was used.
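Both of the last two mistakes, broken and redirected canonical targets, come down to the HTTP status of the URL you point to. Here is a hedged Python sketch of such a check. The helper names are my own, and http_status is one possible way to get a status code with the standard library, using a HEAD request without following redirects. Some servers handle HEAD poorly, so treat it as a starting point. The demo at the bottom uses made-up status codes instead of real requests.

```python
import urllib.error
import urllib.request
from typing import Callable


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None stops urllib from following the redirect,
        # so the 3xx response surfaces as an HTTPError instead.
        return None


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the status code of a HEAD request, without following redirects."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def classify_canonical_target(status: int) -> str:
    """Translate a status code into a verdict for a canonical target."""
    if 200 <= status < 300:
        return "ok"
    if 300 <= status < 400:
        return "redirected"
    if 400 <= status < 500:
        return "broken"
    return "server error or unknown"


def check_canonical_targets(
    canonical_of: dict, status_of: Callable[[str], int] = http_status
) -> dict:
    """Return {page: (canonical, verdict)} for every problematic canonical."""
    problems = {}
    for page, canonical in canonical_of.items():
        verdict = classify_canonical_target(status_of(canonical))
        if verdict != "ok":
            problems[page] = (canonical, verdict)
    return problems


# Demo with made-up status codes so that no network access is needed.
fake_statuses = {
    "https://www.site.com/shoes.html": 200,
    "https://www.site.com/old-shoes.html": 301,
    "https://www.site.com/gone.html": 404,
}
canonical_of = {
    "https://www.site.com/shoes.html?utm_campaign=sale-2023": "https://www.site.com/shoes.html",
    "https://www.site.com/sale.html": "https://www.site.com/old-shoes.html",
    "https://www.site.com/print/shoes.html": "https://www.site.com/gone.html",
}
problems = check_canonical_targets(canonical_of, fake_statuses.__getitem__)
print(problems)
```

To run it against a live site, call check_canonical_targets(canonical_of) without the second argument so that real requests are made.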

Feel free to read our guide on redirects to choose the best solutions for your site.

To sum it up

Finding and inspecting canonical URLs is an important part of any SEO audit. Do it regularly, avoid mistakes, and fix issues promptly to help your site rank and prosper.

By the way, what canonical URL tactics and best practices do you use in your SEO routine? Share your experience in our Facebook community.
