Search engines need a unique URL for each page so they can crawl it, index it, and send users to it. Let’s look at the structure of a URL and at how search engines treat these addresses. Generally, a URL is divided into several parts as follows:
Beyond the file address itself in the example above (ending with womens.html), there is a parameter named size, which forms the Query String, and a second part named info, which is the Hash Tag (formally, the URL fragment).
Query Strings in the URL pass information that the page can use. Hash Tags, on the other hand, identify the part of the page that the browser scrolls to (based on the ID of an existing HTML element on the page).
It’s important to note that Google and other search engines generally ignore Hash Tags but do take Query Strings into account.
Therefore, when such parameters are used widely (for instance, in online stores), you must make sure that search engines treat a URL with different Query Strings as one and the same URL.
Otherwise, they might index the same address with different parameters as separate URLs and see the pages as duplicate content.
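To make the parts concrete, here is a minimal sketch using Python's standard urllib.parse module. The URL is hypothetical: it is only assembled from the names mentioned above (womens.html, size, info), and the host, folder, and parameter values are assumptions made for illustration.

```python
from urllib.parse import urlsplit, urldefrag, parse_qs

# Hypothetical URL built from the names used in the text.
url = "https://example.com/shop/womens.html?size=8#info"

parts = urlsplit(url)
print(parts.path)             # the file address
print(parse_qs(parts.query))  # Query String parameters as a dict
print(parts.fragment)         # the Hash Tag (fragment)


def without_fragment(u: str) -> str:
    """Drop the fragment, since search engines are described as ignoring it."""
    return urldefrag(u).url


# Same Query String, different Hash Tag: one address once the fragment is dropped.
same_page = without_fragment(url) == without_fragment(
    "https://example.com/shop/womens.html?size=8#reviews"
)
# Different Query String: still two distinct addresses.
different = without_fragment(url) != without_fragment(
    "https://example.com/shop/womens.html?size=10#info"
)
print(same_page, different)
```

The comparison at the end illustrates why Query Strings need attention while Hash Tags usually do not.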
Blocking Search Engines and Using Canonical URLs
You can block search engines from crawling these addresses with a robots.txt file, and in many cases this is a reasonable approach. Addresses with Query Strings can be blocked as follows:
User-agent: *
Disallow: *?dir=*
Disallow: *&order=*
Disallow: *?price=*
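As a rough way to check which addresses such rules would catch, the sketch below hand-rolls a matcher in Python. It assumes the wildcard semantics commonly documented for robots.txt ('*' matches any run of characters, an optional trailing '$' anchors the end, and rules match from the start of the path). The function names and the sample paths are hypothetical.

```python
import re

# The three Disallow patterns from the robots.txt rules above.
DISALLOW_PATTERNS = ["*?dir=*", "*&order=*", "*?price=*"]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt-style pattern into a regex (assumed semantics)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and let each '*' match any run of characters.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path_and_query: str) -> bool:
    """True if any Disallow pattern matches the path plus Query String."""
    return any(pattern_to_regex(p).match(path_and_query) for p in DISALLOW_PATTERNS)


for candidate in [
    "/page-slug",
    "/page-slug?dir=asc&order=price",
    "/page-slug?price=13",
    "/page-slug?color=red&price=13",
]:
    print(candidate, is_blocked(candidate))
```

Under these assumptions the last path is not blocked, because price is not the first parameter there and only the ?price= form is listed. Rules like these depend on parameter order.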
But in many cases, the proper way to handle these situations is through the use of canonical URLs, which are an integral part of technical SEO.
You need to ensure that for every address with different parameters, there’s a canonical URL pointing to the base category URL.
Here are some examples for illustration (I’ve removed the protocol for table readability):
| URL/Page Type | Visible URL | Canonical URL |
|---|---|---|
| Base Category URL | domain.co.il/page-slug | domain.co.il/page-slug |
| Social Tracking URL | domain.co.il/page-slug?utm_source=twitter | domain.co.il/page-slug |
| Affiliate Tracking URL | domain.co.il/page-slug?a_aid=123456 | domain.co.il/page-slug |
| Sorted Category URL | domain.co.il/page-slug?dir=asc&order=price | domain.co.il/page-slug |
| Filtered Category URL | domain.co.il/page-slug?price=13 | domain.co.il/page-slug |
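The mapping in the table can be sketched in a few lines of Python. This is only an illustration, assuming that every parameter on the page is a tracking, sorting, or filtering parameter that does not change the main content (as in the table). The function names are hypothetical, and the https protocol is an assumption, since the table omits the protocol.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Drop the Query String and fragment so every variant maps to the base URL."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link> element that would go in the page's <head>."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


for visible in [
    "https://domain.co.il/page-slug",
    "https://domain.co.il/page-slug?utm_source=twitter",
    "https://domain.co.il/page-slug?a_aid=123456",
    "https://domain.co.il/page-slug?dir=asc&order=price",
    "https://domain.co.il/page-slug?price=13",
]:
    print(canonical_link_tag(visible))
```

Pages where a parameter does change the content, such as a parameter that selects a different product, would need a different rule than stripping everything.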
Distinguishing Between Different Types of URLs
Google and other search engines treat addresses with and without WWW as different addresses. The same goes for HTTP versus HTTPS.
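It’s worth noting that when you add your site to Google’s Search Console as URL-prefix properties, each of the four variations (HTTP and HTTPS, with and without WWW) is a separate property, so you need to add all four. A Domain property, by contrast, covers all of these variations at once.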
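The four variations are simply every combination of protocol and WWW prefix. A minimal sketch, using a hypothetical helper name and example.com as a placeholder domain:

```python
def property_variants(bare_domain: str) -> list[str]:
    """List the protocol and WWW combinations for one domain."""
    return [
        f"{scheme}://{prefix}{bare_domain}/"
        for scheme in ("http", "https")
        for prefix in ("", "www.")
    ]


for variant in property_variants("example.com"):
    print(variant)
```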
Furthermore, you should distinguish between addresses that end with a slash (/), known as a Trailing Slash, and those that do not.
On the main domain address, Google doesn’t treat the trailing slash as a different address. For example, the address https://example.com/ is equivalent to the same address without the trailing slash.
However, in the path that appears after the main address, you need to distinguish between the two cases. For instance, the address https://example.com/dogs is not the same as that address with a trailing slash added.
For more information about the trailing slash, take a look at the guide on the importance of Trailing Slash in URLs.
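The rules in this section can be summarized in a small comparison sketch. It is an approximation of the behavior described above, not a description of how any search engine is implemented, and the function names are hypothetical.

```python
from urllib.parse import urlsplit


def comparison_key(url: str) -> tuple[str, str, str, str]:
    """Reduce a URL to the parts that make it a distinct address."""
    parts = urlsplit(url)
    # On the bare domain, an empty path and "/" point to the same address.
    path = parts.path or "/"
    # The fragment is left out; protocol, host, path and Query String all count.
    return (parts.scheme, parts.netloc.lower(), path, parts.query)


def same_address(a: str, b: str) -> bool:
    return comparison_key(a) == comparison_key(b)


print(same_address("https://example.com/", "https://example.com"))            # root slash
print(same_address("https://example.com/dogs", "https://example.com/dogs/"))  # path slash
print(same_address("http://example.com/", "https://example.com/"))            # protocol
print(same_address("https://www.example.com/", "https://example.com/"))       # WWW
```

Only the first pair compares as equal. The other three differ in path, protocol, or host, which is why redirects or canonical URLs are needed to consolidate them.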