Search engines need a unique URL for each page so they can crawl it, index it, and send users to it. This section explains how a URL is structured and how search engines treat these addresses. A URL is generally divided into the following parts:
protocol://hostname/path/filename?querystring#fragment
For example:
https://www.example.com/walkingshoes/womens.html?size=8#info
Beyond the file address itself in the example above (ending with womens.html), there is a parameter named size, which forms the Query String, and a value named info after the # sign, known as the fragment (often called a hash or anchor).
Query Strings in the URL pass information that the page can use. The fragment after the #, on the other hand, identifies the part of the page the browser will scroll to (based on the ID of an existing HTML element on the page).
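To make the parts concrete, here is a minimal sketch that splits the example URL using Python's standard urllib.parse module. The variable names are illustrative only, and the choice of Python is an assumption, not something the article prescribes.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://www.example.com/walkingshoes/womens.html?size=8#info"
parts = urlsplit(url)

print(parts.scheme)    # protocol
print(parts.netloc)    # hostname
print(parts.path)      # path and filename
print(parts.query)     # query string (without the leading "?")
print(parts.fragment)  # fragment (without the leading "#")

# The query string can be decoded further into named parameters.
params = parse_qs(parts.query)
print(params)
```

Note that the browser never sends the fragment to the server; it is resolved on the client side only.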
It’s important to note that Google and other search engines generally ignore the fragment (everything after the #) but do take Query Strings into account.
Therefore, when such parameters are used widely (for instance, in online stores), you must make sure that search engines treat a URL with different Query Strings as the same URL.
Otherwise, they might treat the same address with different parameters as separate URLs, or as duplicate content.
Blocking Search Engines and Using Canonical URLs
You can block search engines from crawling these addresses with a robots.txt file, and in many cases this is a reasonable option. Addresses with specific Query String parameters can be blocked as follows:
User-agent: *
Disallow: /*?dir=
Disallow: /*&order=
Disallow: /*?price=
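To see which addresses such rules would catch, here is a rough sketch of wildcard matching in the style major search engines document for robots.txt (`*` matches any run of characters, a trailing `$` anchors the end). It is a simplified illustration, not any crawler's actual implementation: it only checks Disallow patterns and ignores Allow rules and rule precedence. The function names and the color parameter are hypothetical, and the rules are written with a leading slash.

```python
import re
from typing import Iterable

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex.

    "*" matches any run of characters; a trailing "$" anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_disallowed(path_and_query: str, patterns: Iterable[str]) -> bool:
    """Return True if any Disallow pattern matches from the start of the path."""
    return any(pattern_to_regex(p).match(path_and_query) for p in patterns)

RULES = ["/*?dir=", "/*&order=", "/*?price="]

print(is_disallowed("/page-slug?dir=asc&order=price", RULES))  # True
print(is_disallowed("/page-slug?utm_source=twitter", RULES))   # False
# False: "price" is not the first parameter, so "?price=" never appears.
print(is_disallowed("/page-slug?color=red&price=13", RULES))
```

The last call shows a gap worth checking in your own rules: a pattern that starts with ?price= does not cover URLs where price comes after another parameter.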
But in many cases, the proper way to handle these situations is through the use of canonical URLs, which are an integral part of technical SEO.
You need to ensure that for every address with different parameters, there’s a canonical URL pointing to the base category URL.
Here are some examples for illustration (I’ve removed the protocol for table readability):
URL/Page Type | Visible URL | Canonical URL |
Base Category URL | domain.co.il/page-slug | domain.co.il/page-slug |
Social Tracking URL | domain.co.il/page-slug?utm_source=twitter | domain.co.il/page-slug |
Affiliate Tracking URL | domain.co.il/page-slug?a_aid=123456 | domain.co.il/page-slug |
Sorted Category URL | domain.co.il/page-slug?dir=asc&order=price | domain.co.il/page-slug |
Filtered Category URL | domain.co.il/page-slug?price=13 | domain.co.il/page-slug |
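The mapping in the table can be sketched in a few lines of Python. This assumes, as in the examples above, that every parameter is a tracking, sorting, or filtering parameter that can be dropped; parameters that actually change the page content would need to be kept. The function names are hypothetical.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

def canonical_url(visible_url: str) -> str:
    """Drop the query string and fragment; keep protocol, host, and path."""
    parts = urlsplit(visible_url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

def canonical_link_tag(visible_url: str) -> str:
    """Build the <link> element that belongs in the page's <head>."""
    href = escape(canonical_url(visible_url), quote=True)
    return f'<link rel="canonical" href="{href}">'

for url in (
    "https://domain.co.il/page-slug",
    "https://domain.co.il/page-slug?utm_source=twitter",
    "https://domain.co.il/page-slug?dir=asc&order=price",
):
    print(canonical_link_tag(url))
```

All three calls print the same tag, pointing at https://domain.co.il/page-slug, which is the behavior the table describes.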
Distinguishing Between Different Types of URLs
Google and other search engines treat addresses with and without WWW as different addresses. The same goes for HTTP versus HTTPS.
It’s worth noting that when you add your site to Google’s Search Console as URL-prefix properties, you need to add all four variations (HTTP and HTTPS, each with and without WWW) as separate properties.
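As a quick illustration, the four variations can be listed with a small helper. This is only a sketch; the function name is hypothetical and it assumes the site lives on the bare domain or its www subdomain.

```python
from urllib.parse import urlsplit

def property_variants(site_url: str) -> list:
    """Return the HTTP/HTTPS and WWW/non-WWW variations of a site's root URL."""
    host = urlsplit(site_url).netloc
    bare = host[4:] if host.startswith("www.") else host
    return [
        f"{scheme}://{prefix}{bare}/"
        for scheme in ("http", "https")
        for prefix in ("", "www.")
    ]

for variant in property_variants("https://www.example.com"):
    print(variant)
```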
Furthermore, you should differentiate between addresses that end with a slash (/) and those that don't. This final slash is known as a Trailing Slash.
On the main domain address, Google doesn’t treat the trailing slash as a different address. For example, the address https://example.com/ is equivalent to https://example.com.
However, in the path that appears after the main address, you need to distinguish between the two cases. For instance, the address https://example.com/dogs is not the same as https://example.com/dogs/.
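The two cases can be expressed as a small comparison function. This is a sketch of the rule as described here, not of how any search engine is implemented: an empty path on the bare host is treated as "/", every other path is compared exactly, and the fragment is ignored. The function name is hypothetical.

```python
from urllib.parse import urlsplit

def same_address(a: str, b: str) -> bool:
    """Compare two URLs, treating only the root's trailing slash as optional."""
    def key(url: str):
        parts = urlsplit(url)
        path = parts.path or "/"  # "https://example.com" means "https://example.com/"
        return (parts.scheme, parts.netloc.lower(), path, parts.query)
    return key(a) == key(b)

print(same_address("https://example.com/", "https://example.com"))            # True
print(same_address("https://example.com/dogs", "https://example.com/dogs/"))  # False
```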
For more information about the trailing slash, take a look at the guide on the importance of Trailing Slash in URLs.