Search engines require a unique address for each page in order to crawl it, index it, and direct users to it. Let's look briefly at the structure of a URL and at how search engines interpret these addresses. Generally, a URL is divided into the following parts:
protocol://hostname/path/filename?querystring#fragment

For example:

https://www.example.com/walkingshoes/womens.html?size=8#info

Beyond the file path itself in the example above (ending with womens.html), you can see a parameter named size, which forms the Query String, followed by a Fragment named info (the part after the # symbol).
Query Strings in the URL pass information that can be used on the mentioned page. Fragments, on the other hand, are used to identify the section on the page to which the browser will scroll (based on the ID of an HTML element on the page).
It’s important to note that Google and other search engines ignore fragments entirely but do take Query Strings into account.
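To make the anatomy concrete, here is a short Python sketch using the standard library's urllib.parse to split the example URL above into its components (the language is just for illustration; the URL structure itself is language-independent):

```python
from urllib.parse import urlsplit, parse_qs

# Split the example URL from above into its components.
url = "https://www.example.com/walkingshoes/womens.html?size=8#info"
parts = urlsplit(url)

print(parts.scheme)    # https
print(parts.netloc)    # www.example.com
print(parts.path)      # /walkingshoes/womens.html
print(parts.query)     # size=8  (sent to the server; considered by Google)
print(parts.fragment)  # info    (handled by the browser; ignored by Google)

# The query string parses into a dictionary of parameters:
print(parse_qs(parts.query))  # {'size': ['8']}
```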
Therefore, when such parameters are used widely (for instance, in online stores), you must make sure search engines understand that the same URL with different Query Strings represents the same page. Otherwise, they may index each parameter combination as a separate URL and flag the pages as duplicate content.
Blocking Search Engines and Using Canonical URLs
In many cases you can block search engines from crawling these addresses by using a robots.txt file. Blocking addresses that contain specific Query String parameters looks like this:
User-agent: *
Disallow: *?dir=*
Disallow: *&order=*
Disallow: *?price=*

Blocking URLs in robots.txt prevents crawling but not indexing. If Google discovers these URLs through links on other pages, it may still index them without crawling their content. For reliable duplicate prevention, use canonical URLs instead of – or in addition to – robots.txt rules.
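The * in these rules is a wildcard that matches any sequence of characters (Google's robots.txt parser also supports a $ anchor that pins a rule to the end of the URL). As a rough illustration only – not a full robots.txt implementation – the matching can be sketched in Python like this:

```python
import re

def robots_pattern_matches(pattern: str, url_path: str) -> bool:
    """Rough sketch of Google-style robots.txt wildcard matching:
    '*' matches any sequence of characters, a trailing '$' anchors
    the pattern to the end of the URL. Illustrative only."""
    anchored = pattern.endswith("$")
    core = pattern.rstrip("$")
    # Translate the robots.txt pattern into a regular expression.
    regex = ".*".join(re.escape(part) for part in core.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, url_path) is not None

print(robots_pattern_matches("*?dir=*", "/shoes?dir=asc"))  # True
print(robots_pattern_matches("*?price=*", "/shoes"))        # False
```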
In many cases, the proper way to handle these situations is through the use of canonical URLs, which are an integral part of technical SEO.
You need to ensure that for every address with different parameters, there’s a canonical URL pointing to the base category URL.
Here are some examples for illustration (I’ve removed the protocol for table readability):
| URL/Page Type | Visible URL | Canonical URL |
| --- | --- | --- |
| Base Category URL | domain.co.il/page-slug | domain.co.il/page-slug |
| Social Tracking URL | domain.co.il/page-slug?utm_source=twitter | domain.co.il/page-slug |
| Affiliate Tracking URL | domain.co.il/page-slug?a_aid=123456 | domain.co.il/page-slug |
| Sorted Category URL | domain.co.il/page-slug?dir=asc&order=price | domain.co.il/page-slug |
| Filtered Category URL | domain.co.il/page-slug?price=13 | domain.co.il/page-slug |
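In practice, each parameterized variant declares its canonical URL with a link element in the page's head. For example, the sorted category page from the table above would include:

```html
<!-- In the <head> of domain.co.il/page-slug?dir=asc&order=price -->
<link rel="canonical" href="https://domain.co.il/page-slug" />
```

Google treats this tag as a strong hint (not a directive) about which URL should be indexed.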
Distinguishing Between Different Types of URLs
Google and other search engines treat addresses with and without WWW as different addresses. The same goes for HTTP versus HTTPS.
When you add your site to Google Search Console, the recommended approach is to use a Domain property, which covers all URL variations (http, https, www, non-www) in a single property. If you use the older URL-prefix method instead, you would need to add each variation separately.
Furthermore, you should distinguish between addresses that end with a slash (/), known professionally as a Trailing Slash, and those that don’t.
If you look at the main domain address, Google doesn’t consider this trailing slash as a different address, for example – the address https://example.com/ is equivalent to https://example.com.
However, in the path that appears after the main address, you need to distinguish between the two cases. For instance, the address https://example.com/dogs is not the same as https://example.com/dogs/.
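The equivalence rule just described can be sketched in a few lines of Python (the function name and normalization policy here are my own, for illustration): the root path with and without a slash is the same page, while any deeper path with and without a trailing slash counts as two distinct URLs.

```python
from urllib.parse import urlsplit

def same_page(url_a: str, url_b: str) -> bool:
    """Sketch of the trailing-slash rule: '' and '/' are the same
    root page, but deeper paths must match exactly."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    normalize = lambda p: "/" if p in ("", "/") else p
    return (a.scheme, a.netloc, normalize(a.path)) == \
           (b.scheme, b.netloc, normalize(b.path))

print(same_page("https://example.com/", "https://example.com"))           # True
print(same_page("https://example.com/dogs", "https://example.com/dogs/"))  # False
```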
For more information about the trailing slash, take a look at the guide on the importance of Trailing Slash in URLs.
FAQs
Common questions about how search engines handle URLs:
Does Google treat http and https (or www and non-www) versions as the same URL?

No. Google treats http://example.com and https://example.com as different URLs. The same applies to www and non-www versions. To avoid duplicate content issues, choose one version and redirect the others to it using 301 redirects, and set a canonical URL on all pages.

Does blocking a URL in robots.txt keep it out of the index?

Not necessarily. Blocking in robots.txt prevents crawling, but the URL can still be indexed if Google discovers it through links. To keep a page out of the index, use a noindex meta tag or canonical URLs instead.

What is the difference between a query string and a fragment?

A query string (the part after ?) sends parameters to the server and can change page content. Google treats URLs with different query strings as potentially different pages. A fragment (the part after #) is handled entirely by the browser for in-page navigation and is never sent to the server. Google ignores fragments completely.
