
How Google and Other Search Engines Treat URLs

Search engines need a unique URL for each page so they can crawl it, index it, and send users to it. Here is a short look at the structure of a URL and at how search engines interpret each part. In general, a URL is made up of the following parts:

protocol://hostname/path/filename?querystring#fragment

For example:

https://www.example.com/walkingshoes/womens.html?size=8#info

Beyond the file address itself (ending with womens.html), the example above contains a query string, which carries a parameter named size with the value 8, and a fragment named info, also known as a hash or anchor.

Query strings pass information that the page can use. Fragments, on the other hand, identify the part of the page the browser scrolls to (based on the ID of an HTML element that exists on the page).
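To make the parts concrete, here is a minimal sketch that splits the example address with Python's standard urllib.parse module. Python is my choice for illustration only; any URL parser reports the same parts.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://www.example.com/walkingshoes/womens.html?size=8#info"
parts = urlsplit(url)

print(parts.scheme)    # protocol: https
print(parts.netloc)    # hostname: www.example.com
print(parts.path)      # path and filename: /walkingshoes/womens.html
print(parts.query)     # query string: size=8
print(parts.fragment)  # fragment: info

# parse_qs turns the query string into a dict that maps each
# parameter name to the list of values it was given.
params = parse_qs(parts.query)
print(params)          # {'size': ['8']}
```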

It’s important to note that Google and other search engines generally ignore fragments but do take query strings into account.

Therefore, when such parameters are used widely (for instance, in online stores), you must make sure that search engines treat the same page with different query strings as a single URL.

Otherwise, they might index each parameter variation as a separate URL and treat the pages as duplicate content.
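The difference can be sketched in a few lines of Python. The helper name url_as_crawled is hypothetical, and dropping only the fragment is a simplification of what a real crawler does, not a description of Google's actual pipeline.

```python
from urllib.parse import urldefrag

def url_as_crawled(url: str) -> str:
    """Drop the fragment; keep everything else, including the query string."""
    return urldefrag(url).url

base = "https://www.example.com/walkingshoes/womens.html"

a = url_as_crawled(base + "?size=8#info")
b = url_as_crawled(base + "?size=8#reviews")
c = url_as_crawled(base + "?size=9#info")

print(a == b)  # True: only the fragment differs, so it is one URL
print(a == c)  # False: the query string differs, so it is another URL
```

Under this simplified model, every distinct query string produces a distinct address, which is exactly why parameter-heavy sites need extra care.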

Blocking Search Engines and Using Canonical URLs

You can stop search engines from crawling these addresses with a robots.txt file, and in many cases that is enough. Addresses with query strings are blocked like this:

User-agent: *
Disallow: *?dir=*
Disallow: *&order=*
Disallow: *?price=*
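To see which addresses these rules catch, here is a rough sketch of wildcard matching in Python. It assumes the common convention that * matches any run of characters and that a trailing $ anchors the end of the URL; the functions rule_to_regex and is_blocked are hypothetical helpers, not part of any library, and the sketch ignores Allow rules and user-agent groups.

```python
import re

def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the end.
    Everything else is matched literally, starting at the beginning of
    the path-plus-query part of the URL.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    body = ".*".join(re.escape(piece) for piece in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(path_and_query: str, disallow_rules: list[str]) -> bool:
    """Return True if any non-empty Disallow rule matches the address."""
    return any(
        rule_to_regex(rule).match(path_and_query)
        for rule in disallow_rules
        if rule  # an empty Disallow line blocks nothing
    )

rules = ["*?dir=*", "*&order=*", "*?price=*"]

print(is_blocked("/page-slug?dir=asc&order=price", rules))  # True
print(is_blocked("/page-slug?price=13", rules))             # True
print(is_blocked("/page-slug", rules))                      # False
print(is_blocked("/page-slug?order=price", rules))          # False
```

The last line shows a gap worth knowing about: because the order rule looks for &order=, an address where order is the first parameter (?order=price) is not matched under this model.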

But in many cases, the proper way to handle these situations is with canonical URLs, which are an integral part of technical SEO.

You need to ensure that every address with parameters declares a canonical URL that points to the base category URL.

Here are some examples for illustration (I’ve removed the protocol for table readability):

URL/Page Type          | Visible URL                                  | Canonical URL
Base Category URL      | domain.co.il/page-slug                       | domain.co.il/page-slug
Social Tracking URL    | domain.co.il/page-slug?utm_source=twitter    | domain.co.il/page-slug
Affiliate Tracking URL | domain.co.il/page-slug?a_aid=123456          | domain.co.il/page-slug
Sorted Category URL    | domain.co.il/page-slug?dir=asc&order=price   | domain.co.il/page-slug
Filtered Category URL  | domain.co.il/page-slug?price=13              | domain.co.il/page-slug
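The mapping in the table can be sketched as a small Python function. This is only an illustration: canonical_url and canonical_link_tag are hypothetical names, and stripping every parameter is safe only under the assumption that no parameter changes the main content of the page (as with the tracking, sorting, and filtering parameters above).

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

def canonical_url(url: str) -> str:
    """Return the URL without its query string and fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

def canonical_link_tag(url: str) -> str:
    """Build the <link> element that goes in the page's <head>."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'

visible = "https://domain.co.il/page-slug?dir=asc&order=price"

print(canonical_url(visible))
# https://domain.co.il/page-slug
print(canonical_link_tag(visible))
# <link rel="canonical" href="https://domain.co.il/page-slug">
```

In practice, most platforms and SEO plugins generate this tag for you; the sketch just shows what the resulting value should look like for each row of the table.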

Distinguishing Between Different Types of URLs

Google and other search engines treat addresses with and without WWW as different addresses. The same goes for HTTP versus HTTPS.

It’s worth noting that when you add your site to Google’s Search Console as URL-prefix properties, you need to add all four variations: HTTP and HTTPS, each with and without www.
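As a quick illustration, the four variations can be listed with a few lines of Python. The function name property_variants is hypothetical, and example.com stands in for your own domain.

```python
from itertools import product

def property_variants(domain: str) -> list[str]:
    """List the HTTP/HTTPS and www/non-www versions of a domain."""
    bare = domain.removeprefix("www.")  # requires Python 3.9+
    return [
        f"{scheme}://{prefix}{bare}/"
        for scheme, prefix in product(("http", "https"), ("", "www."))
    ]

for variant in property_variants("example.com"):
    print(variant)
# http://example.com/
# http://www.example.com/
# https://example.com/
# https://www.example.com/
```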

Furthermore, you should distinguish between addresses that end with a slash (/), known as a trailing slash, and those that do not.

For the root of a domain, Google doesn’t treat the trailing slash as a different address: https://example.com/ is equivalent to https://example.com.

However, in the path that comes after the domain, you need to distinguish between the two cases. For instance, https://example.com/dogs is not the same address as https://example.com/dogs/.
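The rule can be expressed as a short comparison in Python. This is a simplified sketch of the behavior described above, not Google's actual logic, and same_page is a hypothetical helper.

```python
from urllib.parse import urlsplit

def same_page(url_a: str, url_b: str) -> bool:
    """Compare two URLs, treating an empty root path as equal to '/'."""
    def key(url: str) -> tuple:
        parts = urlsplit(url)
        path = parts.path or "/"  # "" and "/" both mean the site root
        return (parts.scheme, parts.netloc.lower(), path, parts.query)
    return key(url_a) == key(url_b)

print(same_page("https://example.com/", "https://example.com"))
# True: the root is the same with or without the slash
print(same_page("https://example.com/dogs", "https://example.com/dogs/"))
# False: below the root, the trailing slash makes a different URL
```

Whichever form you choose for inner pages, the usual approach is to pick one and redirect the other to it, so that only one version is indexed.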

For more information about the trailing slash, take a look at the guide on the importance of Trailing Slash in URLs.

Roee Yossef

I develop websites & custom WordPress themes by design. I love typography, colors & everything between, and aim to provide high performance, seo optimized websites with a clean & semantic code.
