
How Google and Search Engines Analyze URLs to Improve SEO

Search engines need a unique address (URL) for each page so they can crawl it, index it, and send users to it. Let's look at how a URL is structured and how search engines treat these addresses. In general, a URL is made up of the following parts:

protocol://hostname/path/filename?querystring#fragment

For example:

https://www.example.com/walkingshoes/womens.html?size=8#info
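To make the parts concrete, here is a minimal sketch that splits the example URL with Python's standard `urllib.parse` module. Note that `urlsplit` keeps the filename as part of the path rather than returning it separately.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://www.example.com/walkingshoes/womens.html?size=8#info"
parts = urlsplit(url)

print(parts.scheme)    # https  (the protocol)
print(parts.hostname)  # www.example.com
print(parts.path)      # /walkingshoes/womens.html  (path and filename together)
print(parts.query)     # size=8  (the query string)
print(parts.fragment)  # info  (the fragment)

# The query string can be decoded into parameter names and values
print(parse_qs(parts.query))  # {'size': ['8']}
```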

Besides the path to the file itself (ending in womens.html), the example contains a query string, ?size=8, which passes a parameter named size. It also contains a fragment, #info, which is often called a hash.

Query strings pass information that the page can use. A fragment, by contrast, tells the browser which part of the page to scroll to, based on the id of an existing HTML element on that page.

Keep in mind that Google and other search engines ignore the fragment but do take the query string into account.
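The following is a rough approximation of that behavior. The helper `indexable_key` is a hypothetical name, and it is not how any search engine is actually implemented. Dropping the fragment makes two URLs that differ only after the `#` collapse into one address. URLs with different query strings stay distinct.

```python
from urllib.parse import urldefrag

def indexable_key(url: str) -> str:
    """Return the URL without its fragment (hypothetical helper, an approximation only)."""
    return urldefrag(url).url

a = "https://www.example.com/walkingshoes/womens.html?size=8#info"
b = "https://www.example.com/walkingshoes/womens.html?size=8#reviews"
c = "https://www.example.com/walkingshoes/womens.html?size=9#info"

print(indexable_key(a) == indexable_key(b))  # True: only the fragment differs
print(indexable_key(a) == indexable_key(c))  # False: the query string differs
```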

Therefore, if your site makes heavy use of such parameters (an online store, for example), you must make sure search engines treat URLs that differ only in their query strings as a single page.

Otherwise, they may treat each parameter combination as a separate URL, which leads to duplicate content.

Blocking Search Engines and Using Canonical URLs

A robots.txt file can block search engines from crawling these addresses, and in many cases that is a reasonable approach. For example, the following rules block URLs that contain particular query-string parameters:

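User-agent: *
# Match each parameter whether it comes first (?) or later (&) in the query string
Disallow: /*?dir=
Disallow: /*&dir=
Disallow: /*?order=
Disallow: /*&order=
Disallow: /*?price=
Disallow: /*&price=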
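To check which URLs wildcard rules like these would catch, you could use a simplified matcher such as the one below. It supports only `*` and a trailing `$`, ignores Allow rules and precedence between rules, and the rule list in it is illustrative. It is not a full robots.txt parser.

```python
import re

def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern ('*' wildcard, optional trailing '$') to a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal chunks and join them with '.*' wherever a '*' appeared
    body = ".*".join(re.escape(chunk) for chunk in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_disallowed(path_and_query: str, disallow_rules: list[str]) -> bool:
    """True if any Disallow rule matches from the start of the path (Allow rules ignored)."""
    return any(rule_to_regex(rule).match(path_and_query) for rule in disallow_rules)

rules = ["/*?dir=", "/*&dir=", "/*?order=", "/*&order=", "/*?price=", "/*&price="]

print(is_disallowed("/page-slug?dir=asc&order=price", rules))  # True
print(is_disallowed("/page-slug?price=13", rules))             # True
print(is_disallowed("/page-slug?utm_source=twitter", rules))   # False
print(is_disallowed("/page-slug", rules))                      # False
```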

In many cases, however, the proper solution is to use canonical URLs, which are an integral part of technical SEO. Keep in mind that Google cannot read the canonical tag of a URL that robots.txt prevents it from crawling.

Make sure that every variant of an address with different parameters declares a canonical URL pointing to the base category URL. You declare it with a <link rel="canonical"> element in the page's <head>.
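The following is a minimal sketch of generating that element. It assumes the canonical target is simply the page URL with the query string and fragment removed. The function name `canonical_link_tag` is hypothetical, and a CMS or SEO plugin would normally output this tag for you.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

def canonical_link_tag(page_url: str) -> str:
    """Build a <link rel="canonical"> element for the URL without query string or fragment."""
    parts = urlsplit(page_url)
    base = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    # Escape the URL so it is safe inside an HTML attribute
    return f'<link rel="canonical" href="{escape(base, quote=True)}">'

print(canonical_link_tag("https://domain.co.il/page-slug?dir=asc&order=price"))
# <link rel="canonical" href="https://domain.co.il/page-slug">
```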

Here are some examples for illustration (I’ve removed the protocol for table readability):

| URL/Page Type | Visible URL | Canonical URL |
|---|---|---|
| Base category URL | domain.co.il/page-slug | domain.co.il/page-slug |
| Social tracking URL | domain.co.il/page-slug?utm_source=twitter | domain.co.il/page-slug |
| Affiliate tracking URL | domain.co.il/page-slug?a_aid=123456 | domain.co.il/page-slug |
| Sorted category URL | domain.co.il/page-slug?dir=asc&order=price | domain.co.il/page-slug |
| Filtered category URL | domain.co.il/page-slug?price=13 | domain.co.il/page-slug |
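The mapping in the table can also be expressed as a function. This sketch removes only a hypothetical list of parameters: the ones from the table plus any `utm_` tracking parameter. Any other parameter is kept, which is a design choice you would adjust to your own site. It also assumes `https://` as the protocol that the table omits.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical list of parameters that do not change the page's core content;
# adjust it to the parameters your own site actually uses.
NON_CANONICAL_PARAMS = {"dir", "order", "price", "a_aid"}

def canonical_url(url: str) -> str:
    """Drop tracking, sorting and filtering parameters (and the fragment) from a URL."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in NON_CANONICAL_PARAMS and not k.startswith("utm_")]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

base = "https://domain.co.il/page-slug"
rows = [
    base,
    base + "?utm_source=twitter",
    base + "?a_aid=123456",
    base + "?dir=asc&order=price",
    base + "?price=13",
]
for visible in rows:
    assert canonical_url(visible) == base, visible
print("all table rows map to the base category URL")
```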

Distinguishing Between Different Types of URLs

Google and other search engines treat addresses with and without WWW as different addresses. The same goes for HTTP versus HTTPS.

Note that if you add your site to Google Search Console as URL-prefix properties, you need to add all four variants (http and https, each with and without www). A Domain property covers all of these variants at once.
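This sketch enumerates the four variants. The domain `example.com` is only a placeholder, and `property_variants` is a hypothetical helper name.

```python
from itertools import product

def property_variants(bare_domain: str) -> list[str]:
    """List the four URL-prefix variants (http/https x with/without www) of a domain."""
    return [f"{scheme}://{prefix}{bare_domain}/"
            for scheme, prefix in product(("http", "https"), ("", "www."))]

for variant in property_variants("example.com"):
    print(variant)
# http://example.com/
# http://www.example.com/
# https://example.com/
# https://www.example.com/
```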

You should also distinguish between addresses that end with a slash (/) and those that do not. This final slash is known as a trailing slash.

For the domain root, Google does not treat the trailing slash as creating a different address. For example, https://example.com/ is equivalent to https://example.com.

In the path after the domain, however, the two forms are different addresses. For instance, https://example.com/dogs is not the same as https://example.com/dogs/.
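The rule can be expressed in a few lines. This sketch treats an empty root path as `/` and compares every other path exactly as written. The helper name `same_address` is hypothetical, and the sketch does not cover other normalizations such as letter case in the path or percent-encoding.

```python
from urllib.parse import urlsplit

def same_address(a: str, b: str) -> bool:
    """Compare two URLs, treating an empty root path as '/' but leaving other paths as-is."""
    pa, pb = urlsplit(a), urlsplit(b)
    key_a = (pa.scheme, pa.netloc.lower(), pa.path or "/", pa.query)
    key_b = (pb.scheme, pb.netloc.lower(), pb.path or "/", pb.query)
    return key_a == key_b

print(same_address("https://example.com/", "https://example.com"))            # True
print(same_address("https://example.com/dogs", "https://example.com/dogs/"))  # False
```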

For more information, see the guide on the importance of the trailing slash in URLs.


Savvy WordPress Development