Duplicate Content – Causes & Solutions in WordPress Sites

By Roee Yossef • Updated on May 28, 2024

A significant issue within the realm of SEO is the presence of duplicated content, which can hinder a website’s ability to achieve organic rankings and overall success.

Search engines, such as Google, are sensitive to identical content that appears in multiple locations across the internet. They are also sensitive to similar content found in different sections of the same website, whether it’s a WordPress site or any other type of website.

It’s important to understand: in Google’s eyes, duplicate content is any page whose identical content is displayed at different addresses (URLs).

Later in the article we’ll look at the finer distinctions, but at this stage the crucial point is this: every page on a site that is readable and accessible to Google should be represented by a single URL only.

When content on a site appears at multiple URLs, users can reach it at all of those addresses. And when external sites start linking to different variations of those addresses, the problem worsens.

Why should you pay attention to duplicated content on your site or your client’s site? The answer is simple – because it could impact the site’s exposure to visitors. In most cases, if Google identifies duplicated content on a site, it will decide on its own which page to show to users in search results (and it won’t always be the page intended by the creator).

This situation can affect user behavior and the user experience on the site, potentially harming its ranking. As a result, organic traffic to the site might decrease, and that’s what we’re essentially trying to avoid.

Main Reasons for Duplicated Content on WordPress Sites

There are many possible reasons for duplicated content on WordPress sites – some arise from improper or incorrect content input, but most stem from incorrect settings or other technical reasons.

In this article, we’ll focus on the technical issues that lead to duplicated content on sites and try to understand how to prevent them. Here are five main causes of duplicated content:

1. A Page Accessible from More Than One URL

This phenomenon may arise during the site’s development, when a specific page (or several pages) is built in a way that makes it accessible from different URLs.

For the developer or site creator this might not look like a problem, because in the WordPress database the page or post is identified by a single ID. Search engines, however, treat each address as a separate page.

When a specific page can be accessed through two different URLs, it’s considered duplicated, and that can affect which address Google chooses to show in search results. The most common and prominent example of this issue is when using subcategories in WordPress:

http://www.example.co.il/category/sub-category/
http://www.example.co.il/sub-category/

Both of these addresses will display the same sub-category page, but according to Google, it’s duplicated content – the same page displayed at two different addresses.

It’s enough that the site itself has internal links to both of these addresses (or even external sites linking to both addresses), and Google will index both and identify the content as duplicated. What should you do? Use canonical tags or 301 redirects (we’ll expand on this later).
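
To make the idea concrete, here is a minimal Python sketch of how such duplicates could be spotted. It assumes you already have a crawl result that maps each URL to the ID of the WordPress object it displays; the `crawl` dictionary, the IDs, and the function name are hypothetical and only illustrate the grouping.

```python
from collections import defaultdict


def find_duplicate_urls(url_to_page_id):
    """Group URLs by the page they display and keep groups with 2+ URLs."""
    groups = defaultdict(list)
    for url, page_id in url_to_page_id.items():
        groups[page_id].append(url)
    # Only pages reachable at more than one address are a problem.
    return {pid: sorted(urls) for pid, urls in groups.items() if len(urls) > 1}


# Hypothetical crawl result: URL -> ID of the page or term it displays.
crawl = {
    "http://www.example.co.il/category/sub-category/": 12,
    "http://www.example.co.il/sub-category/": 12,
    "http://www.example.co.il/post-name/": 40,
}
duplicates = find_duplicate_urls(crawl)
```

Every group in `duplicates` is a candidate for a canonical tag or a 301 redirect pointing at one chosen address.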

2. Usage of Parameters

Adding parameters to a page’s URL, in some cases, allows for tracking traffic to the page or making easy visual changes, like adding or removing a sidebar and graphical elements.

But be aware that using a parameter that does not change the page’s content can create duplicated content. For example:

http://www.example.co.il/post-name/
http://www.example.co.il/post-name/?source=news

Similar to the previous point, in this case as well, Google will index both versions over time – with the parameter and the original address, and it might identify the content as duplicated.

What should you do? Avoid using parameters as much as possible. Where they are unavoidable, use a canonical tag to indicate the original address.

Google Search Console used to offer a URL Parameters tool for telling Google which parameters don’t change the page’s content, but Google retired that tool in 2022, so the canonical tag is the signal to rely on.

The order of the parameters is also relevant: the same parameters arranged differently produce a different address, even though in most cases the order doesn’t affect the page’s content. Pay attention to these cases as well.
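
As a rough illustration, the following Python sketch normalizes URLs so that variants differing only by tracking parameters or by parameter order collapse into one address. The list of parameters treated as irrelevant to the content (`TRACKING_PARAMS`) is an assumption for the example, and so is the `/shop/` URL in the usage note; adapt both to the parameters your site really uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed list of parameters that never change the page's content.
TRACKING_PARAMS = {"source", "utm_source", "utm_medium", "utm_campaign"}


def normalize_url(url):
    """Drop content-neutral parameters and sort the remaining ones."""
    parts = urlsplit(url)
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    params.sort()  # a fixed order makes ?a=1&b=2 and ?b=2&a=1 identical
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(params), "")
    )
```

With this sketch, `http://www.example.co.il/post-name/?source=news` normalizes to `http://www.example.co.il/post-name/`, and `?size=m&color=red` and `?color=red&size=m` on a hypothetical `/shop/` page normalize to the same address. The normalized address is a natural candidate for the canonical tag.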

3. Usage of Pagination

Dividing archive pages or taxonomies into multiple pages with continuation links (for example, a category page displaying a list of posts) is generally a positive step for user experience.

However, in cases where there’s static text on the category page, like an introductory paragraph or more extended text, it might duplicate across the continuation pages.

A similar and even more severe problem might arise when a post has many comments: if comments are paginated, the post itself could be duplicated across the continuation pages.

What should you do? In cases of pagination in taxonomies, ensure that continuation pages don’t display the category’s introductory text (meaning, configure it to appear only on the first page).

In cases of pagination in comments, consider refreshing the comments without changing the URL using Ajax, or forgo pagination altogether.
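
The rule for taxonomy pages can be sketched in a few lines of Python. This is only an illustration of the logic, not WordPress code: in a real theme the same check is typically made with WordPress’s `is_paged()` conditional tag, and the function name, arguments, and sample titles below are hypothetical.

```python
def render_archive_page(intro_text, post_titles, page_number, per_page=10):
    """Return the lines shown on one page of a paginated archive."""
    start = (page_number - 1) * per_page
    lines = []
    # The static intro appears on the first page only, so the
    # continuation pages do not repeat the same text.
    if page_number == 1 and intro_text:
        lines.append(intro_text)
    lines.extend(post_titles[start:start + per_page])
    return lines


titles = ["Post 1", "Post 2", "Post 3", "Post 4", "Post 5"]
first_page = render_archive_page("About this category", titles, 1, per_page=2)
second_page = render_archive_page("About this category", titles, 2, per_page=2)
```

Here `first_page` starts with the intro text, while `second_page` contains only the next posts.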

4. Printer-Friendly Version

On many websites, especially older ones, there’s a “printer-friendly version” link on some pages. This link opens a separate page where the content is displayed cleanly for printing purposes.

From Google’s perspective, this could be considered duplicated content (as explained before – the content appears at two different addresses, both accessible and readable to Google and other search engines).

This case can also present an additional problem, as search engines might prefer the cleaner printer-friendly version which often lacks ads and banners and shows only the main content. In this case, search results could display this version over the original page.

What should you do? Consider omitting the printer-friendly version of pages and use CSS settings to create content optimized for printing. Just like you use Media Queries for different screens and orientations, you can use them for a printer-friendly version like this:

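@media print {
 /* example rules; the selectors are assumptions, adapt them to your theme */
 nav, aside, footer { display: none; } /* hide navigation, sidebar, and footer on paper */
 body { font-size: 12pt; color: #000; background: #fff; }
}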

5. Different Site Versions

The existence of several identical versions of the same site is one of the oldest issues in the field, yet many sites still suffer from it. The classic case is duplicated versions with and without “www.”

Similarly, two versions may exist, one with HTTPS and one with HTTP. Another example is when different versions exist for different countries, and these versions use parameters that distinguish them.

In all these cases, each version can display identical content to the others, and the root of the problem is usually incorrect settings: in international targeting, on the server, or in the .htaccess file.

What should you do? Use 301 redirects to a single preferred version (for example, decide whether the site’s address should include “www.” or not), and ensure that the server settings and the .htaccess file are correct. Search Console no longer has a preferred-domain setting (Google removed it in 2019), so the redirects and canonical tags carry this signal. This will help focus Google’s indexing on the preferred version, without the duplicated ones.
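
The redirect rule itself is simple enough to sketch in Python. The example assumes the preferred version is HTTPS with “www.”; that choice, the function names, and the defaults are illustrative only. On a real site this logic normally lives in the server configuration or the .htaccess file rather than in application code.

```python
from urllib.parse import urlsplit, urlunsplit


def preferred_url(url, scheme="https", use_www=True):
    """Rewrite a URL to the single preferred scheme and host form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    bare = host[4:] if host.startswith("www.") else host
    host = "www." + bare if use_www else bare
    return urlunsplit((scheme, host, parts.path or "/", parts.query, parts.fragment))


def redirect_for(url, **options):
    """Return (301, target) if the URL is not the preferred version, else None."""
    target = preferred_url(url, **options)
    return None if target == url else (301, target)
```

For example, `redirect_for("http://example.co.il/post-name/")` yields a 301 to `https://www.example.co.il/post-name/`, while the preferred address itself needs no redirect.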

Summary

In this article, we focused on the technical reasons for duplicated content, particularly in WordPress sites.

However, it’s important to note that duplicate content issues can also arise from non-technical aspects, such as content scraping, plagiarism, and more. Dealing with these issues might require legal action or contacting Google directly.

In the next part of this series, we will delve into the ways to handle duplicated content issues using technical means, specifically through canonical tags, 301 redirects, and other mechanisms.

Remember, duplicated content is not something to be taken lightly, especially if you care about your site’s SEO and organic traffic. Taking steps to prevent and fix duplicated content can help ensure that your site gets the visibility it deserves and that search engines properly rank your intended pages.

Finding Duplicate Content Using SiteLiner

2. Google’s Webmaster Tools

In Google’s Webmaster Tools (Search Console), you can find a tab named HTML Improvements. Under this tab, you can find duplicate meta-titles, duplicate meta-descriptions, and duplicate headings.

The advantage of using this tool (which should be used alongside other tools and not in place of them!) is that it only displays pages that Google has already discovered, crawled, and indexed.

In other words, it’s best to fix these pages first, before pages that Google hasn’t indexed yet, because they are the ones already affecting how the site appears in search results.
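
If you have your own list of pages and their meta-titles (from a crawl or an export), the same kind of check can be approximated with a short Python sketch. The input format, a list of `(url, title)` pairs, and the sample data are assumptions made for the example.

```python
from collections import defaultdict


def duplicate_titles(pages):
    """Map each repeated title to the URLs that share it."""
    by_title = defaultdict(list)
    for url, title in pages:
        # Compare titles without regard to case or surrounding spaces.
        by_title[title.strip().lower()].append(url)
    return {title: urls for title, urls in by_title.items() if len(urls) > 1}


# Hypothetical export: (URL, meta-title) pairs.
pages = [
    ("http://www.example.co.il/post-name/", "Post Name"),
    ("http://www.example.co.il/post-name/?source=news", "Post Name "),
    ("http://www.example.co.il/about/", "About"),
]
repeated = duplicate_titles(pages)
```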

3. Simple Google Search with intitle

Performing a simple Google search using different operators can narrow down our focus on indexed content on a site and how Google sees it. Many are familiar with using the site: operator, which allows us to see all indexed pages on a site.

If you see a message from Google at the end of the results regarding pages that weren’t displayed due to duplicates, click on the link, and you’ll be able to see which pages Google filtered and the reason (not all of them are necessarily duplicates, and some might be blocked from indexing).

If you are aware of existing duplication on your site and want to see how many duplicate pages like these are indexed, you can use the intitle: operator combined with a relevant phrase or keyword, and Google will display all pages where that word appears in their meta-title.
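
The two operators can be combined in a single query. The small Python helper below only builds the query string to paste into Google; the function name and the sample phrase are hypothetical.

```python
def site_query(domain, title_phrase=None):
    """Build a Google query limited to one site, optionally by title phrase."""
    query = "site:" + domain
    if title_phrase:
        # Quotes keep a multi-word phrase together in the intitle: operator.
        query += ' intitle:"' + title_phrase + '"'
    return query


all_pages = site_query("example.co.il")
by_title = site_query("example.co.il", "printer friendly")
```

The first query lists the indexed pages of the site; the second narrows the list to pages whose meta-title contains the phrase.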

After Identifying Duplicate Content – How to Fix and Prevent Duplicates?

1. The simplest solution is to avoid duplicate content altogether. This might sound trivial, but as we’ve mentioned before, duplicate content often arises from incorrect site settings. Avoid adding elements that carelessly create duplicate pages, and ensure that each page has unique content.

2. Canonical URLs – These links, placed within the head of the page, signal to Google and other search engines where the original content resides. There are various WordPress plugins that allow you to edit canonical links for each page on your site (e.g., Yoast SEO). If removing or editing the duplicated content is not possible, you can use a canonical URL that points to the original page.

3. 301 Redirects – In some cases, especially when Google’s search engine has already indexed the duplicated content and responded to it, this approach might be suitable. It’s faster and cleaner than using a canonical link (because the duplicated page stops being crawled after a short period) – simply implement a 301 redirect from the duplicated page to the original one.

4. Internal Link to the Original Page – If you can’t edit the duplicated content, add a prominent link within the duplicated page that leads to the original page. This way, you’re providing Google a signal that you’re aware of the duplication and guiding it to the original page.
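
For reference, the canonical link mentioned in point 2 is a single tag in the page’s head. The Python sketch below renders it for a given original address; in WordPress an SEO plugin usually outputs this tag for you, so the function is a hypothetical illustration of what ends up in the HTML.

```python
from html import escape


def canonical_tag(original_url):
    """Render the <link rel="canonical"> tag that points at the original page."""
    return '<link rel="canonical" href="%s">' % escape(original_url, quote=True)


tag = canonical_tag("http://www.example.co.il/post-name/")
```

The same tag is placed on every duplicated variant, always pointing at the one original address.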

Additional Points to Consider

It’s important to note that Google’s recommendation is not to block duplicate content from indexing using noindex tags or robots.txt directives.

Search engines should know that duplicate content exists and recognize it. By using the methods we’ve mentioned, we help them understand what the original page is.

Furthermore, if you’ve identified duplicated content on your site and it’s removable or editable (e.g., user-generated content), address it promptly.

If Google has already indexed the duplication, make sure to update it by submitting an updated XML sitemap in Search Console (a sitemap that doesn’t include the duplicated page).

If you’ve deleted a post or page for this matter and your site’s sitemap is generated by a WordPress plugin, the sitemap will update automatically without your intervention.
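
If the sitemap is maintained by hand or by a custom script, the idea is simply to leave the duplicated addresses out. Here is a minimal Python sketch using the standard sitemap namespace; the URL lists and the function name are assumptions for the example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls, duplicates):
    """Build a sitemap that lists every URL except the known duplicates."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        if url in duplicates:
            continue  # duplicated addresses are left out of the sitemap
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


sitemap = build_sitemap(
    urls=[
        "http://www.example.co.il/post-name/",
        "http://www.example.co.il/post-name/?source=news",
    ],
    duplicates={"http://www.example.co.il/post-name/?source=news"},
)
```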

Lastly, if you’ve implemented a 301 redirect or added a canonical link to the duplicated page, it’s recommended to use the URL Inspection tool in Google’s Search Console (the successor to Fetch as Google) to request a re-crawl of the duplicated page and trigger the indexation process.

Will We Get Penalized by Google for Any Duplicate Content?

In a word, no. If you consider content on the internet as a whole, about 25% of it is essentially duplicated content, according to Google.

Consider pages like “Privacy Policy,” “Terms of Service,” or similar pages – they have very similar content across many sites. Does Google consider these as duplicate content? Not necessarily.

If Google were to treat every instance of duplicated content we mentioned as some form of spam, the effect on the quality of its search results would likely be negative.

In this context, we exclude heavily keyword-stuffed and spammy duplicated content. Generally, only in such cases, Google might impose penalties and impact the site’s search ranking.

Without delving deeper, the point is to reassure you that Google’s search engine operates smarter than it might seem, and it tries to look at your site with a somewhat human perspective, considering all the implications.

It’s important to take action and avoid being complacent. It’s better to prevent such situations and fix them as necessary.

If fixing duplicates makes your WordPress site clearer to search engines, improves the user experience, and potentially helps the site’s ranking, it’s worth it. Matt Cutts has explained how Google deals with duplicate content.

Summary and Final Thoughts

In this article, we presented several possible reasons for duplicate content in WordPress sites stemming from technical issues and incorrect settings. The main reasons for duplicate content are often due to