Making Sense of Duplicate Content and Page Titles in WordPress (WordPress Setup Part 2)
So you’ve read WordPress Setup Part 1 and set up WordPress so it has nice, pretty, descriptive URLs. Now you’re done, right? Well, not exactly. WordPress default installs are great for crawlability: because they have links all over the place, the search engines can always find a path to any article. On the bad side, they can often find six or ten paths to any article. Once upon a time (okay, before WordPress 2.3), you had to worry about actual posts having multiple URLs, but that issue has pretty much disappeared. There is typically only one URL for the post itself, but this doesn’t mean you can’t end up with duplicate content and wasted link juice.
So from the point of view of the post, there is no duplicate content. From the point of view of the text, though, there is: that text can appear at many addresses, even though there is only one that you want to come up in Google’s search results for that material. Because WordPress lists the most recent posts on the front page, in the category pages, in the archive pages and so forth, the text, or at least the text above the <!--more--> comment, shows on every one of those pages (the <!--more--> comment defines how much of the post text ends up on those pages).
This means that you effectively have duplicate content, that is, identical content that appears at multiple URLs. In a bad case, this will get some semi-random URL listed in the search engine instead of the one canonical (that is, “authoritative, recognized, accepted”) URL that you want the search engines to use to get to that specific page on your site. The engine might also list both your preferred canonical URL and one or more of the others. That sounds good, because you could just take over the Google listings with your ten different URLs for your page of elephant jokes, but the problem is that it splits the power of those pages (call this PageRank if you want). This might be even worse than listing the wrong page, because rather than one page in the top 10 in Google, you’ll have a page back at number 50 and another back at number 75 and so on. Nobody reads those pages. Why do they rank so low? Because you’ve ended up dividing up your inbound links and confusing the search engine robot. It’s just a robot, so don’t make it think too hard!
For example, let’s say you just wrote a post on The Big Bad List of Elephant Jokes. You assign it a post slug of “elephant-jokes”, put it in the categories “elephants” and “jokes”, and tag it as “funny”. You write it in June of 2020. This means that Goohoo! finds it at
- http://raisedbyturtles.org/ (b/c it shows up on the home page as the most recent post)
- http://raisedbyturtles.org/category/elephants (b/c it’s the most recent post in that category)
- http://raisedbyturtles.org/category/jokes (ditto)
- http://raisedbyturtles.org/tag/funny (ditto)
- http://raisedbyturtles.org/archives/2020/06/ (because it’s at the top of your June 2020 archives)
- http://raisedbyturtles.org/elephant-jokes (because this is the actual URL).
You don’t really want this. You want one canonical URL that reaches any given chunk of content. It’s better for you, your visitors and the search engines. So basically, you want the search engines to index only the “real”, that is, canonical, URL.
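To make the problem concrete, here is a minimal Python sketch that checks which of the URLs listed above carry the same excerpt. The excerpt text and the page contents are made up for illustration; only the URLs come from the example.

```python
# Hypothetical excerpt: the part of the post above the <!--more--> comment.
EXCERPT = "The Big Bad List of Elephant Jokes: the part above the more comment."

# Hypothetical crawl result: URL -> text found at that URL.
# Listing pages show only the excerpt; the post page shows the full text.
pages = {
    "http://raisedbyturtles.org/": "Latest posts. " + EXCERPT,
    "http://raisedbyturtles.org/category/elephants": EXCERPT,
    "http://raisedbyturtles.org/category/jokes": EXCERPT,
    "http://raisedbyturtles.org/tag/funny": EXCERPT,
    "http://raisedbyturtles.org/archives/2020/06/": EXCERPT,
    "http://raisedbyturtles.org/elephant-jokes": EXCERPT + " The rest of the post.",
}


def urls_showing(excerpt, pages):
    """Return every URL whose page text contains the excerpt."""
    return [url for url, text in pages.items() if excerpt in text]


duplicates = urls_showing(EXCERPT, pages)
print(len(duplicates), "URLs carry the same excerpt")
for url in duplicates:
    print(" ", url)
```

Every URL in the list matches, which is exactly the situation a search engine faces when it has to guess which address is the real one.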
Sorting the Canonical URL and Duplicate Content Issues
How do you do that? You could disallow the search engines from your archive and category pages using a robots.txt file. This will work, but the problem is that if you don’t get crawled before a post gets pushed off your home page, you might never get that post indexed (unless you generate a sitemap perhaps).
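For reference, here is a rough sketch of what that robots.txt approach looks like, checked with Python’s standard `urllib.robotparser`. The three `Disallow` paths are assumptions based on the example URLs above; your own permalink structure may differ.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules, matching the example URLs in this post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /category/
Disallow: /tag/
Disallow: /archives/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url):
    """True if a generic crawler may fetch this URL under the rules above."""
    return parser.can_fetch("*", url)


for url in (
    "http://raisedbyturtles.org/",
    "http://raisedbyturtles.org/category/elephants",
    "http://raisedbyturtles.org/tag/funny",
    "http://raisedbyturtles.org/archives/2020/06/",
    "http://raisedbyturtles.org/elephant-jokes",
):
    print(url, "->", "crawl" if is_crawlable(url) else "blocked")
```

Note that a blocked page is never fetched at all, so the crawler cannot follow the links on it. That is the weakness described above.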
So what do you do? Simple, you install the incredible Headspace2 plugin. I used to use and recommend a hacked combination of the SEO Title Tag plugin and the All-in-one SEO Pack. That’s a powerful combo too, but not as powerful as Headspace2 and they need a minor hack (actually just a manual database change) to work together. I don’t say Headspace2 is incredible lightly, but this is just a great idea that is well-executed.
I got a fatal error when I installed H2, version 3.3.16, but that’s because the headspace/plugins.php file needed to be executable by “owner” and I had the wrong file permissions on it. You can change that easily from your FTP client (try FileZilla if you don’t have one). If you’ve been using AIOSP (the All-in-one SEO Pack), by the way, you can import all your data via the Headspace2 options.
Once you install this plugin (installs like any WP plugin; instructions in the readme file that comes with the download), you need to go in and enable some modules. This is a complex and powerful plugin and not all of it is enabled by default.
- From your WordPress admin area, go to Options » Headspace2 » Modules
- Look over at the “Disabled” list. Drag and drop any of these modules into the “Simple” section. I have the following activated currently:
- No Index/No Follow: essential for sorting the duplicate content issue.
- Page Title: essential for the second part of this how-to.
- Page Description: lets you create a custom meta description, which we’ll get to in a second.
- More Text: instead of a generic “Read more” for a continued article, you can customize the text so it’s something like “Read more about sorting out duplicate content…”
- Tags: lets you tag your pages and puts these tags in your meta keywords.
- Now that you have the modules enabled, you’ll be able to control the indexing of all your pages. At edit or creation time, you can keep a single page out of the search indexes, which is useful for things like Contact pages. More importantly, though, we’ll make all those category and archive pages more or less invisible to the search engines.
- Go back to the Headspace2 “Page Settings”. You should see a list that includes:
- Search Pages
- Tag Pages
- For each of those listed above (not all the ones listed by Headspace2), click on it and, at the bottom of the options, you can see two check boxes. Check the No Index box, but not the No Follow box. Save. This tells the search engine (Google, Yahoo, etc) that it shouldn’t even bother to keep a record of the content of that page, but that it should follow those links on through to the actual pages you want indexed. If you check the No Follow box, you would prevent the search engine from even finding those pages that you really want indexed.
- Note that you can also edit the page title and other information for those pages. We won’t bother right now, but it’s something to keep in mind in case you want to customize any of this.
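The effect of checking No Index but not No Follow is a robots meta tag in the head of each listing page. Here is a minimal Python sketch of that decision. The page-type labels and the exact tag markup are my assumptions for illustration, not necessarily what Headspace2 outputs.

```python
# Hypothetical page-type labels for the listing pages we want kept out
# of the index. Posts and static pages are not in this set.
NOINDEX_PAGE_TYPES = {"search", "tag", "category", "archive"}


def robots_meta(page_type):
    """Return the robots meta tag for a page type, or "" if none is needed."""
    if page_type in NOINDEX_PAGE_TYPES:
        # noindex: do not keep this page in the index.
        # follow: do follow its links through to the real posts.
        return '<meta name="robots" content="noindex,follow" />'
    # No tag at all: search engines index and follow by default.
    return ""


for page_type in ("tag", "category", "post"):
    print(page_type, "->", robots_meta(page_type) or "(no robots tag)")
```

Unlike the robots.txt route, the crawler still fetches these pages, so it still finds the links to your posts.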
Sorting out Meta Titles
H2 has another great utility: it lets you set unique meta titles (that’s the one that appears in the upper browser title bar, not the one the reader sees on the page) that are different from your H1 heading title. You can also craft meta descriptions and meta keywords and, in fact, any meta information. It will add additional text entry boxes that let you set your keywords, description and title on the post edit/creation screen.
The meta title is really key and the only one that really really really matters. This is what appears in the big bold text in the search results. This is the first thing about your page that most people will see. You want to make it count and you don’t want to simply duplicate what you have for the post heading. Above all, under no circumstances should the average blogger have a site where the meta title looks like this: My Site Name | Name of My Post. Nobody cares about the name of your stupid site and it’s also not descriptive in the least if you have a name like mine. It makes your titles look less unique and harder to tell apart if your visitor has several pages of your site open in different browser tabs or windows.
Why would you want your meta title to be different from your post title? Well, Google’s top search quality engineer, Matt Cutts, pointed out in his WordPress SEO video that varying these two gives you two chances to match terms. You can use subtly different wording, looking to use alternate spelling (changes and changing in Matt’s example) or related terms (photos and pictures and images for example).
This is actually not why I do it, though. The meta title appears in the search results, so it needs to give the user some information scent, and there’s only so much room to be clever. In your RSS feed or on the page, however, where you’ve already got the users, you might want to give them something funny or clever that does not make the general idea of the article immediately obvious. In many cases, such as a how-to article like this, my two titles might be similar. But when I write some humor or political commentary, I might want an H1 heading that is engaging, but not necessarily descriptive in the same way the meta title is.
- Meta Title
- Longer, more descriptive title that should say: “I answer your question. I am the page that you’re looking for. Come look at me.”
- H1 Heading Title
- Might be even longer (on this page I’ve added the “WordPress Setup Part 2”) or very short. It might be pithy, ironic or a mystery whose real meaning is only revealed as the reader goes down the page. The user is on the page already and has a view of the text that follows. The H1 text should say “Read on! I’m funny. I’m interesting. I’m good for a laugh or a solution.” It’s not necessarily a summary.
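In markup terms, the two titles simply end up in two different elements. A small Python sketch, using this page’s own titles as the example (the function name is made up for illustration):

```python
from html import escape


def render_titles(meta_title, h1_heading):
    """Return the <title> and <h1> markup for one post."""
    return {
        # Shown in the browser title bar and in the search results.
        "title": f"<title>{escape(meta_title)}</title>",
        # Shown on the page itself, above the text.
        "h1": f"<h1>{escape(h1_heading)}</h1>",
    }


tags = render_titles(
    meta_title="Making Sense of Duplicate Content and Page Titles in WordPress",
    h1_heading="Making Sense of Duplicate Content and Page Titles in WordPress "
               "(WordPress Setup Part 2)",
)
print(tags["title"])
print(tags["h1"])
```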
What if I already have pages without unique titles?
So now if you’ve never written a post and you don’t want to set titles for categories, you’re all set, but what if you are trying to fix up an old site, or you want to attach titles to category pages? Simple. Just leave the Options panel and head on over to Manage » Meta-data. You’ll see that H2 gives you a list with the Post Title (what appears on the page) fixed and the Page Title (what appears in the browser bar) editable. Now, look at the upper right corner of the screen. Headspace lets you mass edit almost everything—page title, post-slug, custom “more” text and everything. This is an amazing management tool.
Other Meta Tags
Who cares about meta keywords? The search engines don’t pay attention to them anymore, so they’re just a waste of bandwidth, right? Perhaps, but things change and you may someday find them useful for your own internal search algorithms or what have you. I do this for my benefit, not the search engines’. I write my title first, which keeps me on topic. I write keywords last, to see how I did. But of course you can ignore them. Since you’re using Headspace, you just create your tags, which help your visitors find related posts and so forth, and these become meta keywords, so why not? (If it’s not worth being a tag, I don’t bother to add extras.)
Search engines don’t use the meta description either, right? Probably not for ranking (how high you are in the results), but they might use it for relevance (trying to figure out the actual content of your post, assuming the description matches the rest of the page). More importantly, they will use it for the snippet that appears in the search results in some cases. An example would be where the algorithm tells the engine that your page is about elephant jokes, but it doesn’t find the searched-for word on the page, so it can’t find a relevant snippet. What does it use? If you have no meta description, it might use nothing or it might just start grabbing your navigation text (I’ve had that happen on image pages). If you have the description, you control what appears in these cases instead of depending on SE magic.
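For the curious, here is roughly what these two tags look like in the page head, sketched in Python. The function name, the sample description and the tag list are made up for illustration, and the markup is the generic HTML form rather than Headspace2’s exact output.

```python
from html import escape


def meta_tags(description, tags):
    """Build description and keywords meta tags from a post's data."""
    lines = []
    if description:
        lines.append(f'<meta name="description" content="{escape(description)}" />')
    if tags:
        # The post's tags double as its meta keywords.
        keywords = ", ".join(tags)
        lines.append(f'<meta name="keywords" content="{escape(keywords)}" />')
    return lines


head = meta_tags(
    description="A big bad list of elephant jokes.",
    tags=["elephants", "jokes", "funny"],
)
print("\n".join(head))
```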
By using Headspace2, you save yourself tons of headaches, lots of theme-hacking, and make your site more usable for visitors and search engines alike. If done right, your duplicate content issues and duplicate title issues will be totally resolved.