In Google Search Console, under Search Appearance, there’s a link to a report called HTML Improvements, which lists changes you should consider making to a set of URLs.
The HTML Improvements report lists pages whose title tag or meta description Google doesn’t like: duplicate title tags, missing title tags, short title tags, duplicate meta descriptions and short meta descriptions.
Are you supposed to do something about that?
If Google doesn’t like these pages, why does it even bother indexing them?
And what happens if you don’t do anything about them? Is there any kind of penalty involved for the reported pages? Is the whole website at risk of being penalized?
These aren’t really yes-or-no questions: there is probably more than one answer, and I have an explanation drawn from a recent experience.
I also figured out how to get Google to refresh this data fairly quickly. I’ve read many webmasters complaining that they fix these HTML improvements but the numbers in the report never decrease. The explanation I see most often is that Google doesn’t refresh that report very often, but does apply your changes in its index.
To me, that doesn’t make sense: why would Google keep two separate sets of data? Either the page is OK or it is not, right?
So if these numbers are not changing, most likely either the fix you applied is not OK, or Google hasn’t crawled the page since the fix was made.
So once you have fixed the duplicate title tags, duplicate meta descriptions and other errors flagged in this report, create a new sitemap containing these URLs and submit it through Google Search Console, and Google will crawl these URLs.
You can name the sitemap anything you want (like fixed.xml) so that you know later which URLs are in it; you don’t have to name it sitemap.xml specifically.
And you can submit more than one sitemap if you need to do this gradually.
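Here is a minimal sketch of how such a sitemap could be generated with Python’s standard library. The function names build_sitemap and write_sitemap are my own, and the URLs are just the placeholder ones used in this article; the only required parts of the format are the urlset element with the sitemaps.org namespace and one loc per URL.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML document (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_el = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in the URL for us.
        ET.SubElement(url_el, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def write_sitemap(path, urls):
    """Write the sitemap for `urls` to `path` (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(build_sitemap(urls))
```

Calling write_sitemap("fixed.xml", ["http://www.mysite.com/page1.html"]) would produce a file you can upload to your site and submit in Search Console.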
If you do that and the number of HTML improvements doesn’t decrease after a few days, or the page you believe you fixed is still in the list of errors, then obviously you’re not fixing anything.
Why should you fix the reported HTML improvements?
You may have read that Google likes a clean site and hates duplicate content, and that’s broadly true. In practice, though, Google doesn’t mind a small percentage of your pages being duplicates.
The problem arises when Google detects too many duplicate titles or short meta descriptions compared to the total number of pages on your website.
My theory is that there is a specific percentage of HTML improvements that your website should not reach; otherwise it triggers some kind of alert, and some Google Guy will review your website manually and apply a manual penalty.
For example, if he believes that your website is generating pages dynamically with thin content of little or no value (in the Google Guy’s eyes, that is), you’ll get a manual penalty and your website will drop 50 positions in the search results until you fix that.
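To get a feel for where your own site stands, you can compute the share of pages that have a duplicate title. This is only a rough sketch: the function names are mine, the input is assumed to be a dictionary of URL to title that you collected yourself (from a crawl or an export), and since the threshold in my theory is unknown, the code just reports the ratio.

```python
from collections import defaultdict


def duplicate_title_groups(titles_by_url):
    """Group URLs by title and keep only titles used by more than one URL."""
    groups = defaultdict(list)
    for url, title in titles_by_url.items():
        # Compare titles ignoring case and surrounding whitespace (an assumption).
        groups[title.strip().lower()].append(url)
    return {title: urls for title, urls in groups.items() if len(urls) > 1}


def duplicate_share(titles_by_url):
    """Return the fraction of pages whose title is shared with another page."""
    if not titles_by_url:
        return 0.0
    duplicated = sum(
        len(urls) for urls in duplicate_title_groups(titles_by_url).values()
    )
    return duplicated / len(titles_by_url)


pages = {
    "http://www.mysite.com/page1.html": "Leather handbag",
    "http://www.mysite.com/page2.html": "Leather handbag",
    "http://www.mysite.com/page3.html": "Leather handbag",
    "http://www.mysite.com/about.html": "About us",
}
print(duplicate_share(pages))  # 3 of the 4 pages share a title
```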
A common mistake when fixing HTML improvements
The first thing that comes to mind is to use the URL removal tool (Google Search Console / Google Index / Remove URLs), but Google has been very vague about this feature for years.
The URL removal tool is only meant to remove a specific page or directory from the search results, not from the Google index. And in order to remove a URL with it, you need to add a Disallow directive to your robots.txt to tell Google to stop crawling this page.
If you do that, Google can no longer crawl the page, so it won’t notice your changes and will keep displaying the cached version: that is one explanation for why you don’t see any changes in the HTML Improvements report.
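You can check whether a robots.txt rule is blocking a page you just fixed with Python’s built-in robots.txt parser. Keep in mind that this is Python’s reading of the file, not Google’s own parser, and that the robots.txt content and the is_crawlable helper below are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt that blocks one of the pages (hypothetical content).
ROBOTS_TXT = """\
User-agent: *
Disallow: /page2.html
"""


def is_crawlable(robots_txt, url, user_agent="Googlebot"):
    """Return True if `user_agent` may fetch `url` under the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The disallowed page cannot be re-crawled, so a fix on it goes unnoticed.
print(is_crawlable(ROBOTS_TXT, "http://www.mysite.com/page2.html"))
print(is_crawlable(ROBOTS_TXT, "http://www.mysite.com/page1.html"))
```

If the first call reports that the page is not crawlable, the Disallow rule has to go before Google can see your new title or description.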
So how do I fix the HTML improvements?
Let’s say you have one error reporting that http://www.mysite.com/page1.html, http://www.mysite.com/page2.html and http://www.mysite.com/page3.html have a duplicate title tag.
First you need to decide whether you want these pages in the search results or not.
If you do, you just need to rewrite each page title to match the page content and put the pages in a new sitemap.
But suppose, for example, that page1.html is a product page (let’s say a brown leather handbag), page2.html is a variation of that page (the same leather handbag, but blue) and page3.html is another variation (a white leather handbag).
If Google reports duplicate title tags in that case, what I do is 301 redirect page2.html and page3.html to page1.html. Like I said: if they’re duplicates, or Google thinks they are, you either need to remove them or rewrite them.
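How you set up the 301 depends on your server, and most people will do it in their web server or CMS configuration. Purely as an illustration, here is what the redirect logic looks like as a tiny Python WSGI application; the REDIRECTS table and the app function are my own names, and the page body is a placeholder.

```python
# Map each duplicate variation to the page that should stay in the index.
REDIRECTS = {
    "/page2.html": "http://www.mysite.com/page1.html",
    "/page3.html": "http://www.mysite.com/page1.html",
}


def app(environ, start_response):
    """Minimal WSGI app: 301 the duplicate pages, serve everything else."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        # A 301 tells crawlers the move is permanent.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Brown leather handbag"]
```

To try it locally, you could serve it with wsgiref.simple_server.make_server("", 8000, app).serve_forever() and request /page2.html.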
The second thing I do before creating my new sitemap is to add a noarchive directive (<meta name="robots" content="noarchive">) to the pages that I want to remove from the Google index.
I do this because I noticed that 301 redirecting alone is not enough: although the 301 redirect works, Google still displays the cached version of the page in the search results.
Now that this is done, I can create the new sitemap and submit it through Search Console, and Google will take care of it pretty soon.
Note: you can monitor this by adding an event if you’re using Google Tag Manager, or even by triggering a virtual pageview, on the condition that Googlebot is detected in the user_agent string.
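The Googlebot condition itself is a simple substring check. Here is a sketch of it in Python, for example to filter your own server logs rather than going through Tag Manager (that server-side use is my assumption, not part of the setup above). The is_googlebot name is mine, and since a user agent string can be spoofed, this only tells you that a visitor claims to be Googlebot.

```python
def is_googlebot(user_agent):
    """Return True if the user agent string claims to be Googlebot."""
    return "googlebot" in (user_agent or "").lower()


user_agents = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    None,  # some requests carry no user agent at all
]

# Count how many of the recorded visits claim to come from Googlebot.
googlebot_hits = sum(1 for ua in user_agents if is_googlebot(ua))
print(googlebot_hits)
```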
I can then test by searching Google for one of these pages: it no longer shows up, and I see the number of HTML improvements decrease over the next few days (within about a week).