Dec 4, 2020

Does Google think we are Spam?

Context:

I work on Trustedreviews.com and we have seen some very strange anomalies across our site. We're an independent business now, but we weren't 2 years ago when we first got hit around the time of the Google E-A-T update.

Since then it has been traffic loss after traffic loss following each Google update.

  • It all started at the time of the E-A-T update. Since then we have made strides to improve the site to the best of our ability, which has required huge investment.
  • Our current Trustpilot score, for example, is 4 out of 5, compared to 1.5 and 2 stars for the sites that Google is rewarding.
  • At the end of November 2019 there was an unannounced update and we lost 35% of our traffic overnight.
  • Comparing this year to that lower period last year, we have lost a further 30% of traffic over the year, which feels more like a slow burn.
  • We have suffered numerous negative SEO attacks.
  • All in all, the site has lost a business-crippling 80% of its traffic in 3 years.
  • The December 2020 update looks to be showing the same trend, which started 3 days before the update was actually announced.

Our editorial guidelines are very clear: unlike a lot of our competitors, we only publish reviews of products that we have actually been able to use and test. As a brand that is over 15 years old, we have great access to tech brands and exclusives, which makes us newsworthy and relevant to our audience.

All of the traffic we were previously getting has migrated to the benefit of one or two other sites in the US and UK. We sometimes use the same freelancers as these brands, which is common in the tech niche, and staff have moved between the companies, so the content quality and the general training of the journalists is much the same.

What has been done: 

Over the last year, we have changed the way we work but also improved the fundamentals on the site. 

Some of the largest names in SEO have audited the site and all of their suggestions have been implemented. 

We have pruned a lot of the older and less relevant content. We have also consolidated a lot of older content that sat across multiple pages. We have streamlined our top navigation and simplified our pagination.

This has reduced the number of “low-quality” crawlable pages by around 60%, with “low quality” defined as pages that provide no longer-term value to our users or the brand.

On top of this we have invested heavily in backend improvements to our CMS, including a lot of work on site performance. We took our mobile performance score, as measured by Lighthouse, from below 20 to over 90 in most cases, which also improved CLS, LCP and FID; we think these are now best in class for our niche.

What are we currently seeing: 

The site is built on WordPress and I am seeing some strange things reported in GSC. I don't believe they are helping our site, and they don't seem to be something we can control. One example is Googlebot picking up core WordPress URLs (?p= parameter URLs) that aren't used in our site structure:

Sample from November 15:

In itself, this isn't a problem. The redirects are handled properly, but I keep seeing these as the referring URLs in the URL Inspection tool. I have no idea where Google is finding these links; they aren't part of our site structure.

Another issue is that Google is picking up non-live subdomains as referring pages. If I inspect one of the ?p= URLs mentioned above, I get correctly redirected to https://www.trustedreviews.com/how-to/how-to-screen-record-on-iphone-3294586. If I inspect that page in GSC, I get a referring page of https://live.trustedreviews.com/how-to/how-to-screen-record-on-iphone-3294586, which was last crawled on 18/11/2020.

I am also seeing Google hanging onto our AMP pages even though they have been removed and the cache cleared.

We have a history of negative SEO attacks where someone has been creating search pages and linking to them. We have tried to disavow as many of these as possible. These URLs often contain Japanese or Chinese characters and are being indexed by Google. Looking at this, it seems clear that Google is spending more time crawling junk URLs that other people are creating than the good content we create.

As a benchmark I have been using Bing traffic to keep track of search fundamentals. Bing traffic has remained stable, so to me this is clearly something that Google specifically dislikes about our site.

Given the traffic losses we have experienced over the past 2 years, I don't believe this is a situation where “there is nothing you can do”. We have performed a lot of work on the site, at a large cost, with improvements centred on making the site better for the user. There is obviously something the Google algorithm doesn't like about our site specifically, and at this point I would see a manual action as a positive outcome.
All Replies (17)
Dec 4, 2020
Doesn’t sound right to me. One for Danny I think
Recommended Answer
Dec 10, 2020
Looks like you received some advice from John Mueller on Twitter.
 
 
Hopefully that will help
 
Rt2
Platinum Product Expert Andrés Tirado 🚨🐼🐧🚨 recommended this
Original Poster Eamon Looney 921 marked this as an answer
Dec 10, 2020
Observation -
 
Pagination isn't particularly SEO friendly
Paginated pages now canonical to the first page in the series:
<link rel="canonical" href="https://www.trustedreviews.com/category/news">
This tells search engines that the pages aren't important to crawl, effectively orphaning older content unless there are other robust paths to it.
 
So I thought, hmm, was it always this way? Nope! In May 2019 pagination was self-canonical, and not only that, the linking to deeper paginated pages was far more robust (way better for crawling).
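To illustrate the difference on a hypothetical page-2 URL (the exact markup on the live site may vary):
 
<!-- current: page 2 canonicals to page 1, so it is treated as a duplicate and its links to older articles are devalued -->
<link rel="canonical" href="https://www.trustedreviews.com/category/news">
<!-- May 2019 style: page 2 is self-canonical and remains a crawlable path to older articles -->
<link rel="canonical" href="https://www.trustedreviews.com/category/news/page/2">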
 
This factor alone in my opinion could be holding back your older content and potentially allowing scrapers to compete. The change happened sometime between August 2019 and March 2020. The biggest dip in traffic occurred late October 2019 (according to Ahrefs) so definitely in the range. 
 
Hope this helps
Dec 10, 2020
The following is a BAD idea
I was trying to triangulate exactly when you changed your pagination style and ended up on the following page: https://www.trustedreviews.com/reviews/page/2?product_type=mobile
 
Note: the page has BOTH noindex markup AND a canonical tag. 
<meta name="robots" content="noindex,follow">
<link rel="canonical" href="https://www.trustedreviews.com/reviews">
 
When Google crawls this page there is some risk of it attributing the noindex to the canonical URL.
 
Of course, the noindex also eliminates the page as a useful crawl path.
It's difficult to tell, but this setup looks new as well.
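For illustration only: if these paginated review pages are meant to stay in the crawl path, one option would be to drop the conflicting signals and let the page reference itself. This is just a sketch using the URL above; whether the ?product_type parameter should be canonical at all is a separate decision.
 
<!-- sketch: remove the noindex so the page can act as a crawl path -->
<meta name="robots" content="index,follow">
<!-- sketch: self-referencing canonical instead of pointing at /reviews -->
<link rel="canonical" href="https://www.trustedreviews.com/reviews/page/2?product_type=mobile">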
Dec 10, 2020
Some good feedback coming through here.  Well done guys!
Dec 10, 2020
Here's where Google is finding the parameter-appended URLs, in cases like this (note the redirect):
 
Also, I'm paying close attention to pages that had been performing well and then went thud. One was top 10 for "ultrabook" in November 2019, then dropped to 43. My best guess: it dropped off page 1 of the "best" category (pagination is no longer SEO friendly).
 
I'm seeing a pattern of stable performance on some pages and then, wham, they get shelled.
 
 
Dec 10, 2020
Hi OptimistPrime,

Thanks for your thoughts and findings. A lot of that has been by design, although we probably should (and will) revert those canonicals. What we found is that Google tended to ignore those canonicals in most cases. They were added after the massive traffic drops: we used to be part of a larger media publishing house and all the brands followed suit, although I don't believe any of them use the canonical now. It should be fairly straightforward for us to change. As you can imagine, there has been a lot to prioritise, and since Google was ignoring these canonicals it wasn't an urgent fix. Maybe it should have been?

What we found after the Nov 2019 update was that Google was crawling our older content disproportionately. The pagination change was a way to try to get Google to focus on our newer hero content and not spend too much time going backwards. These pages are all still available in the XML sitemap, and our competitor analysis shows that our pagination is actually more open than our competitors'. We also find that our users don't need more than 9 pages of content. This was coupled with a bloated navigation, which meant that URL discovery was, in my mind at least, far too easy and generally not good for our users.

Essentially what I am trying to get across is that everything we have done on the site over the past year has been calculated and driven by data and what we think users want to see. There are things we have to fix and we'll continue to do so.

This is part of the problem in a way: if there is something fundamentally wrong, what is it, and is it really bad enough to cause the kind of losses we have seen? Would canonicalisation in pagination warrant a site being cut in half traffic-wise?
Dec 10, 2020
Thanks for the feedback!
 
I am just analyzing visually at this point, and it feels like the path to your deeper articles is weak, but that isn't scientific. Thinking aloud: if I had Screaming Frog crawl the site and honor canonical tags (not crawl anything non-canonical), would you still be on par with competitors?
 
At some point it looks like a ton of articles were deleted. I realize this may have been an attempt at content clean-up, but all that link loss could have slowed or dampened recovery efforts. I see the old URLs now redirecting to high-level category pages, which is sub-optimal but maybe as good as you can muster at this point. It looks like the content was deleted all at once; when did this occur?
 
The high level of links from Wikipedia suggests an overt link-building strategy at some point, but that is just an observation, not a condemnation.
 
Going farther back, when I see things like trustedreviews.co.uk redirecting in full to your home page, it smells like some very poor decision-making occurred before you arrived. That suggests someone without an SEO background making SEO decisions.
 
Lots and lots of pages going through redirects.
 
XML sitemaps don't pass ranking signals, so they don't count as an on-site bot discovery path.
 
Assuming Ahrefs ranking data is remotely correct, I am seeing the site holding steady in many regards, but individual articles that rank highly are getting decimated periodically. I would look into why; cumulatively this could have a disastrous effect on traffic.
 
If you still have access to log files going back to late last year, I'd be very interested to compare crawl activity before and after the drop. Hopefully some of that is still preserved.
 
Just jotting down thoughts
 
 
 
 
Dec 10, 2020
Thanks for this it's really helpful! Great discussion :)

When I started we did a lot of standalone deals content. In the moment these articles provide value for the user, but longer term they don't add any value to our site or readers; an iPhone 6 deal from 2017, for example.

You're correct in spotting that we had a program of content removal. However, this was all focused on content that we wouldn't want readers stumbling across. At the same time we also undertook a process of content consolidation. A lot of older reviews sat across multiple pages, and that historic decision wasn't made with the user in mind: these pages would each have a small amount of content before trying to move the user on to the next page. For example, we could have had a review sitting across 9 pages with only about 1,000 words in total. These reviews still provided value as a whole, so we consolidated them into one review, and we have minimised this practice going forward (sometimes it still makes sense to do it). Redirects were applied to the parent page of the review. From a pure URL-count perspective this was quite a few URLs, but we took our time and didn't release it all in one go.

Other content we removed was older news articles the site had carried historically; these tend to be redirected to category pages. We could have skirted the edges here and tried to find relevant evergreen content to redirect them to, but I have always found that practice a bit disingenuous. We would rather recirculate our users into a relevant category than to a best-guess evergreen article.

These news articles had also been through multiple migrations and were in a bad state (pixelated images etc.). We took the decision to remove some of this content, but only tackled material from 2012 and before. With deals content we wanted to make sure we were only carrying what was needed for the user and not bloating the site with quick-fire deals articles. We have also amended this activity going forward.

At this point we have never removed a review from the site, because they're important to us as a brand and they make us who we are. Content and site clean-up has been a big focus for us, not just for search but also in terms of what we would be happy for our readers to read. I sometimes get lost in the older reviews of products past, which is always quite nice. This work was all done between January 2020 and the end of March 2020.

I take your point about canonicals and pagination; I hadn't thought about it that way, and it's always good to have a different perspective. I'll look to get that changed as a higher priority.

My start date was November 2019, so I can't really vouch for anything that went on previously. What I will say is that if it looks like there is a Wikipedia link-building campaign going on, I can assure you it isn't by design. I think this points back to someone potentially doing things to hinder our brand through negative SEO, although as John has previously said this is unlikely; I like to think anything is possible, though. It could also just be people on the wider internet wanting to reference our stuff.

With the TrustedReviews.co.uk domain, my belief is that we haven't used this domain and it's simply there to redirect users who may misspell our name. We also have some EU variations with the same behaviour; we aren't looking for any search benefit from this, just to keep all our owned properties pointing to the same place.

If you could provide any examples of Ahrefs keywords, that would be great. What we tend to find after these updates is that we can hold or win positions for more generic top-level keywords but start to get hit more on the longer tail, Discover and News. We also struggle to be in consideration as a publication in the ecommerce SERPs.

I have access to the log files, but only recently. I am in the process of looking for a way to visualise them. We need a cost-effective solution to do this and are trying to work with an ELK stack integration, but it is proving challenging, and time is now moving over to how we react to the latest update. This is something I love doing, so I'm personally disappointed that I have to reprioritise it.

Thanks again for all your feedback, it's really appreciated.
Dec 10, 2020
Eamon

Only popping back in for a moment, as Optimist has been providing amazing input.

I notice that you are confirming a lot of what Optimist is raising. Great to see.

One thing I am wondering is whether there is anything else that you are thinking, or perhaps even know, that is not included so far. Maybe just "half-thought" or "that's not relevant" moments, but anything that you can think of or add really helps the detective work!!!

Rt2
Last edited Dec 10, 2020
Dec 10, 2020
FWIW - your site is allowing me to crawl it in full with Screaming Frog, which means others can easily scrape you as well.
 
Starting with your home page I crawled a sample of 13,499 HTML web pages
I excluded crawling of any non-canonical or noindex pages to more accurately represent effective crawl path between pages.
 
So far:
  • 13,499 total pages crawled
  • after removing any pages that redirect or 404: 13,420 remain
Of the remaining pages, broken down by number of unique in-links:
  • 2 unique in-links: 10,039 pages (the URL itself and a link from the archive)
  • 3 unique in-links: 1,662 pages
  • 4-9 unique in-links: 1,172 pages
  • 10-20 unique in-links: 214 pages
  • 21-60 unique in-links: 128 pages
  • 61-100 unique in-links: 25 pages
  • 101-200 unique in-links: 138 pages (mostly archive pages)
  • 201-13,586 unique in-links: 42 pages (author pages, high-level categories, primary nav)
 
To me, that means over 11,000 of the roughly 13,000 pages crawled are heavily marginalized from an SEO perspective. These marginalized pages make up a huge percentage of your organic traffic generating opportunity.
 
You are way over-dependent upon "archive" pages to support your crawl path.
Archive pages are a poor substitute for crawl paths from pages users actually visit. Why? Because Googlebot is designed to crawl pages that consumers would visit and that would be indexed. Archive pages are not that.
 
Summary:
I think you will find that under this type of scrutiny, your internal navigation needed for SEO purposes is NOT competitive.
 
Of course this is just a sample of the entire site, but given that SF started from the home page, all the best-linked pages should have been included in my numbers. This suggests that the percentage of the site that is poorly interlinked is actually worse than the numbers I provided.
 
OT: you have some surprising 404s.
 
edit: sincerely trying to be helpful in my criticism. 
 
 
 
 
 
 
 
Last edited Dec 10, 2020
Dec 10, 2020
Looks like trustedreviews.co.uk was a full-fledged site, at least for a few years:
http://www.trustedreviews.co.uk/panasonic-dmp-bdt210-review-features-page-2
Not making a fuss about this; it happened a long time ago, just an observation.
Dec 10, 2020
"edit: sincerely trying to be helpful in my criticism." - Please don't worry the critique is welcomed.

I think I have caught up, that's a great analysis. We are running on WordPress so naturally have a very flat structure. 

Our competitors tend to have the same type of pagination, and the majority even share the same CMS. My personal belief is that this wouldn't cause substantial losses on its own, but it is certainly not optimal and will be contributing to some losses, or rather holding us back. This is certainly something I am going to look at fixing by expanding the taxonomy in a slow and steady manner.

I do have a question, if you don't mind. Essentially, we're not blessed with a multimillion-pound ecommerce CMS that naturally creates structure through the way it merchandises product catalogues. Within WordPress our default is flat, so, for example, if I was looking to better structure our best lists, I could have our best wireless earbuds piece in both the "best" section and maybe a headphones sub-category. So in this example, the structure could look like:
  • /best
  • /best/headphones
  • /best/phones
The issue we run into is that WordPress naturally wants to create a flat structure, so the same example with no development work would be:
  • /best
  • /best-headphones
  • /best-etc
I know you could probably force it without development but it would be a mess and we're trying to keep things as clean as possible.

My preference would be the first option. What are your thoughts?

Thanks for flagging the .co.uk and 404s which I'll also look into :)

Again thanks.