8/1/17
Original Poster
Mikael Roos

Site hijacked in SERP by duplicated content

Hi, my site dbwebb.se got hijacked in the SERP, for example for the search keyword "dbwebb". I used to be the preferred site for that search keyword.

Another domain, healthacadiemy[dot]ga, has saved their own version of my website content (visible in cache) and managed to replace my original site in the SERP (Google SERP, not Bing SERP).

They are now redirecting any access to their site to another site showing nude pictures.

Their site shows different results depending on whether its pages are accessed with a web browser or with curl.

I've reported the hijacking site through:

Is there a common name for this technique of hijacking a search keyword by "stealing" and duplicating another website's content?

Any more suggestions on how to deal with this, and on preventive measures?
Community content may not be verified or up-to-date. Learn more.
All Replies (7)
barryhunter
8/3/17
This often seems to be known as a "302 redirect hack".

Although that is somewhat misleading, as often there isn't actually a 302 involved! (It's just that it can also be implemented with a 302 redirect instead of a 'copy' of the site it's stealing traffic from.)

It's somewhat hard to guard against this. As it requires no 'part' on your site, it's completely separate, so nothing is in your control.

I doubt reporting to Google will do anything. The report is for long-term training of algorithms; this should be a 'temporary' thing that will get resolved anyway.

But the reason this 'works' is that Google sees this new copy and thinks it's better than your copy, i.e. the ranking of the copy is better. This is partly because it has the existing reputation of the hacked site (i.e. healthacadiemy) added to your reputation.

But it does mean you should then just seek to improve your own site's ranking, i.e. compete (and win!) against the copy.



(although you should also seek to contact the owner of the hacked site, if they remove the hack it will also render it ineffective!) 


8/6/17
Original Poster
Mikael Roos
I did some more checks, and from what I can tell they are doing a "SERP proxy attack". From what I can see, it is roughly explained in "The Never Ending SERPs Hijacking Problem: Is there a definite solution?"

When Googlebot visits the offending site, they proxy the request to the actual site, take the response, and do a find-and-replace on all URLs, so that Googlebot comes back to the offender thinking that the scam site hosts all the pages. Visiting healthacadiemy[dot]ga with the User-Agent set to Googlebot shows my site's content. But visiting healthacadiemy[dot]ga through a normal browser redirects the request to some porn site with nude pictures.

So Googlebot sees the proxied/real site content and actual users see porn.
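
A check like the one described above (same URL, different User-Agent) can be scripted. The following is only a minimal sketch: the browser User-Agent string, the word-overlap similarity measure and the 0.5 threshold are my own assumptions, not something taken from the offending site or from Google.

```python
import urllib.request

# The Googlebot string is copied from the access log quoted in this thread;
# the browser string is just an assumed example of an ordinary browser.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0 Safari/537.36"


def fetch(url, user_agent, timeout=10):
    """Return (final_url, body) for url, following HTTP redirects."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
        return response.geturl(), body


def word_overlap(text_a, text_b):
    """Jaccard similarity of the whitespace-separated tokens in two texts."""
    words_a, words_b = set(text_a.split()), set(text_b.split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def looks_cloaked(bot_result, browser_result, min_similarity=0.5):
    """Flag a page whose bot view and browser view clearly differ.

    Each argument is a (final_url, body) pair as returned by fetch().
    The threshold is an arbitrary assumption and needs tuning.
    """
    bot_url, bot_body = bot_result
    browser_url, browser_body = browser_result
    if bot_url != browser_url:
        # The two clients ended up on different pages, e.g. after a redirect.
        return True
    return word_overlap(bot_body, browser_body) < min_similarity
```

For a live check one would call `looks_cloaked(fetch(url, GOOGLEBOT_UA), fetch(url, BROWSER_UA))`. It will not catch a redirect done in JavaScript, since urllib does not execute scripts, nor cloaking that is keyed on the crawler's IP address rather than its User-Agent.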

It seems it would be rather easy for Googlebot to find out that they do not see the same content as the actual users. That should be a warning signal that something might be wrong here.

Anyway, I reported it to Google, and they could remove the site from the SERP if they so choose. They should, since the scam site presents one set of content to Googlebot and another to actual users; that is not fair, regardless of the content.

To protect myself, I checked which IP the offender used and blocked it in Apache. They cannot proxy any more pages and see only a 403. It seems, however, that they had initially saved a few pages on their own site, but all other requests fail.

Since I was serving my static resources (images, CSS, JS) from another domain, it was fairly easy to also block any HTTP_REFERER other than my own site. The offender now cannot get images or stylesheets, so the page they show does not look too good to Googlebot.
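
The blocking itself was done in Apache. Purely to illustrate the logic of the two rules (deny the offender's IP, and deny static resources requested with a foreign Referer), here is a rough sketch in Python. The host list, the `allow_empty` choice and the function names are assumptions made for the example, not the actual configuration.

```python
import ipaddress
from urllib.parse import urlsplit

# Address taken from the access log quoted in this thread.
BLOCKED_NETWORKS = [ipaddress.ip_network("185.32.189.38/32")]
# Hosts allowed to embed static resources; "www.dbwebb.se" is an assumption.
ALLOWED_REFERER_HOSTS = {"dbwebb.se", "www.dbwebb.se"}


def is_blocked_ip(remote_addr, networks=BLOCKED_NETWORKS):
    """True if the client address falls inside any blocked network."""
    address = ipaddress.ip_address(remote_addr)
    return any(address in network for network in networks)


def is_allowed_referer(referer, allowed_hosts=ALLOWED_REFERER_HOSTS, allow_empty=True):
    """True if the Referer header points at one of our own hosts.

    Requests without a Referer are let through by default (an assumption;
    many clients legitimately send none).
    """
    if not referer or referer == "-":
        return allow_empty
    host = urlsplit(referer).hostname
    return host is not None and host.lower() in allowed_hosts


def status_for(remote_addr, referer, is_static_resource):
    """Return the HTTP status a request would get under the two rules."""
    if is_blocked_ip(remote_addr):
        return 403
    if is_static_resource and not is_allowed_referer(referer):
        return 403
    return 200
```

Keep in mind that the Referer header is supplied by the client and can be forged, so the second rule only stops requests that report their origin honestly.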

These actions should slowly remove the offending site's duplicated content from the SERP.

Here are two images before and after my actions.







8/7/17
Original Poster
Mikael Roos
Another note on how to protect oneself, or rather deal with the fact that someone is stealing your content like this.

The offender saved my first page as an offline version, this means I can update my first page with new content and thereby creating signals to Google that my site is the most up-to-date site with the newest content. Perhaps this could lead to automatic update in the SERP, in my favour.
barryhunter
8/7/17
 
"this means I can update my first page with new content and thereby creating signals to Google that my site is the most up-to-date site with the newest content."


Well yes, this is what I meant when I mentioned competing against the copy. You need, in effect, to do better than the copy.

(although just having 'newest' is probably not enough in itself) 

 
 
8/8/17
Original Poster
Mikael Roos
"(although you should also seek to contact the owner of the hacked site, if they remove the hack it will also render it ineffective!)"
 
Yes, that is a proper suggestion. However, it seems like this is done in a deliberate manner by the owner of the domain.

Doing a whois on the offending site says:

Organisation:
Gabon TLD B.V.
My GA administrator
P.O. Box 11774
1001 GT Amsterdam
Netherlands
Phone: +31 20 5315725
Fax: +31 20 5315721
E-mail: abuse: ab...@freenom.com, copyright infringement: copy...@freenom.com

Googling "Gabon TLD B.V." shows a few pages connecting that company with similar abuse. So it seems like a proper scam-the-web company.

I assume one could report the domain name to the DNS provider, but I am uncertain what effect that would have.

I see one can report copyright infringement to the same DNS provider. Perhaps that is also an option.

I assume one could also report copyright infringement to the web server hosting company, in this case Cloudflare Inc. I am uncertain what effect that would have.


8/9/17
Original Poster
Mikael Roos
Adding some details for those interested in detecting a similar attack.

I checked the access log to find the first entry of the attack.

$ grep 185.32.189.38 access.log | head
185.32.189.38 - - [14/Apr/2017:05:57:32 +0200] "GET / HTTP/1.1" 200 27428 "https://dbwebb.se/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
185.32.189.38 - - [14/Apr/2017:19:55:46 +0200] "GET / HTTP/1.1" 200 27428 "https://dbwebb.se/" "AppEngine-Google; (+http://code.google.com/appengine; appid: s~gce-spider)"
185.32.189.38 - - [14/Apr/2017:19:55:47 +0200] "GET / HTTP/1.1" 200 24445 "https://dbwebb.se/" "AppEngine-Google; (+http://code.google.com/appengine; appid: s~gce-spider)"
185.32.189.38 - - [14/Apr/2017:19:55:50 +0200] "GET / HTTP/1.1" 200 27428 "https://dbwebb.se/" "AppEngine-Google; (+http://code.google.com/appengine; appid: s~gce-spider)"
185.32.189.38 - - [17/Apr/2017:02:44:22 +0200] "GET /robots.txt HTTP/1.1" 200 3449 "https://dbwebb.se/robots.txt" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
185.32.189.38 - - [17/Apr/2017:02:44:23 +0200] "GET /robots.txt HTTP/1.0" 200 3607 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
185.32.189.38 - - [17/Apr/2017:02:44:24 +0200] "GET / HTTP/1.1" 200 27428 "https://dbwebb.se/" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
185.32.189.38 - - [17/Apr/2017:02:46:00 +0200] "GET / HTTP/1.1" 200 27428 "https://dbwebb.se/" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
185.32.189.38 - - [17/Apr/2017:02:46:24 +0200] "GET /om HTTP/1.1" 200 15729 "https://dbwebb.se/om" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
185.32.189.38 - - [17/Apr/2017:02:46:47 +0200] "GET /rss HTTP/1.1" 200 15755 "https://dbwebb.se/rss" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

It seems the attack started April 14, 2017. 
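
Several of the log lines above carry a Googlebot User-Agent but come from 185.32.189.38, presumably because the offender passes the crawler's header along when proxying. A commonly recommended way to tell genuine Googlebot traffic from such requests is a reverse DNS lookup on the client IP followed by a forward lookup to confirm it. The sketch below assumes the two host name suffixes shown and IPv4 only; the function names and the injectable lookup parameters are my own.

```python
import socket

# Host name suffixes assumed to identify Google's crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip):
    """Return the PTR host name for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(hostname):
    """Return the IPv4 addresses hostname resolves to (empty on failure)."""
    try:
        return socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return []


def is_real_googlebot(ip, reverse=reverse_lookup, forward=forward_lookup):
    """Reverse-resolve ip, check the domain, then confirm with a forward lookup.

    The lookup functions are parameters so the logic can be tried out
    without touching the network.
    """
    hostname = reverse(ip)
    if not hostname:
        return False
    if not hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False
    # The forward lookup guards against a PTR record the attacker controls.
    return ip in forward(hostname)
```

A request that claims to be Googlebot in its User-Agent but fails this check could then be treated like any other unknown client, or blocked.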

The attack seems to have started with the User-Agent "AppEngine-Google; (+http://code.google.com/appengine; appid: s~gce-spider)" (apart from a single request earlier that day with an MSIE 6.0 User-Agent). This agent visited my site every 14 days at around 20:00 until June 23, 2017, which is the last day it visited.

$ grep 's~gce-spider' access.log | more
185.32.189.38 - - [14/Apr/2017:19:55:46 +0200] "GET / HTTP/1.1" 200 27428 "https://dbwebb.se/" "AppEngine-Google; (+http://code.google.com/appengine; appid: s~gce-spider)"
185.32.189.38 - - [14/Apr/2017:19:55:47 +0200] "GET / HTTP/1.1" 200 24445 "https://dbwebb.se/" "AppEngine-Google; (+http://code.google.com/appengine; appid: s~gce-spider)"
185.32.189.38 - - [14/Apr/2017:19:55:50 +0200] "GET / HTTP/1.1" 200 27428 "https://dbwebb.se/" "AppEngine-Google; (+http://code.google.com/appengine; appid: s~gce-spider)"
185.32.189.38 - - [28/Apr/2017:19:56:54 +0200] "GET / HTTP/1.1" 200 27466 "https://dbwebb.se/" "AppEngine-Google; (+http://code.google.com/appengine; appid: s~gce-spider)"
185.32.189.38 - - [28/Apr/2017:19:56:56 +0200] "GET / HTTP/1.1" 200 27466 "https://dbwebb.se/" "AppEngine-Google; (+http://code.google.com/appengine; appid: s~gce-spider)"
185.32.189.38 - - [28/Apr/2017:19:56:56 +0200] "GET / HTTP/1.1" 200 27466 "https://dbwebb.se/" "AppEngine-Google; (+http://code.google.com/appengine; appid: s~gce-spider)"
...
185.32.189.38 - - [09/Jun/2017:20:00:15 +0200] "GET / HTTP/1.1" 200 27925 "https://dbwebb.se/" "AppEngine-Google; (+http://code.google.com/appengine; appid: s~gce-spider)"
185.32.189.38 - - [09/Jun/2017:20:00:16 +0200] "GET / HTTP/1.1" 200 27925 "https://dbwebb.se/" "AppEngine-Google; (+http://code.google.com/appengine; appid: s~gce-spider)"
185.32.189.38 - - [09/Jun/2017:20:00:16 +0200] "GET / HTTP/1.1" 200 27925 "https://dbwebb.se/" "AppEngine-Google; (+http://code.google.com/appengine; appid: s~gce-spider)"
185.32.189.38 - - [23/Jun/2017:20:01:26 +0200] "GET / HTTP/1.1" 200 28158 "https://dbwebb.se/" "AppEngine-Google; (+http://code.google.com/appengine; appid: s~gce-spider)"
185.32.189.38 - - [23/Jun/2017:20:01:31 +0200] "GET / HTTP/1.1" 200 28158 "https://dbwebb.se/" "AppEngine-Google; (+http://code.google.com/appengine; appid: s~gce-spider)"
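
The 14-day rhythm can also be read out of the log programmatically instead of by eye. This is a small sketch that assumes Apache's "combined" log format as in the excerpts above; the regular expression and the function names are mine.

```python
import re
from datetime import datetime

# Client IP, timestamp and the last quoted field (the User-Agent) of a line
# in Apache "combined" log format.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] .* "(?P<agent>[^"]*)"$'
)


def visit_dates(lines, ip, agent_substring):
    """Return the sorted distinct dates on which ip visited with that agent."""
    dates = set()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue
        if match["ip"] != ip or agent_substring not in match["agent"]:
            continue
        # %b assumes English month abbreviations (C locale), as Apache writes them.
        stamp = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
        dates.add(stamp.date())
    return sorted(dates)


def gaps_in_days(dates):
    """Return the number of days between each pair of consecutive dates."""
    return [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
```

Run over the full log, for example with `gaps_in_days(visit_dates(open("access.log"), "185.32.189.38", "s~gce-spider"))`, a regular crawl schedule shows up as a list of identical gaps.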

I checked Google Analytics and saw a drop in my indexed pages around April 30, 2017, roughly two weeks after the attack was supposedly initiated.

There may be other reasons for the drop, I cannot tell. The number of indexed pages dropped from 9296 to 8250 and has continued to drop, to 7478 as of today.

Perhaps a drop like this could be a signal that something is going on? I do not know.

The first time I detected that the offending site was outranking and replacing dbwebb.se as number one in the SERP was August 1, 2017.

I tried to look at the offending site through web.archive.org but it was not indexed.

I checked the whois-record for the offending website and it seems that the domain name was registered on 2017-04-13, one day before the attack was started.

Correction: Gabon TLD B.V. seems to be the registrar for the domain names at freenom[.]com. I cannot find who actually owns the domain.
Geelen Elektronika
8/21/17
my website has been copied by the same ip address 185.32.189.38 on july 22, 2017
this date on the copied homepage does not change, so it must be a copy, not a link to my site.

a few days ago, they have changed to a new ip address 185.32.189.39
i blocked both ip addresses, so they only get a 403 page.
still they try to read pages from my site every 4 minutes.
even pages which don't exist anymore.

here is a link to the copied site: http://durable.2survivemovie.com/index.htm
the names of all links have been changed. when you click them, you get my 403 page.

when i discovered the copy, i could see the copy when i turned off javascript.
with javascript on i was redirected to a dating site.

at first all pictures had a text 'hotlinking not allowed', as i blocked hotlinking.
now they don't show the pictures anymore.

i changed the css page, so they got yellow letters on a yellow background after they reloaded my css page.
they noticed this change and deleted the css page.

it seems nothing can stop them from trying to copy my site...

