Dec 29, 2021

Malicious BOT on googleusercontent.com

I am trying to report a misbehaving bot running on Google's servers, and I would like to know what Google policy allows this sort of thing to happen.
And what is Google's policy on bots running on its servers following the robots.txt protocol?

Background: 
Every hour my website is hit by someone running a misbehaving bot on the Google servers using a user agent called newspaper/0.2.8.
It is being used across multiple Google User Content IP addresses and keeps trying to access the same non-existent pages.

This user agent is blocked by my robots.txt file, but the bot is not following that protocol.
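For reference, the blocking rule in robots.txt looks something like this (the user-agent token is matched without the version number, and of course it only works if the bot chooses to read the file at all):

User-agent: newspaper
Disallow: /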

I believe this activity to be abuse and a direct violation of your Acceptable Use Policy, as it is both invasive and harmful to my website.

Despite my server returning 403 errors, these attacks are continuing day after day. Given that you are strongly committed to preventing abuse of the Google Cloud Platform, I ask you to take further action against whoever is distributing this bot and get these attacks stopped.

I have tried emailing the abuse address, but nothing has been done and no human appears to be answering my messages.

Many thanks.

A list of the IP addresses being used by this bot can be provided.
All Replies (7)
Dec 29, 2021
I believe this is the official form for reporting things like this to Google.
Dec 29, 2021
Yep, I've been doing that for a couple of months now, and nothing has been done as far as I can see; the attacks still persist, five times every hour.
Dec 29, 2021
OK, well then I guess Google aren't really concerned enough to take specific action.
 
It's between you and whoever runs the bot. Google aren't going to get involved unless it's particularly heavy traffic.
Dec 29, 2021
That's the issue: I have no idea who is running the bot or why it is targeting my website.
All I have are entries like this every hour:
28-12-2021 23:32:51 - 50.234.141.34.bc.googleusercontent.com - 34.141.234.50 - /feeds - - newspaper/0.3.0
28-12-2021 23:32:51 - 50.234.141.34.bc.googleusercontent.com - 34.141.234.50 - /feed - - newspaper/0.3.0
28-12-2021 23:32:51 - 50.234.141.34.bc.googleusercontent.com - 34.141.234.50 - /rss - - newspaper/0.3.0
28-12-2021 23:32:51 - 50.234.141.34.bc.googleusercontent.com - 34.141.234.50 - / - - newspaper/0.3.0
28-12-2021 23:32:50 - 50.234.141.34.bc.googleusercontent.com - 34.141.234.50 - / - - newspaper/0.3.0

Which is why I would like to know why, or even whether, Google allows such things to happen. I have been unable to find any reference to the robots.txt protocol in Google's terms and conditions. So it appears to me that they are willing to let any hacker or spammer upload code to their user content servers and attack websites, with no effective way to report these attacks.

I had the same problem from Amazon, and within two days it was sorted. It seems to me that Google just can't be bothered with the small guys.

Unless someone here knows a better way to report these.
Dec 30, 2021
Well, robots.txt as such is meant to control 'crawlers': spiders that walk the web by following links.
A bot that goes directly to a URL that it knows (or at least suspects) works is not a spider.
 
I don't know exactly what this "newspaper/0.3.0" is, but it might be the Python 'newspaper' article-scraping library; its changelog mentions 0.3.0.
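If it is that library (an assumption; the URL below is illustrative only), a single call can generate exactly the pattern in the logs above: it fetches the homepage and probes common feed paths such as /feed and /rss, sending "newspaper/x.y.z" as its User-Agent. A minimal sketch:

import newspaper

# Building a "source" makes the library fetch the homepage and probe
# category/feed URLs to discover articles, sending its own version
# string as the User-Agent.
paper = newspaper.build("https://example.com", memoize_articles=False)
print(paper.size())  # number of articles it discovered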
 
It's up to crawlers whether they choose to honour robots.txt. There is no formal 'penalty' for not honouring it, even if you could prove this bot is a 'crawler'.
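Because robots.txt is voluntary, the only real enforcement is server-side. A minimal sketch, assuming a Python WSGI stack (the token and the demo app are illustrative, not your actual setup):

class BlockUserAgent:
    # Reject any request whose User-Agent contains a blocked token.
    def __init__(self, app, token="newspaper"):
        self.app = app
        self.token = token.lower()

    def __call__(self, environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "").lower()
        if self.token in ua:
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return self.app(environ, start_response)

def site(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello"]

application = BlockUserAgent(site)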
 
 
Frankly, 5 requests an hour (or even 5 a second) is hardly an 'attack'. A typical web server will serve more requests than that from one user visiting a single page, by the time the images, CSS, etc. have downloaded.
 
You would be hard-pressed to claim it constitutes some sort of DoS.
 
 
Dec 30, 2021
I agree it isn't a DoS attack; I classify it as harassment, which is against Google's terms and conditions, as is site scraping.
Especially as it has been going on for more than 12 months, during which time I have returned only 404, 403 or 500 errors.
All I can say is that it is pretty poor coding when it keeps searching month after month for non-existent pages.
Dec 30, 2021
Just checked my logs and, what do you know, these stopped about 3 hours ago.
So either Google has finally come through and stopped them (thank you, Google),
or the person running the bot finally spotted the HTTP error returns.

Okay, call me paranoid, but I hate crawlers, bots and site scrapers.
Thanks, Barry, for listening to my grumbles.