Sep 19, 2020

Google cannot access robots.txt

Recently Googlebot seems to have problems indexing new pages on my wife's website. When I try the Mobile-Friendly Test, it fails too, and so does Google's robots.txt Tester. None of them can access the site. But if I use any other testing tool, or open the robots.txt URL myself, there is no problem. So the problem seems to occur only when Google tries to access it.
I have also contacted the hosting company (Dreamhost), but they said they didn't find any problem.

The site uses WordPress, with a plugin called All in One SEO that creates the robots.txt. I have already tried disabling it and creating the robots.txt myself, but nothing changed.
I have already searched and found that others have had similar problems before, but I didn't find any clear solution.
Oh, the site is https://screenpotatoes.com
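One way to double-check the rules independently of Google's tester is Python's standard-library `urllib.robotparser`. This is a minimal sketch: the robots.txt text below is an assumed example of a typical WordPress-style file, not this site's actual file, so paste in the real contents before drawing conclusions. Also note that this parser applies rules in file order, while Google uses the most specific matching rule, so treat the result as a rough sanity check.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: a typical WordPress-style robots.txt, NOT the site's real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
"""


def googlebot_allowed(robots_txt: str, url: str) -> bool:
    """Return True if the robots.txt text lets a crawler named Googlebot fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines and marks it as already read,
    # so can_fetch() does not fall back to "deny everything".
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


print(googlebot_allowed(ROBOTS_TXT, "https://screenpotatoes.com/"))           # True
print(googlebot_allowed(ROBOTS_TXT, "https://screenpotatoes.com/wp-admin/"))  # False
```

If the real file gives True for the pages that fail in Google's tools, the rules themselves are probably not the cause.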

Does anyone have any suggestions or an idea of what might cause this?
Thank you!
Sep 19, 2020
Hi 
I see the same message here.
I don't see a problem with your robots.txt.

So I think something else is blocking Googlebot from accessing your site.
Check firewalls, security plugins, CDNs, DNS, .htaccess, ...
Also check the server logs.
You might contact your webhost to have them check on their side.
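If something on the server treats Googlebot differently, one quick check is to request robots.txt once with Googlebot's user-agent string and once with a browser-like one, without following redirects, and compare the answers. This is a hedged sketch: it only catches rules keyed on the User-Agent header; a firewall or fail2ban rule keyed on Google's IP addresses cannot be reproduced from your own machine. The browser user-agent string is a made-up placeholder, and `fetch` is a parameter so the comparison can be tried without network access.

```python
import http.client
from urllib.parse import urlsplit

# Googlebot's user-agent string, as it appears in access logs.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# HYPOTHETICAL browser user-agent used only for comparison.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0 Safari/537.36"


def fetch_status(url: str, user_agent: str) -> tuple:
    """Send one GET without following redirects; return (status, Location header or None)."""
    parts = urlsplit(url)
    conn_cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    conn = conn_cls(parts.netloc, timeout=10)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("GET", target, headers={"User-Agent": user_agent})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def compare_user_agents(url, fetch=fetch_status):
    """Return {label: (status, location)} for a Googlebot UA and a browser UA."""
    return {
        "googlebot": fetch(url, GOOGLEBOT_UA),
        "browser": fetch(url, BROWSER_UA),
    }
```

For example, `compare_user_agents("https://screenpotatoes.com/robots.txt")` returns one `(status, location)` pair per user agent; a 301 for one and a 200 for the other would point at a User-Agent-based rule.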
Last edited Sep 19, 2020
Sep 19, 2020
Hello,

Thanks for your response! I have checked some of your suggestions.
.htaccess seems to be generated by WordPress, and I think it looks OK:

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress

I have checked the logs on the server. In access.log there are some GET requests from Googlebot; the last one, about 30 minutes ago, tried to read robots.txt:

66.249.64.213 - - [19/Sep/2020:06:31:26 -0700] "GET /robots.txt HTTP/1.1" 301 3776 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 
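To see at a glance which status codes the Googlebot-labelled robots.txt requests are getting, the access log can be tallied per IP. This is a minimal sketch that assumes the log uses Apache's "combined" format, as the line above appears to; the `access.log` path in the usage comment is a placeholder for wherever Dreamhost stores the site's logs. Matching on the user-agent string only shows what a client claims to be, so the IPs still need separate verification.

```python
import re
from collections import Counter

# ASSUMPTION: Apache "combined" log format, matching the line quoted above.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the fields of one combined-format line as a dict, or None if it doesn't match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None


def googlebot_robots_statuses(lines):
    """Count (ip, status) pairs for robots.txt requests whose user agent claims to be Googlebot."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec and rec["path"] == "/robots.txt" and "Googlebot" in rec["agent"]:
            counts[(rec["ip"], rec["status"])] += 1
    return counts


# Usage (path is a placeholder):
# with open("access.log", encoding="utf-8", errors="replace") as f:
#     for (ip, status), n in googlebot_robots_statuses(f).most_common():
#         print(ip, status, n)
```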

The IP also seems to belong to Google. But if I try to access robots.txt with Google's robots.txt Tester, it still fails.

The error.log file looks interesting: starting yesterday, it has logged the following error every minute:
[Sat Sep 19 06:53:32.551089 2020] [ssl:error] [pid 17037:tid 3904586639104] AH01941: stapling_renew_response: responder error

If I'm correct, this is an error message from Apache, but I don't know whether it is related to my problem.
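AH01941 appears to come from mod_ssl's OCSP stapling code (the server could not refresh the certificate status from the CA's OCSP responder). To check whether these errors start at the same time as Googlebot's problems, a small script can count the AH codes in error.log and record when each first and last appeared. This sketch assumes the error-log layout shown above; a different ErrorLogFormat setting would need a different pattern, and the file path in the usage comment is a placeholder.

```python
import re
from collections import Counter

# Layout of the Apache 2.4 error_log line quoted in this thread; an optional
# [client ...] field is allowed because many error lines carry one.
ERR_RE = re.compile(
    r"\[(?P<time>[^\]]+)\] \[(?P<module>[^:\]]+):(?P<level>[^\]]+)\] "
    r"\[pid [^\]]+\] (?:\[client [^\]]+\] )?(?P<code>AH\d{5}): (?P<message>.*)"
)


def summarize_errors(lines):
    """Count AH error codes and remember the first and last timestamp of each."""
    counts = Counter()
    span = {}
    for line in lines:
        m = ERR_RE.match(line)
        if not m:
            continue  # skip lines in a different layout
        code, when = m["code"], m["time"]
        counts[code] += 1
        first = span.get(code, (when, when))[0]
        span[code] = (first, when)
    return counts, span


# Usage (path is a placeholder):
# with open("error.log", encoding="utf-8", errors="replace") as f:
#     counts, span = summarize_errors(f)
#     for code, n in counts.most_common():
#         print(code, n, "first:", span[code][0], "last:", span[code][1])
```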

I created a subdomain on the same server, and the subdomain is accessible from the Mobile-Friendly Test, but the main domain is not.
Sep 19, 2020
Hi there,

You are having issues with server 5xx-level errors with the SSL redirect:

[Sat Sep 19 06:53:32.551089 2020] [ssl:error] [pid 17037:tid 3904586639104] AH01941: stapling_renew_response: responder error

That's an SSL error: Googlebot is trying to fetch the redirected version, and that raises a 500 error at the server level.
Check this to fix this error:



Status Code | URL | IP | Page Type | Redirect Type | Redirect URL
301 | http://www.screenpotatoes.com/2019/12/06/when-they-see-us-e3/ | 173.236.190.171 | server_redirect | permanent | http://screenpotatoes.com/2019/12/06/when-they-see-us-e3/
200 | http://screenpotatoes.com/2019/12/06/when-they-see-us-e3/ | 173.236.190.171 | normal | none | none
All versions of your website should redirect to the HTTPS version.
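The crawl report above shows the www host redirecting to the bare domain over plain HTTP rather than to HTTPS. A redirect chain can be traced hop by hop with Python's standard `http.client`, so nothing is followed silently. This is a sketch under a few assumptions: it sends HEAD requests (some servers answer HEAD differently from GET), it stops after a fixed number of hops, and `fetch` is a parameter so the logic can be tested offline.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def head_once(url):
    """Send one HEAD request without following redirects; return (status, Location or None)."""
    parts = urlsplit(url)
    conn_cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    conn = conn_cls(parts.netloc, timeout=10)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("HEAD", target)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace_redirects(url, fetch=head_once, max_hops=10):
    """Follow redirects by hand and return the list of (status, url) hops."""
    hops = []
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((status, url))
        if status not in REDIRECT_CODES or not location:
            break
        url = urljoin(url, location)  # Location may be relative
    return hops


def ends_on_https(hops):
    """True if the final hop is a 200 served over HTTPS."""
    status, url = hops[-1]
    return status == 200 and url.startswith("https://")


# Usage: trace_redirects("http://www.screenpotatoes.com/robots.txt")
```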
Sep 19, 2020
Hi
.htaccess looks fine. These are indeed the standard WordPress rewrite rules.

The log shows a 301 redirect; do you also see 200 OK statuses?

Is anything blocked by the firewall?
Do you have any security plugins in WordPress?
Do you use any CDNs?
Sep 19, 2020
Hello,

Yes, I also see 200 statuses; the most recent:
34.83.190.242 - - [19/Sep/2020:13:20:12 -0700] "GET /robots.txt HTTP/2.0" 200 300 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.0.0 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 

And if I check the IP with host, it shows this:
242.190.83.34.in-addr.arpa domain name pointer 242.190.83.34.bc.googleusercontent.com.
I don't know whether googleusercontent.com is really Googlebot or not.

I also see other Googlebot agents in the log (IP: 66.249.64.213) trying to access robots.txt, and these seem to be real ones, because host shows:
213.64.249.66.in-addr.arpa domain name pointer crawl-66-249-64-213.googlebot.com.
But these always get a 301 redirect, while other search engines get a 200 for robots.txt.
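Google documents a two-step way to verify that a request really comes from Googlebot: a reverse DNS lookup on the IP must give a host under googlebot.com or google.com, and a forward lookup of that host must return the same IP. Below is a minimal Python sketch of that check; the resolver functions are parameters so it can be exercised without DNS. By this check, a host such as 242.190.83.34.bc.googleusercontent.com would not count as verified Googlebot, while crawl-66-249-64-213.googlebot.com could, provided the forward lookup matches.

```python
import socket

# Domains Google's verification procedure accepts for Googlebot hosts.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def verify_googlebot(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Reverse-resolve ip, require a googlebot.com/google.com host,
    then forward-resolve that host and confirm it maps back to ip."""
    try:
        host = reverse(ip)[0]  # (hostname, aliases, addresses)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    if not host.rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False
    try:
        addresses = forward(host)[2]  # (hostname, aliases, addresses)
    except OSError:
        return False
    return ip in addresses


# Usage (performs real DNS lookups): verify_googlebot("66.249.64.213")
```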

I don't think there is a firewall I can access; I don't see anything after logging in to Dreamhost. I can SSH into the server, and I see it has fail2ban installed, but it doesn't let me see any details about it because I'm not root. Probably only Dreamhost support can do that.

Plugins that are relevant to security:
- Akismet Anti-Spam (stops spam posts)
- All in One SEO Pack (seems to do a lot of SEO-related things, and also manages robots.txt)
- Jetpack (seems to have a CDN, but only for images)

The other plugins are not related to security; they handle statistics, health checks and ad management.
Maybe I can try disabling these plugins and see if that changes anything.
Sep 19, 2020
Hi
I would also check with your webhost.
They can check things on their side.
Especially the firewalls, because in cases like this it is usually a firewall that blocks Googlebot or its IP addresses.
Last edited Sep 19, 2020