Google user
Original Poster
Dec 1, 2020

Robots.txt Directives

Hi & thanks to anyone who can answer this for me:

We have an issue where we want Googlebot to stop crawling URLs with UTM parameters. However, when we blocked the UTM parameters with a 'Disallow' directive in our robots.txt file, AdsBot could not crawl our UTM URLs either. AdsBot was then unable to crawl the ad destination URLs, so our ads got disapproved.

We need to allow AdsBot to crawl the UTM URLs in the robots.txt file while still disallowing Googlebot.

This is not as simple as it sounds, because when we add a separate group calling out AdsBot specifically, e.g.

User-agent: AdsBot-Google

AdsBot then ignores all of the other directives that sit under the wildcard group, i.e. everything under

User-agent: *
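
To illustrate with a simplified sketch (the utm_ pattern is just an example, not our exact file):

User-agent: *
Disallow: /*?*utm_

User-agent: AdsBot-Google
Disallow: /some-other-path/

Once the AdsBot-Google group exists, AdsBot obeys only that group, so the UTM Disallow under * no longer applies to it.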

Is there a way to allow AdsBot to crawl these URLs while still blocking Googlebot?

Thanks
This question is locked and replying has been disabled.
Recommended Answer
Dec 1, 2020
Hi @KJK123

I suppose the simplest way, if having a User-agent: AdsBot-Google group means that the global wildcard settings are ignored, would be to replicate the directives you want AdsBot to follow from the wildcard group into that bot's own group, leaving out the UTM rule. I get that might be messy though (and difficult to maintain, if you have multiple entries).
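
A minimal sketch of that approach (the utm_ pattern and the /private/ path are placeholders; swap in your real rules):

User-agent: *
Disallow: /private/
Disallow: /*?*utm_

User-agent: AdsBot-Google
Disallow: /private/
# same rules as the wildcard group, minus the utm_ line, so AdsBot can reach the UTM URLs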

If your server runs Apache, you could possibly look at doing this (with caution) via your .htaccess file. That way, you might be able to block this specific bot in a more targeted way (and without having to list multiple directories to ignore, etc.).
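
As a rough, untested sketch of that idea (assumes mod_rewrite is enabled; the user-agent and query-string patterns are assumptions you should verify against your own logs):

<IfModule mod_rewrite.c>
RewriteEngine On
# Return 403 to Googlebot, but not AdsBot-Google, on URLs carrying utm_ parameters
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
RewriteCond %{HTTP_USER_AGENT} !AdsBot [NC]
RewriteCond %{QUERY_STRING} (^|&)utm_ [NC]
RewriteRule ^ - [F]
</IfModule>

Bear in mind a 403 is a much harder block than a robots.txt Disallow, so treat this as a last resort.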

Do a web search for "blocking bots with .htaccess" (forgive me if I'm teaching you to suck eggs).

If you do consider the latter route, make sure you test it well, as a simple typo can cause havoc.
Last edited Dec 1, 2020
Original Poster Google user marked this as an answer