3/15/15
Original Poster
flyfishingcolorado

Googlebot access to wp-content folder

The problem I have with letting Googlebot run rampant through my plugins and themes is that they have a tendency to end up in the index for any hacker to exploit. Too many times in the past I have found different plugins and themes listed in the Google index when I do a site: search. So until Google stops indexing what it finds in the wp-content folder on WordPress sites, I will go for security rather than a happy Googlebot. Thank you.

All Replies (14)
Gaieus
3/15/15
You're welcome. However, there are smarter ways than simply disallowing an entire directory. I am not sure whether you are interested, though.
3/15/15
Original Poster
flyfishingcolorado
Yes, I am interested.
cristina
3/15/15
First, check whether URLs of folders without index files return status 200 OK with the folder structure and links to all files and sub-folders. If they do, you need to return status 403 Forbidden for URLs of folders without index files. This is easily done, for example, in .htaccess or in an admin panel like cPanel.
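For example, a minimal sketch for Apache (an assumption; it is what most shared hosts run, and it requires the host to permit Options overrides in .htaccess). A single directive disables automatic directory listings, so folder URLs without an index file return 403 Forbidden instead of a list of files:

# Disable directory listings; folder URLs without an index file then return 403
Options -Indexes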
 
3/15/15
Original Poster
flyfishingcolorado
So you're saying to use .htaccess to manage access to the wp-content folders instead of robots.txt. Will I then see a lot of 403 crawl error messages on my Webmaster Tools dashboard?
Gaieus
4/19/15
On Sunday, March 15, 2015 at 2:24:12 PM UTC+1, flyfishingcolorado wrote:
Yes, I am interested.

All right... 

The rule is that the more specific rule prevails (Googlebot picks the Allow or Disallow directive with the longest matching path). So you can disallow the whole wp-content directory and then allow certain subfolders or certain file types to be crawled. Something like this:
User-agent: *
# Block the whole directory first...
Disallow: /wp-content/
# ...then re-allow the parts crawlers need (the more specific rules win):
Allow: /wp-content/uploads/
Allow: /wp-content/plugins/yourplugin/*.js
Allow: /wp-content/plugins/yourplugin/*.css
You'd obviously want to get your images indexed, for instance, wouldn't you? They are in the /wp-content/uploads/ folder. You can also allow (only) JavaScript and CSS files to be crawled, as shown above. Not every plugin needs this; only the ones that add renderable content to your pages (like an accordion plugin or a slideshow plugin).

In the same way, you may wish to allow JS and CSS files to be crawled in your /wp-includes/ directory. Read more about the robots.txt file here:
There are very useful examples on that page...
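For instance, a minimal sketch of that /wp-includes/ pattern (the wildcard syntax below is supported by Googlebot; treat the exact paths as assumptions to adapt to your install):

# Merge these into your existing User-agent: * group
Disallow: /wp-includes/
# The more specific Allow rules override the broader Disallow for these file types
Allow: /wp-includes/*.js
Allow: /wp-includes/*.css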

In case you are worried that some of these files will get indexed, you can place this code into your site's main .htaccess file (or into a .htaccess file placed directly in those directories):
<Files ~ "\.(css|js)">
Header set X-Robots-Tag "noindex"
</Files>

Read about the X-Robots-Tag header somewhere near the bottom of this page:

Edit:

On Sunday, March 15, 2015 at 3:01:20 PM UTC+1, flyfishingcolorado wrote:
So you're saying to use .htaccess to manage access to the wp-content folders instead of robots.txt. Will I then see a lot of 403 crawl error messages on my Webmaster Tools dashboard?
No, it would only disable directory indexing (i.e. the listing that lets bots as well as humans see the contents of your folders). All the files inside would still be accessible.
cristina
3/15/15
Gaieus already explained, but since I think you asked about .htaccess after what I wrote about 403, I will explain as well.
These are two separate things: blocking URLs in robots.txt, and returning status 403 Forbidden for URLs of folders without index files.
Look at a folder URL without an index file, for example /images/ or /wp-content/ . If you see a list with links to all files and sub-folders, this is where Google and other search engines find and collect URLs that you thought could not be found.
If the folder URL returns status 403 (you can see the status with Fetch as Googlebot), then Google will not index or process the URL; it will just see the 403 and go on crawling other URLs. But this only works if the URL is not blocked in robots.txt; if it is blocked, Google cannot see the status 403. If there are 403 crawl errors in Webmaster Tools, you can see where each URL is linked from and fix the link.
  
If you block folders in robots.txt, take care not to block URLs that Google needs to render the content of your site that you want indexed in search results, or make sure the rendering is not affected in important ways.

3/15/15
Original Poster
flyfishingcolorado
Thanks for the help. This is useful information. I have not messed with robots.txt files for a very long time; long enough that code like you are showing was not allowed, as I remember. It looks a lot like .htaccess code at the end. I had experimented with allowing access to the theme files I was using and watched the effect on the mobile tester, but I like your way better of allowing access to just the CSS and JS as needed. I have an extensive filtered .htaccess file and will have to see how to work that into the filter rules.
3/15/15
Original Poster
flyfishingcolorado
Thanks for the information. I will pass it on to my developer.
One more question: I am on shared hosting. If I run a master robots.txt for all my add-on domain sites, won't I see messages in Webmaster Tools about not having a robots.txt in each add-on site? Or would the main robots.txt file have to contain code blocks for each add-on site in my account?
3/15/15
Original Poster
flyfishingcolorado
Gaieus, one more question. Does this code pertain only to Googlebot, or will other major bots like Bingbot honor it also?
Gaieus
3/15/15
The robots.txt file should be unique to each domain and reside directly in its root folder. AFAIK any add-on domain usually has its own folder (as a root), so these files should not interfere with each other. If a given site does not have a robots.txt file, that is okay as long as the server responds with a true 404 status code. Please read:
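To illustrate with a hypothetical cPanel-style layout (the paths are assumptions, not from this thread):

# /public_html/robots.txt             -> served at example.com/robots.txt
# /public_html/addonsite/robots.txt   -> served at addonsite.com/robots.txt
# Each file needs rules only for its own domain; they never interfere.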
Gaieus
3/15/15
The robots.txt protocol is generally supported by all major (and ethical) search engines. There might be directives that one or another does not support, but they all support the basics.

The same goes for X-Robots-Tag headers.

They actually have common conferences/meetings in order to make their own lives easier as well.
3/16/15
Original Poster
flyfishingcolorado
Gaieus > I have implemented your suggestion on this site: www.pixlerproductions.com/ Yet it is still mobile-unfriendly unless I list 16 more files Googlebot needs to fully render the page. And Googlebot is the only one having a problem: I see the page fine on my S5 phone, on my tablet, and when testing the way the Weaver developer says to test the theme's mobile settings. So am I supposed to spend hours a day managing a 200+ line robots.txt file just to make Googlebot happy? Surely there must be an easier way to do this.

I could, of course, open up the wp-includes folder along with the whole wp-content folder, then put a noindex .htaccess file in each folder. If I did that, how do I know Googlebot or any other major bot would honor it? Doing it this way would defeat all of the grand filters in my main .htaccess file, I would think.
Gaieus
3/16/15
No, you still block the whole wp-content directory (except uploads) and the wp-includes directory, and allow only the JS and CSS files; the important JS and CSS files live there, and they are what enable responsiveness.
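Putting the advice in this thread together, a hedged sketch of what such a robots.txt might look like (paths and patterns are illustrative; the wildcard Allow rules spare you from listing those 16 files one by one):

User-agent: *
Disallow: /wp-content/
Disallow: /wp-includes/
# Re-allow uploads so images can be indexed
Allow: /wp-content/uploads/
# Wildcards cover every script and stylesheet under both trees
Allow: /wp-content/*.js
Allow: /wp-content/*.css
Allow: /wp-includes/*.js
Allow: /wp-includes/*.css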

4/19/15
Original Poster
flyfishingcolorado
If you check your .htaccess code, you left off the $ after the ) that signifies the end of the regex. That missing $ broke my site for almost a full day before I figured out what was causing the problem.
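For reference, the corrected block with the closing anchor, matching the fix described above:

<Files ~ "\.(css|js)$">
Header set X-Robots-Tag "noindex"
</Files>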