Jun 5, 2023

A lot of weird on-site search URLs appear in Google Search Console

A HUGE number of weird search URLs appear in my Google Search Console.
I disabled the WordPress search function, so now every search request responds with 404.
But those URLs in Search Console just won't go away!

How can I deal with this?
Is there any tool to clear out all the weird URLs and update the index according to the Sitemap.xml we submitted?



Sincerely,
Kyle
This question is locked and replying has been disabled.
Recommended Answer
Jun 5, 2023
Well, if they are in the 'Not Found' section of the report, then Google has found the 404 status and has NOT indexed them anyway.

As long as they are not indexed, it's totally benign (just annoying).


I.e. your server has 'protected' these URLs from indexing by returning a 404 status to clients.
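
If you want to confirm from your side that the search URLs really do answer with 404, a small script can request them and report the status code. This is only a rough sketch using Python's standard library; the function names `fetch_status` and `not_returning_404` and the `example.com` URL are made up for illustration, and it does not imitate Googlebot in any way.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server sends for `url`."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "status-check-sketch"}  # hypothetical UA string
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the code is still what we want.
        return error.code


def not_returning_404(urls):
    """Return (url, status) pairs for every URL that does NOT answer with 404."""
    problems = []
    for url in urls:
        status = fetch_status(url)
        if status != 404:
            problems.append((url, status))
    return problems


if __name__ == "__main__":
    # Placeholder URL: WordPress search requests normally use the `?s=` parameter.
    print(not_returning_404(["https://example.com/?s=test"]))
```

Note that `urlopen` follows redirects, so a search URL that redirects to the home page would show up here as 200 rather than 404.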
Original Poster Kyle Tsai 5251 marked this as an answer
Recommended Answer
Jun 8, 2023
'Allow' is the default, so you don't actually need to list the things that are allowed.

You can have 'Allow', but it might be clearer to simply list nothing.
Original Poster Kyle Tsai 5251 marked this as an answer
Recommended Answer
Jun 8, 2023
Alas yes, blocking in robots.txt is NOT recommended.

Robots.txt only blocks crawling. It does not prevent indexing.

So by blocking you are, ironically, allowing them to be indexed: Googlebot can't crawl a blocked URL, so it never gets to see the 404.


I would highly recommend removing the block in robots.txt and letting the 404s take care of it.
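
To make the reasoning concrete, here is a toy model (not how Google actually works internally) of a crawler that obeys robots.txt. It uses Python's standard `urllib.robotparser`; the function name `status_seen_by_crawler`, the `/search/` path and both rule sets are invented for the example.

```python
from urllib.robotparser import RobotFileParser


def status_seen_by_crawler(robots_lines, user_agent, path, server_status):
    """Return the status a robots.txt-obeying crawler observes for `path`,
    or None when the rules forbid it from requesting the URL at all."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    if not parser.can_fetch(user_agent, path):
        # Blocked: the URL is never requested, so the 404 stays invisible.
        return None
    return server_status


# Hypothetical rule sets: one blocks the search path, one blocks nothing.
BLOCKED = ["User-agent: *", "Disallow: /search/"]
UNBLOCKED = ["User-agent: *", "Disallow:"]

if __name__ == "__main__":
    print(status_seen_by_crawler(BLOCKED, "Googlebot", "/search/spam", 404))    # None
    print(status_seen_by_crawler(UNBLOCKED, "Googlebot", "/search/spam", 404))  # 404
```

With the block in place the crawler learns nothing about the URL; without it, the 404 is seen and the URL can be dropped.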
Original Poster Kyle Tsai 5251 marked this as an answer
Recommended Answer
Jun 5, 2023
Well, if they are in the 'Not Found' report, that report is in the 'not indexed' section.

So Google has ALREADY found the 404 and hasn't indexed the URL (if it was indexed, it has already been deindexed).


So you don't need to do any further checks on the ones already in the 'Not Found' report.

Similarly, any others in the 'not indexed' area of the report are already not indexed.
Original Poster Kyle Tsai 5251 marked this as an answer
All Replies
Jun 8, 2023
That's fine. You can do that. I'm just saying you don't need to name what is allowed.

You could just do

User-agent: Googlebot
Allow: /

or even 

User-agent: Googlebot
Disallow: 

... because the default is allow anyway. The empty Disallow is a 'null operation', as it blocks nothing. But it makes Googlebot a separate block, so Googlebot has no rules, while other bots follow *.
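
If you'd like to try this out, Python's standard `urllib.robotparser` can evaluate a rule set for different user agents. This is just a sketch: the `*` group with `Disallow: /search/`, the bot name `SomeOtherBot` and the helper `can_crawl` are assumptions for the example, and Python's parser is not guaranteed to match Google's own parser in every corner case.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: a rule for all bots, plus a separate Googlebot block
# whose only rule is the empty Disallow.
EMPTY_DISALLOW = [
    "User-agent: *",
    "Disallow: /search/",
    "",
    "User-agent: Googlebot",
    "Disallow:",
]

# Same file, but the Googlebot block uses 'Allow: /' instead.
ALLOW_ALL = EMPTY_DISALLOW[:-1] + ["Allow: /"]


def can_crawl(robots_lines, user_agent, path):
    """Return True if the given rules let `user_agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for rules in (EMPTY_DISALLOW, ALLOW_ALL):
        print(can_crawl(rules, "Googlebot", "/search/spam"))     # True
        print(can_crawl(rules, "SomeOtherBot", "/search/spam"))  # False
```

Both variants give Googlebot its own block with nothing blocked, while other bots still fall back to the `*` rules.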
Jun 8, 2023
OK! Thank you for explaining! :)