3/27/13
Original Poster
Nathan RCT

Adding filter annotations reduces total number of results significantly

OK, I know that the shortcomings of Custom Search compared to the standard web search have been discussed in the past.  However, it seems like if anything these issues are worsening.  Here is a concrete example.

Here is a search that uses a linked CSE to search the Los Angeles Craigslist talent gigs section for 'voice' gigs:

https://www.google.com/cse?cref=http://gcrefs1.searchtempest.com/google_cse6.php%3FmainCat%3D3%26subcat%3Dtlg%26sortby%3Ddate%26location%3D90210%26maxDist%3D10%26region_can%3D%26region_us%3D1%26region_mex%3D&q=voice+-intitle:classifieds#gsc.tab=0&gsc.q=voice%20-intitle%3Aclassifieds&gsc.page=10

As of this writing it gives 85 results.

It uses this cref file, with one annotation:
http://gcrefs1.searchtempest.com/google_cse6.php?mainCat=3&subcat=tlg&sortby=date&location=90210&maxDist=10&region_can=&region_us=1&region_mex=

<BackgroundLabels>
<Label name="_y" mode="FILTER"/>
<Label name="recent4" mode="BOOST"/>
</BackgroundLabels>

<Annotations>
<Annotation about="losangeles.craigslist.org/*tlg*"><Label name="_y"/></Annotation>
</Annotations>
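
For anyone who wants to reproduce these links, here is a minimal Python sketch of how the search URL above can be put together from the cref file URL and the query. The function name `linked_cse_url` and the choice of which characters to leave unescaped are my own assumptions, picked so that the output matches the link posted above (without the `#gsc...` fragment); this is not an official recipe.

```python
from urllib.parse import quote, quote_plus

CSE_ENDPOINT = "https://www.google.com/cse"

def linked_cse_url(spec_url: str, query: str) -> str:
    """Build a linked-CSE search URL for a cref specification file (sketch)."""
    # Escape the spec URL's own '?', '&' and '=' so they are not read as
    # parameters of the outer /cse URL. ':' and '/' are left as-is to match
    # the links in this thread (an assumption, not a documented rule).
    cref = quote(spec_url, safe=":/")
    # Spaces become '+'; ':' is kept so operators like intitle: stay readable.
    q = quote_plus(query, safe=":")
    return f"{CSE_ENDPOINT}?cref={cref}&q={q}"

spec = ("http://gcrefs1.searchtempest.com/google_cse6.php"
        "?mainCat=3&subcat=tlg&sortby=date&location=90210"
        "&maxDist=10&region_can=&region_us=1&region_mex=")
url = linked_cse_url(spec, "voice -intitle:classifieds")
print(url)
```

Changing `maxDist` in the spec URL gives the other links used in this post.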

Now, say we increase our search distance, instead using this specification file, which searches up to 100 miles from LA:
http://gcrefs1.searchtempest.com/google_cse6.php?mainCat=3&subcat=tlg&sortby=date&location=90210&maxDist=100&region_can=&region_us=1&region_mex=

The file is identical, except it includes a few more filter annotations:
<Annotations>
<Annotation about="losangeles.craigslist.org/*tlg*"><Label name="_y"></Label></Annotation>
<Annotation about="orangecounty.craigslist.org/*tlg*"><Label name="_y"></Label></Annotation>
<Annotation about="inlandempire.craigslist.org/*tlg*"><Label name="_y"></Label></Annotation>
<Annotation about="ventura.craigslist.org/*tlg*"><Label name="_y"></Label></Annotation>
<Annotation about="santabarbara.craigslist.org/*tlg*"><Label name="_y"></Label></Annotation>
<Annotation about="sandiego.craigslist.org/*tlg*"><Label name="_y"></Label></Annotation>
</Annotations>
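
To make the structure of these files concrete, here is a rough Python sketch that generates an `<Annotations>` block like the one above from a list of hosts. The helper name `build_annotations` and its parameters are hypothetical; the actual google_cse6.php script is not shown in this thread and may work differently.

```python
import xml.etree.ElementTree as ET

def build_annotations(hosts, category="tlg", label="_y"):
    """Return an <Annotations> element with one filter annotation per host."""
    root = ET.Element("Annotations")
    for host in hosts:
        # Same pattern shape as in the post: host/*tlg*
        ann = ET.SubElement(root, "Annotation", about=f"{host}/*{category}*")
        # Attach the background FILTER label to this URL pattern.
        ET.SubElement(ann, "Label", name=label)
    return root

hosts = [
    "losangeles.craigslist.org",
    "orangecounty.craigslist.org",
    "inlandempire.craigslist.org",
    "ventura.craigslist.org",
    "santabarbara.craigslist.org",
    "sandiego.craigslist.org",
]
annotations = build_annotations(hosts)
print(ET.tostring(annotations, encoding="unicode"))
```

Each added host only adds one more pattern carrying the same `_y` label, which is why I would expect the result set to grow, not shrink.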

Now we get 56 results:
https://www.google.com/cse?cref=http://gcrefs1.searchtempest.com/google_cse6.php%3FmainCat%3D3%26subcat%3Dtlg%26sortby%3Ddate%26location%3D90210%26maxDist%3D100%26region_can%3D%26region_us%3D1%26region_mex%3D&q=voice+-intitle:classifieds#gsc.tab=0&gsc.q=voice%20-intitle%3Aclassifieds&gsc.page=10

Obviously at the very LEAST we should get the same 85 results as above (but in reality there should be significantly more).

Let's expand the search a bit more.  This one goes out to 500 miles, resulting in 36 annotations:
http://gcrefs1.searchtempest.com/google_cse6.php?mainCat=3&subcat=tlg&sortby=date&location=90210&maxDist=500&region_can=&region_us=1&region_mex=

The result?
https://www.google.com/cse?cref=http://gcrefs1.searchtempest.com/google_cse6.php%3FmainCat%3D3%26subcat%3Dtlg%26sortby%3Ddate%26location%3D90210%26maxDist%3D500%26region_can%3D%26region_us%3D1%26region_mex%3D&q=voice+-intitle:classifieds#gsc.tab=0&gsc.q=voice%20-intitle%3Aclassifieds&gsc.page=10

*8* results returned.  This isn't just poor, it's insane.

If we expand to 1000 miles, we have 92 annotations:
http://gcrefs1.searchtempest.com/google_cse6.php?mainCat=3&subcat=tlg&sortby=date&location=90210&maxDist=1000&region_can=&region_us=1&region_mex=

Total results, 3:
https://www.google.com/cse?cref=http://gcrefs1.searchtempest.com/google_cse6.php%3FmainCat%3D3%26subcat%3Dtlg%26sortby%3Ddate%26location%3D90210%26maxDist%3D1000%26region_can%3D%26region_us%3D1%26region_mex%3D&q=voice+-intitle:classifieds#gsc.tab=0&gsc.q=voice%20-intitle%3Aclassifieds&gsc.page=10

Can someone at Google please explain what's going on here?  These results aren't just inconsistent with Web Search, they're completely internally illogical.  It appears that Custom Search is essentially useless with more than a single annotation.
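
To spell out why I call this illogical: assuming every annotation with the `_y` filter label simply adds its URL pattern to the set of allowed pages, each larger file covers a superset of the smaller one, so the result count should never go down. A small Python sketch of that check, using only the counts reported above (the function name is mine):

```python
# (number of filter annotations, total results reported in this post)
observations = [(1, 85), (6, 56), (36, 8), (92, 3)]

def monotonic_violations(obs):
    """Return consecutive pairs where more annotations gave fewer results."""
    ordered = sorted(obs)  # order by number of annotations
    return [
        (prev, cur)
        for prev, cur in zip(ordered, ordered[1:])
        if cur[1] < prev[1]  # result count dropped although coverage grew
    ]

violations = monotonic_violations(observations)
for (n_prev, r_prev), (n_cur, r_cur) in violations:
    print(f"{n_prev} -> {n_cur} annotations: results fell from {r_prev} to {r_cur}")
```

Every step in the series above violates the expectation.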

Community content may not be verified or up-to-date. Learn more.
All Replies (7)
3/27/13
Original Poster
Nathan RCT
Oh, one more for good measure.  What if we just have a single annotation, *.craigslist.org? 
http://gcrefs1.searchtempest.com/google_nocse1.php

Well, that gives us 76 results: fewer than for losangeles.craigslist.org alone.

https://www.google.com/cse?cref=http://gcrefs1.searchtempest.com/google_nocse1.php&q=voice+-intitle:classifieds#gsc.tab=0&gsc.q=voice%20-intitle%3Aclassifieds&gsc.page=10
4/1/13
rohit1
Nathan,
 
To get better result coverage, you need to change the patterns to prefix patterns, i.e. of the form:
 
losangeles.craigslist.org/lac/tlg*,
losangeles.craigslist.org/svc/tlg*,
 
and so on.
 
Also, linked CSEs (cref-based CSEs) have some inherent limitations compared to regular cx CSEs, so if your usage allows converting this to a cx-based CSE, you will get better results.
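
As a rough illustration of the difference, here is a short Python sketch that tells a prefix pattern (a single wildcard at the very end) apart from the `*tlg*` style patterns, and expands a host into prefix patterns. The helper names are hypothetical, and the path segments `lac` and `svc` are just the two from the example above; the full list for each site would have to be filled in separately.

```python
def is_prefix_pattern(pattern: str) -> bool:
    """True if the only wildcard is a single trailing '*'."""
    return pattern.endswith("*") and "*" not in pattern[:-1]

def prefix_patterns(host: str, segments, category: str = "tlg"):
    """Expand one host into prefix patterns, one per leading path segment."""
    return [f"{host}/{segment}/{category}*" for segment in segments]

print(is_prefix_pattern("losangeles.craigslist.org/*tlg*"))     # False
print(is_prefix_pattern("losangeles.craigslist.org/lac/tlg*"))  # True
patterns = prefix_patterns("losangeles.craigslist.org", ["lac", "svc"])
print(patterns)
```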
4/1/13
Original Poster
Nathan RCT
Hmm, interesting thoughts.  I haven't read either of those claims before.  Are they documented somewhere?

I just tried making a standard cx CSE of the first, 6-city example above:

https://www.google.com/cse/publicurl?cx=001747495066313166894:cg_nxwclwd4

When I perform the same search as above (voice -intitle:classifieds) I get 46 results (need to skip to the last page in all these cases, since the estimates are way off). 

The cref version is currently giving me 49 results: https://www.google.com/cse?cref=http://gcrefs1.searchtempest.com/google_cse6.php%3FmainCat%3D3%26subcat%3Dtlg%26sortby%3Ddate%26location%3D90210%26maxDist%3D100%26region_can%3D%26region_us%3D1%26region_mex%3D&q=voice+-intitle:classifieds#gsc.tab=0&gsc.q=voice%20-intitle%3Aclassifieds&gsc.page=10

The cref version that covers LA only (as opposed to 6 cities including LA) currently gives 71 results.  So making the CSE non-linked doesn't appear to make a difference here.  (Worrisome to think that it might ever though; the CSE documentation really pushes the benefits of linked CSEs, and they do offer a lot of added flexibility.)

I also made a test CSE to test the prefix-only idea: https://www.google.com/cse/publicurl?cx=001747495066313166894:5_l85rwtske
This one has those same 6 annotations, but no tlg portion.  Just

<Annotation about="losangeles.craigslist.org/*"><Label name="_y"></Label></Annotation>
<Annotation about="orangecounty.craigslist.org/*"><Label name="_y"></Label></Annotation>
<Annotation about="inlandempire.craigslist.org/*"><Label name="_y"></Label></Annotation>
<Annotation about="ventura.craigslist.org/*"><Label name="_y"></Label></Annotation>
<Annotation about="santabarbara.craigslist.org/*"><Label name="_y"></Label></Annotation>
<Annotation about="sandiego.craigslist.org/*"><Label name="_y"></Label></Annotation>


I then added inurl:tlg to the search query to achieve the same thing.  It returned the same number of results as the one with the wildcard annotations (although different results).  So unfortunately that doesn't appear to help either.  Thanks for the suggestions though.
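
For anyone who wants to quantify "same number, different results", here is a small Python sketch that compares two runs by result URL. The URLs below are made-up placeholders, not actual results from these searches, and `compare_results` is just a name I chose.

```python
def compare_results(run_a, run_b):
    """Compare two lists of result URLs as sets."""
    a, b = set(run_a), set(run_b)
    shared = a & b
    union = a | b
    return {
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "shared": sorted(shared),
        # Jaccard similarity: 1.0 means identical sets, 0.0 means disjoint.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }

# Placeholder data only: two runs of equal size that share two of three URLs.
wildcard_run = ["https://example.org/post/1", "https://example.org/post/2",
                "https://example.org/post/3"]
inurl_run = ["https://example.org/post/2", "https://example.org/post/3",
             "https://example.org/post/4"]

report = compare_results(wildcard_run, inurl_run)
print(report["only_a"], report["only_b"], report["jaccard"])
```

Equal totals with a low overlap would suggest the two engines are sampling different slices of the index rather than returning the full matching set.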




4/15/13
Original Poster
Nathan RCT
This is still very much broken.  Any official response?
4/24/13
Original Poster
Nathan RCT
Can others confirm that they're seeing these same inconsistencies?  Perhaps that would elicit a response from Google.
4/24/13
Google user
Yes Nathan, I'm seeing the same inconsistent results.

My search engine results went from decent to very poor. I guess this is the new norm.
7/18/13
Original Poster
Nathan RCT
For anyone interested, since it appears this won't be fixed, I have found Yahoo BOSS to be the best alternative.  It runs on Bing's index, but is marginally less expensive, has more flexible payments and better terms of use, and most importantly has a more powerful interface.  Nothing like Google's annotation files, but those are rather useless now given that you can't get reasonable results from multiple domains anyway.  Unlike the Bing 'Azure' API, BOSS does at least have a 'sites' parameter that lets you specify multiple domains right in the REST query, and it seems to work pretty well, even for a couple hundred domains.
 