Query guidelines and sample queries

Best practices for bulk data export queries

Always use aggregation functions

Data in the tables is not guaranteed to be consolidated by date, URL, site, or any combination of keys. Therefore you should always aggregate your fields (SUM, COUNT, and so on) to guarantee that your row values are consolidated. For example, a table might have several rows with the query "fat cats":

query     date       search_type   impressions ...
fat cats 10-10-2022      WEB          1000
fat cats 10-10-2022      WEB           200
fat cats 10-10-2022    IMAGE          5000

Searching simply for the query "fat cats", even for a specific date and search type, can result in multiple rows. So you should aggregate as follows to see the sum of your impressions:

/* BAD way to see most popular query: will get only 1 row per query */
SELECT query, impressions FROM searchconsole.searchdata_site_impression
ORDER BY 2 DESC LIMIT 1;

Result:
fat cats   5000

/* RIGHT way to see the most popular query */
SELECT query, SUM(impressions) FROM searchconsole.searchdata_url_impression
WHERE query != ''  /* Filter out anonymized queries */
GROUP BY query ORDER BY 2 DESC LIMIT 1;

Result:
fat cats   6200

Limit queries by date to save processing costs

Remember that when you run a query on BigQuery you will be charged for it, and your tables can become quite large. The tables exported are date-partitioned, which allows for efficient limiting of the input scan to only days of interest - for large sites this can translate into real savings in query cost. A good way to minimize query costs is to use a WHERE clause to limit the date range in the date partitioned table.

Filter out null query strings

Anonymized queries often make up the single most common query for a site. An anonymized query is reported as a zero-length string in the table. If you want to see the most popular query for specific criteria, you probably want to filter out zero-length query values from your SQL query.

Sample queries

Below are some sample queries to help you get up and running with the data.

Results by site

All web stats, by day (last two weeks)

SELECT
  data_date AS date,
  sum(impressions) AS impressions,
  sum(clicks) as clicks,
  sum(clicks) / sum(impressions) AS ctr,
  /* Added one below, because position is zero-based */
  ((sum(sum_top_position) / sum(impressions)) + 1.0) AS avg_position

FROM searchconsole.searchdata_site_impression
WHERE search_type = 'WEB'
  AND data_date between DATE_SUB(CURRENT_DATE(), INTERVAL 14 day) and CURRENT_DATE()
GROUP BY data_date

ORDER BY date desc
  LIMIT 1000

Top USA mobile web queries by clicks (last two weeks)

SELECT
  query,
  device,
  sum(impressions) AS impressions,
  sum(clicks) AS clicks,
  sum(clicks) / sum(impressions) AS ctr,
  ((sum(sum_top_position) / sum(impressions)) + 1.0) AS avg_position

FROM searchconsole.searchdata_site_impression
WHERE search_type = 'WEB'
    AND country = 'usa'
    AND device = 'MOBILE'
    AND data_date between DATE_SUB(CURRENT_DATE(), INTERVAL 14 day) and CURRENT_DATE()
GROUP BY 1,2
ORDER BY clicks
LIMIT 1000

Results by URL

Top Discover URLs by clicks (on a specific day)

SELECT
  url,
  sum(impressions),
  sum(clicks)
  FROM searchconsole.searchdata_url_impression
WHERE search_type = 'DISCOVER'
  AND data_date = DATE(2022, 02, 01)
GROUP BY url
ORDER BY 3
LIMIT 1000

Total query+url stats for FAQ rich results (last two weeks)

SELECT
  url,
  query,
  sum(impressions) AS impressions,
  sum(clicks) AS clicks,
  sum(clicks) / sum(impressions) AS ctr,
  ((sum(sum_position) / sum(impressions)) + 1.0) AS avg_position
FROM searchconsole.searchdata_url_impression
WHERE search_type = 'WEB'
  AND is_tpf_faq = true
  AND data_date between DATE_SUB(CURRENT_DATE(), INTERVAL 14 day) and CURRENT_DATE()
GROUP BY 1,2 
ORDER BY clicks
LIMIT 1000

All your traffic by URL (last two weeks)

SELECT
  url,
  search_type,
  sum(impressions) as impressions,
  sum(clicks) as clicks
FROM searchconsole.searchdata_url_impression
WHERE data_date between DATE_SUB(CURRENT_DATE(), INTERVAL 14 day) and CURRENT_DATE()
GROUP BY 1,2
ORDER BY 3 desc
LIMIT 1000

Impressions for queries containing "your_brand" (last two weeks)

SELECT
  regexp_contains(query, 'your_brand') AS contains_brand,
  sum(impressions) AS impressions,
  sum(clicks) AS clicks
FROM searchconsole.searchdata_url_impression
WHERE search_type = 'WEB'
  AND data_date between DATE_SUB(CURRENT_DATE(), INTERVAL 14 day) and CURRENT_DATE()
GROUP BY 1
ORDER BY 2 desc
LIMIT 1000

Was this helpful?

How can we improve it?
Search
Clear search
Close search
Google apps
Main menu