BigQuery cookbook

This feature is only available in Google Analytics 360, part of the Google Analytics 360 Suite.
Learn more about the Google Analytics 360 Suite.

This article contains examples of how to construct queries of the Analytics data you export to BigQuery. We have made available a sample dataset so you can practice with some of the queries in this article.

In this article:

Query optimization

Every query you run contributes to your monthly data-processing allowance. If you select extraneous fields, you increase the amount of data that needs to be processed, and as a result, you use more of your monthly allowance than necessary. Optimised queries make efficient use of your monthly data-processing allowance.

Learn more about pricing.

Select only what you need

When you formulate a query, select the relevant fields within the SELECT statement. By not calling extraneous fields, you reduce the amount of data, and the time needed to process the query.

Example: avoid the use of the wildcard operator

Bad form: using wildcard operator
SELECT *
FROM [table name];

 

Better form: using field names to avoid unnecessary processing
SELECT field1, field2
FROM [table name];

Permit caching

Where possible, avoid using functions as fields. Functions (such as NOW() or TODAY()) return variable results, which prevents queries from being cached and therefore returned more quickly. Instead, use specific times and dates.

Use intermediate tables for commonly used subqueries

If you find that you are repeatedly using a specific query as a subquery, you can save that query as an intermediate table by clicking Save as Table above the query results. You can then reference that table in the FROM section of your query, which will reduce both the amount of data that has to be processed and time required for processing.

using an intermediate table
SELECT field1, field2
FROM [Dataset name.table name];

Query debugging

BigQuery debugs your code as you construct it. In the composition window, debugging is indicated just below the query. Debugging is also available through the API with dryRun flag.

Valid queries have a green indicator that you can click to see the amount of data processed by the query. This feature gives you an opportunity to optimize your data before you run the query so that you can avoid unnecessary data processing.

Query Debugging - Success

 

Invalid queries have a red indicator that you can click to see information about the error, and find the line and column where the error is occurring. In the example below, the GROUP BY statement is blank, and the error is pinpointed.

Query Debugging - Error

 

Tips and best practices

Using the sample dataset

The following examples use the Google Analytics sample dataset.

To use the queries on your own data, simply replace the project and dataset names in the examples with your own project and dataset names.

Using Standard SQL vs. Legacy SQL

BigQuery supports two SQL dialects:

Migrating to Standard SQL explains the differences between the two dialects.

Standard SQL is now the preferred SQL dialect for querying data stored in BigQuery.

See Enabling Standard SQL for information about enabling Standard SQL in the BigQuery UI, CLI, API, or whichever interface you are using.

The easiest way to get started is to include the comment "standardSQL" at the top of your Standard SQL queries as shown in the following examples.

>With Legacy SQL, Google Analytics 360 data is passed into a new table everyday. To query multiple tables at once, you can comma separate the table names, use the TABLE_DATE_RANGE table wildcard function, or use multiple comma-separated TABLE_DATE_RANGE functions, as in the following examples.

Query multiple tables

The following examples show you Standard SQL and Legacy SQL queries for the same data.

3 days

Standard SQL
3 Days using UNION ALL
#standardSQL
WITH ga_tables AS (
SELECT
date,
SUM(totals.visits) AS visits,
SUM(totals.pageviews) AS pageviews,
SUM(totals.transactions) AS transactions,
SUM(totals.transactionRevenue)/1000000 AS revenue
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20160801`
GROUP BY date

UNION ALL

SELECT
date,
SUM(totals.visits) AS visits,
SUM(totals.pageviews) AS pageviews,
SUM(totals.transactions) AS transactions,
SUM(totals.transactionRevenue)/1000000 AS revenue
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20160802`
GROUP BY date

UNION ALL

SELECT
date,
SUM(totals.visits) AS visits,
SUM(totals.pageviews) AS pageviews,
SUM(totals.transactions) AS transactions,
SUM(totals.transactionRevenue)/1000000 AS revenue
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20160803`
GROUP BY date

)
SELECT
date,
visits,
pageviews,
transactions,
revenue,
FROM ga_tables
ORDER BY date ASC
Legacy SQL
3 days using comma-separated table names
SELECT
date,
SUM(totals.visits) AS visits,
SUM(totals.pageviews) AS pageviews,
SUM(totals.transactions) AS transactions,
SUM(totals.transactionRevenue)/1000000 AS revenue
FROM
[bigquery-public-data.google_analytics_sample.ga_sessions_20160801],
[bigquery-public-data.google_analytics_sample.ga_sessions_20160802],
[bigquery-public-data.google_analytics_sample.ga_sessions_20160803]
GROUP BY
date
ORDER BY
date ASC

 

Last 1095 Days

Standard SQL
Last 1095 Days using _TABLE_SUFFIX
#standardSQL
SELECT
date,
SUM(totals.visits) AS visits,
SUM(totals.pageviews) AS pageviews,
SUM(totals.transactions) AS transactions,
SUM(totals.transactionRevenue)/1000000 AS revenue
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1095 DAY))
AND
FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY date
ORDER BY date ASC
Legacy SQL
Last 1095 days using TABLE_DATE_RANGE
SELECT
date,
SUM(totals.visits) AS visits,
SUM(totals.pageviews) AS pageviews,
SUM(totals.transactions) AS transactions,
SUM(totals.transactionRevenue)/1000000 AS revenue
FROM
(TABLE_DATE_RANGE([bigquery-public-data.google_analytics_sample.ga_sessions_],
DATE_ADD(CURRENT_TIMESTAMP(), -1095, 'DAY'), DATE_ADD(CURRENT_TIMESTAMP(), -1, 'DAY')))
GROUP BY
date
ORDER BY
date ASC

 

Last 36 Months

Standard SQL
Last 36 Months using _TABLE_SUFFIX
#standardSQL
SELECT
date,
SUM(totals.visits) AS visits,
SUM(totals.pageviews) AS pageviews,
SUM(totals.transactions) AS transactions,
SUM(totals.transactionRevenue)/1000000 AS revenue
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 36 MONTH))
AND
FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY date
ORDER BY date ASC
Legacy SQL
Last 36 months using TABLE_DATE_RANGE
SELECT
date,
SUM(totals.visits) AS visits,
SUM(totals.pageviews) AS pageviews,
SUM(totals.transactions) AS transactions,
SUM(totals.transactionRevenue)/1000000 AS revenue
FROM
(TABLE_DATE_RANGE([bigquery-public-data.google_analytics_sample.ga_sessions_],
DATE_ADD(CURRENT_TIMESTAMP(), -36, 'MONTH'), DATE_ADD(CURRENT_TIMESTAMP(), -1, 'DAY')))
GROUP BY
date
ORDER BY
date ASC

 

Last 3 years

Standard SQL
Last 3 Years using _TABLE_SUFFIX
#standardSQL
SELECT
date,
SUM(totals.visits) AS visits,
SUM(totals.pageviews) AS pageviews,
SUM(totals.transactions) AS transactions,
SUM(totals.transactionRevenue)/1000000 AS revenue
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 3 YEAR))
AND
FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY date
ORDER BY date ASC
Legacy SQL
Last 3 years using TABLE_DATE_RANGE
SELECT
date,
SUM(totals.visits) AS visits,
SUM(totals.pageviews) AS pageviews,
SUM(totals.transactions) AS transactions,
SUM(totals.transactionRevenue)/1000000 AS revenue
FROM
(TABLE_DATE_RANGE([bigquery-public-data.google_analytics_sample.ga_sessions_],
DATE_ADD(CURRENT_TIMESTAMP(), -3, 'YEAR'), DATE_ADD(CURRENT_TIMESTAMP(), -1, 'DAY')))
GROUP BY
date
ORDER BY
date ASC

 

Specific date range

Standard SQL
Specific date range using _TABLE_SUFFIX
#standardSQL
SELECT
date,
SUM(totals.visits) AS visits,
SUM(totals.pageviews) AS pageviews,
SUM(totals.transactions) AS transactions,
SUM(totals.transactionRevenue)/1000000 AS revenue
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20160801' AND '20170731'
GROUP BY date
ORDER BY date ASC
Legacy SQL
Specific date range using TABLE_DATE_RANGE
SELECT
date,
SUM(totals.visits) AS visits,
SUM(totals.pageviews) AS pageviews,
SUM(totals.transactions) AS transactions,
SUM(totals.transactionRevenue)/1000000 AS revenue
FROM
(TABLE_DATE_RANGE([bigquery-public-data.google_analytics_sample.ga_sessions_],
TIMESTAMP('2016-08-01'), TIMESTAMP('2017-07-31')))
GROUP BY
date
ORDER BY
date ASC

Last 3 years plus today’s data (intraday)

Standard SQL
Last 3 years plus today’s data (intraday) using UNION ALL & _TABLE_SUFFIX
Note: this example query will not work with the Google Analytics public dataset because there is currently no intraday table.
#standardSQL
WITH ga_tables AS ( SELECT
date,
SUM(totals.visits) AS visits,
SUM(totals.pageviews) AS pageviews,
SUM(totals.transactions) AS transactions,
SUM(totals.transactionRevenue)/1000000 AS revenue
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 3 YEAR))
AND
FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY date
UNION ALL

SELECT
date,
SUM(totals.visits) AS visits,
SUM(totals.pageviews) AS pageviews,
SUM(totals.transactions) AS transactions,
SUM(totals.transactionRevenue)/1000000 AS revenue
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
AND
FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 0 DAY))
GROUP BY date
)
SELECT
date,
visits,
pageviews,
transactions,
revenue,
FROM ga_tables
ORDER BY date ASC

Legacy SQL

Last 3 years plus today’s data (intraday) using multiple TABLE_DATE_RANGE
Note: this example query will not work with the Google Analytics public dataset because there is currently no intraday table.
SELECT
date,
SUM(totals.visits) AS visits,
SUM(totals.pageviews) AS pageviews,
SUM(totals.transactions) AS transactions,
SUM(totals.transactionRevenue)/1000000 AS revenue
FROM
(TABLE_DATE_RANGE([bigquery-public-data.google_analytics_sample.ga_sessions_],
DATE_ADD(CURRENT_TIMESTAMP(), -3, 'YEAR'), DATE_ADD(CURRENT_TIMESTAMP(), -1, 'DAY'))),
(TABLE_DATE_RANGE([bigquery-public-data.google_analytics_sample.ga_sessions_intraday_],
DATE_ADD(CURRENT_TIMESTAMP(), -1, 'DAY'), CURRENT_TIMESTAMP()))
GROUP BY
date
ORDER BY
date ASC

 

 

Basic query examples

This section explains how to construct basic queries using metrics and dimensions from example Analytics data.

Total [metric] per [dimension]?

Below are example scripts for the question: What is the total number of transactions generated per device browser in July 2017?

Standard SQL

Total transactions per device browser in July 2017
#standardSQL
SELECT
device.browser,
SUM ( totals.transactions ) AS total_transactions
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
GROUP BY
device.browser
ORDER BY
total_transactions DESC

Legacy SQL

Total transactions per device browser in July 2017
SELECT
device.browser,
SUM ( totals.transactions ) AS total_transactions
FROM TABLE_DATE_RANGE( [bigquery-public-data.google_analytics_sample.ga_sessions_],
TIMESTAMP('2017-07-01'), TIMESTAMP('2017-07-31') )
GROUP BY
device.browser
ORDER BY
total_transactions DESC

 

 

Average bounce rate per [ dimension ]?

The real bounce rate is defined as the percentage of visits with a single pageview. Below are example scripts for the question: What was the real bounce rate per traffic source?

Standard SQL

Bounce rate per traffic source in July 2017
#standardSQL
SELECT
source,
total_visits,
total_no_of_bounces,
( ( total_no_of_bounces / total_visits ) * 100 ) AS bounce_rate
FROM (
SELECT
trafficSource.source AS source,
COUNT ( trafficSource.source ) AS total_visits,
SUM ( totals.bounces ) AS total_no_of_bounces
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
GROUP BY
source )
ORDER BY
total_visits DESC

Legacy SQL

Bounce rate per traffic source in July 2017
SELECT
source,
total_visits,
total_no_of_bounces,
( ( total_no_of_bounces / total_visits ) * 100 ) AS bounce_rate
FROM (
SELECT
trafficSource.source AS source,
COUNT ( trafficSource.source ) AS total_visits,
SUM ( totals.bounces ) AS total_no_of_bounces
FROM
TABLE_DATE_RANGE( [bigquery-public-data.google_analytics_sample.ga_sessions_],
TIMESTAMP('2017-07-01'), TIMESTAMP('2017-07-31') )
GROUP BY
source )
ORDER BY
total_visits DESC

 

 

Average number of product pageviews by purchaser type (purchasers vs non-purchasers)

Below are example scripts for the question: What was the average number of product pageviews for users who made a purchase in July 2017?

Standard SQL

Average number of product pageviews for users who made a purchase in July 2017
#standardSQL
SELECT
( SUM(total_pagesviews_per_user) / COUNT(users) ) AS avg_pageviews_per_user
FROM
SELECT
fullVisitorId AS users,
SUM(totals.pageviews) AS total_pagesviews_per_user
FROM`bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
AND
totals.transactions >=1
GROUP BY
users )

Legacy SQL

Average number of product pageviews for users who made a purchase in July 2017
SELECT
( SUM(total_pagesviews_per_user) / COUNT(users) ) AS avg_pageviews_per_user
FROM
SELECT
fullVisitorId AS users,
SUM(totals.pageviews) AS total_pagesviews_per_user
FROM
TABLE_DATE_RANGE( [bigquery-public-data.google_analytics_sample.ga_sessions_],
TIMESTAMP('2017-07-01'), TIMESTAMP('2017-07-31') ) WHERE
totals.transactions >=1
GROUP BY
users )

 

 

Below are example scripts for the question: What was the average number of product pageviews for users who did not make a purchase in July 2017?

Standard SQL

Average number of product pageviews for users who did not make a purchase in July 2017
#standardSQL
SELECT
( SUM(total_pagesviews_per_user) / COUNT(users) ) AS avg_pageviews_per_user
FROM
SELECT
fullVisitorId AS users,
SUM(totals.pageviews) AS total_pagesviews_per_user
FROM`bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
AND
totals.transactions IS NULL
GROUP BY
users )

Legacy SQL

Average number of product pageviews for users who did not make a purchase in July 2017
SELECT
( SUM(total_pagesviews_per_user) / COUNT(users) ) AS avg_pageviews_per_user
FROM
SELECT
fullVisitorId AS users,
SUM(totals.pageviews) AS total_pagesviews_per_user
FROM
TABLE_DATE_RANGE( [bigquery-public-data.google_analytics_sample.ga_sessions_],
TIMESTAMP('2017-07-01'), TIMESTAMP('2017-07-31') ) WHERE
totals.transactions IS NULL
GROUP BY
users )

 

 

Average number of transactions per purchaser

Below are example scripts for the question: What was the average total transactions per user that made a purchase in July 2017?

Standard SQL

Average number of transactions per user that made a purchase in July 2017
#standardSQL
SELECT
(SUM (total_transactions_per_user) / COUNT(fullVisitorId) ) AS avg_total_transactions_per_user
FROM
SELECT
fullVisitorId,
SUM (totals.transactions) AS total_transactions_per_user
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
AND totals.transactions IS NOT NULL
GROUP BY
fullVisitorId )

Legacy SQL

Average number of transactions per user that made a purchase in July 2017
SELECT
(SUM (total_transactions_per_user) / COUNT(fullVisitorId) ) AS avg_total_transactions_per_user
FROM
SELECT
fullVisitorId,
SUM (totals.transactions) AS total_transactions_per_user
FROM
TABLE_DATE_RANGE( [bigquery-public-data.google_analytics_sample.ga_sessions_],
TIMESTAMP('2017-07-01'), TIMESTAMP('2017-07-31') )
WHERE
totals.transactions IS NOT NULL
GROUP BY
fullVisitorId )

 

 

 

Average amount of money spent per session

Below are example scripts for the question: What is the average amount of money spent per session in July 2017?

Standard SQL

Average amount of money spent per session in July 2017
#standardSQL
SELECT
( SUM(total_transactionrevenue_per_user) / SUM(total_visits_per_user) ) AS
avg_revenue_by_user_per_visit
FROM (
SELECT
fullVisitorId,
SUM( totals.visits ) AS total_visits_per_user,
SUM( totals.transactionRevenue ) AS total_transactionrevenue_per_user
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
AND
totals.visits > 0
AND totals.transactions >= 1
AND totals.transactionRevenue IS NOT NULL
GROUP BY
fullVisitorId )

Legacy SQL

Average amount of money spent per session in July 2017
SELECT
( SUM(total_transactionrevenue_per_user) / SUM(total_visits_per_user) ) AS
avg_revenue_by_user_per_visit
FROM (
SELECT
fullVisitorId,
SUM( totals.visits ) AS total_visits_per_user,
SUM( totals.transactionRevenue ) AS total_transactionrevenue_per_user
FROM
TABLE_DATE_RANGE( [bigquery-public-data.google_analytics_sample.ga_sessions_],
TIMESTAMP('2017-07-01'), TIMESTAMP('2017-07-31') )
WHERE
totals.visits > 0
AND totals.transactions >= 1
AND totals.transactionRevenue IS NOT NULL
GROUP BY
fullVisitorId )

 

 

Sequence of hits

Below are example scripts for the question: What is the sequence of pages viewed?.

Standard SQL

Sequence of pages viewed by users in July 2017
#standardSQL
SELECT
fullVisitorId,
visitId,
visitNumber,
hits.hitNumber AS hitNumber,
hits.page.pagePath AS pagePath
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`,
UNNEST(hits) as hits
WHERE
_TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
AND
hits.type="PAGE"
ORDER BY
fullVisitorId,
visitId,
visitNumber,
hitNumber

Legacy SQL

Sequence of pages viewed by users in July 2017
SELECT
fullVisitorId,
visitId,
visitNumber,
hits.hitNumber AS hitNumber,
hits.page.pagePath AS pagePath
FROM
TABLE_DATE_RANGE( [bigquery-public-data.google_analytics_sample.ga_sessions_],
TIMESTAMP('2017-07-01'), TIMESTAMP('2017-07-31') )
WHERE
hits.type="PAGE"
ORDER BY
fullVisitorId,
visitId,
visitNumber,
hitNumber

In this query, you limit the hit types to PAGES to avoid seeing event or transaction interactions. Each line of the output represents a pageview, and is shown by the default order of the fields in the SELECT statement.

 

 

Multiple custom dimensions at the hit or session level

custom dimension at the hit level
SELECT fullVisitorId, visitId, hits.hitNumber, hits.time,
MAX(IF(hits.customDimensions.index=1,
hits.customDimensions.value,
NULL)) WITHIN hits AS customDimension1,
FROM [tableID.ga_sessions_20150305]
LIMIT 100

 

custom dimension at the session level
SELECT fullVisitorId, visitId,
MAX(IF(customDimensions.index=2,
customDimensions.value,
NULL)) WITHIN RECORD AS customDimension2,
FROM [tableID.ga_sessions_20150305]
LIMIT 100

In each query:

The SELECT statement queries for the relevant dimension and metric fields.

The MAX function:

  • Returns the custom dimension as a new column. You can repeat the function to return multiple custom dimensions as new columns.
  • WITHIN hits and WITHIN RECORD evaluate the condition inside repeated fields in BigQuery.
  • The condition inside MAX is evaluated for each custom dimension, but for any that are not index=1 (hits) or index=2 (sessions), it returns NULL.
  • Returns the maximum value, which is the value of Custom Dimension 1 for hits or Custom Dimension 2 for sessions since all other values are NULL.

Advanced query examples

Now that you’re familiar with simple queries, you can construct queries using the advanced functions and features available in BIgQuery.

Products purchased by customers who purchased product A (Classic Ecommerce)

Below is a skeleton script for the question: What other products are purchased by customers who purchase product A?

products purchased by a customer who purchases product A (Classic Ecommerce)
SELECT hits.item.productName AS other_purchased_products, COUNT(hits.item.productName) AS quantity
FROM [‘Dataset Name’ ]
WHERE fullVisitorId IN (
  SELECT fullVisitorId
  FROM [‘Dataset Name’ ]
  WHERE hits.item.productName CONTAINS 'Product Item Name A'
   AND totals.transactions>=1
  GROUP BY fullVisitorId
)
 AND hits.item.productName IS NOT NULL
 AND hits.item.productName != 'Product Item Name A'
GROUP BY other_purchased_products
ORDER BY quantity DESC;
  1. In the first line, you select all of the other items purchased by a user, and the aggregate function COUNT() is used to calculate the quantity of each other item purchased. The result is then displayed in a field labelled quantity, with their associated item in the product field labelled as other_purchased_products.
  2. In the gray subquery, you select only the unique users (fullVisitorId) who have carried out transactions (totals.transactions>=1) and during a transaction purchased product A (WHERE hits.item.productName CONTAINS 'Product Item Name A').

The rules (WHERE and AND statements) in the top-level (green) query disregard values in hits.item.productNamem which are null and contain product A.

Here is an example of the query If a customer purchases Brighton Metallic Pens - Set of 4, what other product(s) did they purchase?

products purchased by a customer who purchases 'Brighton Metallic Pens (Set of 4)' on 24th June 2013
SELECT hits.item.productName AS other_purchased_products, COUNT(hits.item.productName) AS quantity
FROM [GoogleStore.ga_sessions_20130624]
WHERE fullVisitorId IN (
  SELECT fullVisitorId
  FROM [GoogleStore.ga_sessions_20130624]
  WHERE hits.item.productName CONTAINS 'Brighton Metallic Pens - Set of 4'
   AND totals.transactions>=1
  GROUP BY fullVisitorId
)
 AND hits.item.productName IS NOT NULL
 AND hits.item.productName !='Brighton Metallic Pens - Set of 4'
GROUP BY other_purchased_products
ORDER BY quantity DESC;

In Dremel/BigQuery, using WHERE expr IN triggers a JOIN, and size restrictions apply; specifically, the size of the right side of the JOIN (in this case the number of visitors) needs to be less than 8 MB. In Dremel, this is called a broadcast JOIN. When the size exceeds 8 MB, you need to trigger a shuffled JOIN, which can be done using the JOIN EACH syntax. Unfortunately, it can't be done using IN, but the same query can be rewritten with a JOIN.

Products purchased by customers who purchased product A (Enhanced Ecommerce)

This is the similar to the previous skeleton query but works for Enhanced Ecommerce. It also makes use of TABLE_DATE_RANGE to query data over multiple days.

products purchased by a customer who purchases product A (Enhanced Ecommerce)
SELECT hits.product.productSKU AS other_purchased_products, COUNT(hits.product.productSKU) AS quantity
FROM SELECT fullVisitorId, hits.product.productSKU, hits.eCommerceAction.action_type FROM TABLE_DATE_RANGE([73156703.ga_sessions_], TIMESTAMP('2015-07-01'), TIMESTAMP('2015-07-14')))
WHERE fullVisitorId IN (
  SELECT fullVisitorId
FROM TABLE_DATE_RANGE([73156703.ga_sessions_], TIMESTAMP('2015-07-01'), TIMESTAMP('2015-07-14'))
WHERE hits.product.productSKU CONTAINS '10 15103'
AND hits.eCommerceAction.action_type = '6'
GROUP BY fullVisitorId
)
AND hits.product.productSKU IS NOT NULL
AND hits.product.productSKU !='10 15103'
AND hits.eCommerceAction.action_type = '6'
GROUP BY other_purchased_products
ORDER BY quantity DESC;

Average number of user interactions before purchase

This is an example of a query of a JOIN() [...] ON command, which is only dependent on Analytics data.

Below is a skeleton script for the question: What is the average number of user interactions before a purchase?

number of user interactions before a purchase
SELECT one.hits.item.productSku AS ProductSku, ( sum_of_hit_number / total_hits ) AS avg_hit_number
FROM (
 SELECT hits.item.productSku, SUM(hits.hitNumber) AS sum_of_hit_number
 FROM [‘GA Dataset Name’ ]
 WHERE hits.item.productSku IS NOT NULL
  AND totals.transactions>=1
 GROUP BY hits.item.productSku
) AS ‘Alias_Name_1’
JOIN (
 SELECT hits.item.productSku, COUNT( fullVisitorId ) AS total_hits
 FROM [‘GA Dataset Name’ ]
 WHERE hits.item.productSku IS NOT NULL
  AND totals.transactions>=1
 GROUP BY hits.item.productSku
) AS ‘Alias_Name_2’
ON Alias_Name_1.hits.item.productSku = Alias_Name_2.hits.item.productSku;
  1. The first line carries out the main mathematical operation to find the average user interactions per product, and this query creates a join between two subqueries called ‘Alias_Name_1’ and ‘Alias_Name_2’.
  2. Alias_Name_1’ is used to produce a field that uses the aggregate function SUM() to sum all the hit numbers recorded for a product.
  3. Alias_Name_2’ is used to find the number of hits made by users per product, using the COUNT() function.
  4. The last line shows the common field (hits.item.productSku) shared between the two datasets on the join.

Here is an example of the query On 10th September 2013, what is the average number of user interactions before a purchase?

number of user interactions on 10th September 2013 before a purchase
SELECT one.hits.item.productSku AS ProductSku, ( sum_of_hit_number / total_hits ) AS avg_hit_number
FROM (
 SELECT hits.item.productSku, SUM(hits.hitNumber) AS sum_of_hit_number
 FROM [GoogleStore.ga_sessions_20130728]
 WHERE hits.item.productSku IS NOT NULL
  AND totals.transactions>=1
 GROUP BY hits.item.productSku 
) AS one
JOIN (
 SELECT hits.item.productSku, COUNT( fullVisitorId ) AS total_hits
 FROM [GoogleStore.ga_sessions_20130728]
 WHERE hits.item.productSku IS NOT NULL
  AND totals.transactions>=1
 GROUP BY hits.item.productSku
) AS two
ON one.hits.item.productSku = two.hits.item.productSku;

Percentage of stock sold per product

This is an example of a query which is not only dependent on Analytics data, but also on non-Analytics data. By combining both data sets, you can begin to understand user behavior at a more segmented level. You can import non-Analytics data into BigQuery, but keep in mind that this will contribute to your monthly data storage charge.

Below is a skeleton script for the question: What percentage of stock was sold per product?

percentage of stock sold per product
SELECT AnalyticsImport.product_data_20130728.productId, ((( one.quantity_sold ) / ( AnalyticsImport.product_data_20130728.productstock + one.quantity_sold )) * 100 ) AS percentage_of_stock_sold
FROM [ ‘Imported_DataSet’ ]
JOIN (
  SELECT hits.item.productSku, SUM( hits.item.itemQuantity ) AS quantity_sold
  FROM [‘GA Dataset’ ]
  WHERE hits.item.productSku IS NOT NULL
   AND totals.transactions>=1
  GROUP BY hits.item.productSku
) AS ‘Alias_Name’
ON Imported_DataSet.’productId_field’ = Alias_Name.hits.item.productSku;
  1. The first line outputs two fields: one field containing all the product IDs, and one that is a mathematical operation that shows the percentage of stock sold for that product ID.
  2. Since this query depends on two data sets, you need to use the JOIN() ... ON function. This command joins the rows from the two data sets based on the common field between them. In this case, the two data sets are [ ‘Imported_DataSet’ ] and ‘Alias_Name’.
  3. [ ‘Imported_DataSet’ ] is the non-Analytics data. This is the data set that contains the metric field for how much stock is left (Imported DataSet.’stock_left_field’) and the product-ID dimension field (Imported_DataSet.’productId_field’).
  4. ‘Alias_Name’ is the name assigned to the data returned by the gray subquery. This subquery is using Analytics data to find out the total quantity of items sold per product.
  5. The last line uses the ON statement to show the common field between the two data sets and where the two datasets are joined.

Many of the variables in this query have their dataset name attached to them as prefixes (e.g., Imported_DataSet.’productId_field’, Alias_Name.quantity_sold). This is to clarify what field you are selecting, and to make it explicit to which dataset it belongs.

Here is an example of the query What percentage of stock was sold per product on the 28th July 2013?

percentage of stock sold per product on 28th July 2013
SELECT AnalyticsImport.product_data_20130728.productId, ( ( ( one.quantity_sold ) / ( AnalyticsImport.product_data_20130728.productstock + one.quantity_sold ) ) * 100 ) AS percentage_of_stock_sold
FROM AnalyticsImport.product_data_20130728
JOIN (
  SELECT hits.item.productSku, SUM( hits.item.itemQuantity ) AS quantity_sold
  FROM GoogleStore.ga_sessions_20130728
  WHERE hits.item.productSku IS NOT NULL
   AND totals.transactions>=1
  GROUP BY hits.item.productSku
) AS one
ON AnalyticsImport.product_data_20130728.productId = one.hits.item.productSku
ORDER BY percentage_of_stock_sold DESC;

Profitability of each product

Below is a skeleton script for the question: What is the profitability of each product?

profit by product
SELECT Alias_Name.hits.item.productSku, ( Imported_DataSet.’product profit field’ * Alias_Name.quantity ) AS profit
FROM (
  SELECT Alias_Name.hits.item.productSku, Imported_DataSet.’product profit field’
  FROM [ ‘Imported Data Set’ ]
  JOIN (
    SELECT hits.item.productSku, SUM( hits.item.itemQuantity ) AS quantity
    FROM [ ‘GA Dataset Name’ ]
    WHERE hits.item.productSku IS NOT NULL
    GROUP BY hits.item.productSku
) AS ‘Alias_Name’
  ON Imported_DataSet.productId = Alias_Name.hits.item.productSku
);
  1. The first line contains the mathematical operation to calculate the total profit made on each product.
  2. The gray subquery uses non-Analytics data, which collects the data about how much profit is made when a product is sold.
  3. The red subquery is the Analytics data subquery, which will be joined with the non-Analytics data. It calculates the quantity of items sold per product.
  4. The last line uses the ON statement to clarify the field that the two data sets share. In this case, it is the product id number.

Here is an example of the query What was the profitability of each product on 28th July 2013?

profit by product on 28th July 2013
SELECT two.hits.item.productSku, ((AnalyticsImport.product_data_20130728.productprice-AnalyticsImport.product_data_20130728.productcost)*two.quantity) AS profit
FROM (
  SELECT two.hits.item.productSku, AnalyticsImport.product_data_20130728.productcost, two.quantity, AnalyticsImport.product_data_20130728.productprice
  FROM AnalyticsImport.product_data_20130728
  JOIN (
    SELECT hits.item.productSku, SUM( hits.item.itemQuantity ) AS quantity
    FROM GoogleStore.ga_sessions_20130728
    WHERE hits.item.productSku IS NOT NULL
    GROUP BY hits.item.productSku
) AS two
  ON AnalyticsImport.product_data_20130728.productId = two.hits.item.productSku
);

Profit is calculated by finding the difference between the price the product is sold at and the cost of manufacturing it. This information is stored on the non-GA dataset.

Real profitability of each product (taking into account refunds)

Below is a skeleton script for the question: What is the real profitability of each product?

real profit by product
SELECT Alias_Name.hits.item.productSku, (gross_profit-total_refund_revenue) AS real_profit
FROM (
  SELECT Alias_Name.hits.item.productSku, ( ( Imported_DataSet.productprice - Imported_DataSet.productcost ) * Alias_Name.quantity ) AS gross_profit, ( ( Imported_DataSet.refunddeliveryprice + Imported_DataSet.productprice ) * Imported_DataSet.refundquantity ) AS total_refund_revenue
  FROM (

    SELECT Alias_Name.hits.item.productSku, Imported_DataSet.productcost, Alias_Name.quantity, Imported_DataSet.productprice, Imported_DataSet.refunddeliveryprice, Imported_DataSet.refundquantity
    FROM [ ‘Imported DataSet Name’ ] AS 'Imported_DataSet'
    JOIN (
      
SELECT hits.item.productSku, SUM( hits.item.itemQuantity ) AS quantity
      FROM [‘GA Dataset Name’ ]
      WHERE hits.item.productSku IS NOT NULL
       AND totals.transactions >=1
      GROUP BY hits.item.productSku
) AS 'Alias_Name'
    ON Imported_DataSet.productId = Alias_Name.hits.item.productSku )
);
  1. This is a very similar query to What is the profitability of each product? The only differences are in the non-Analytics dataset in the gray subquery and in the mathematical operation calculating the real profit in the first line.
  2. In the non-Analytics dataset, you are also calculating the total amount of money spent on refunds (in the SELECT statement of the red subquery).
  3. Then you carry out a mathematical operation in line 1 to find the real profit by subtracting the revenue spent on refunds from your gross profit.

For more information on the query, please see the section on the profitability of each product.

Here is an example of the following query What was the real profitability of each product on the 28th July 2013?

real profit by product on 28th July 2013
SELECT two.hits.item.productSku, (gross_profit-total_refund_revenue) AS real_profit
FROM (
  SELECT two.hits.item.productSku, ( ( AnalyticsImport.product_data_20130728.productprice - AnalyticsImport.product_data_20130728.productcost ) * two.quantity ) AS gross_profit, ( ( AnalyticsImport.product_data_20130728.refunddeliveryprice + AnalyticsImport.product_data_20130728.productprice ) * AnalyticsImport.product_data_20130728.refundquantity ) AS total_refund_revenue
  FROM (

    SELECT two.hits.item.productSku, Analytics.product_data_20130728.productcost, two.quantity, AnalyticsImport.product_data_20130728.productprice, AnalyticsImport.product_data_20130728.refunddeliveryprice, AnalyticsImport.product_data_20130728.refundquantity
    FROM AnalyticsImport.product_data_20130728
    JOIN (

      SELECT hits.item.productSku, SUM( hits.item.itemQuantity ) AS quantity
      FROM GoogleStore.ga_sessions_20130728
      WHERE hits.item.productSku IS NOT NULL
       AND totals.transactions >=1
      GROUP BY hits.item.productSku
) AS two
    ON AnalyticsImport.product_data_20130728.productId = two.hits.item.productSku )
);

The real profit takes into account the profitability of a product after taking refunded products into consideration. To calculate the total refund revenue for a product:

total refund revenue for a product = ( the price of the product + the refund delivery price for the product ) * the quantity of products refunded

Was this article helpful?
How can we improve it?