Search
Clear search
Close search
Google apps
Main menu
true

BigQuery cookbook

This feature is only available in Google Analytics 360, part of the Google Analytics 360 Suite.
Learn more about the Google Analytics 360 Suite.

This article contains examples of how to construct queries of the Anlytics data you export to BigQuery. We have made available a sample dataset so you can practice with the queries in this article.

In this article:

Query optimization

Every query you run contributes to your monthly data-processing allowance. If you select extraneous fields, you increase the amount of data that needs to be processed, and as a result, you use more of your monthly allowance than necessary. Optimised queries make efficient use of your monthly data-processing allowance.

Learn more about pricing.

Select only what you need

When you formulate a query, select the relevant fields within the SELECT statement. By not calling extraneous fields, you reduce the amount of data, and the time needed to process the query.

Example: avoid the use of the wildcard operator

Bad form: using wildcard operator
SELECT *
FROM [table name];

 

Better form: using field names to avoid unnecessary processing
SELECT field1, field2
FROM [table name];

Permit caching

Where possible, avoid using functions as fields. Functions (such as NOW() or TODAY()) return variable results, which prevents queries from being cached and therefore returned more quickly. Instead, use specific times and dates.

Use intermediate tables for commonly used subqueries

If you find that you are repeatedly using a specific query as a subquery, you can save that query as an intermediate table by clicking Save as Table above the query results. You can then reference that table in the FROM section of your query, which will reduce both the amount of data that has to be processed and time required for processing.

using an intermediate table
SELECT field1, field2
FROM [Dataset name.table name];

Query debugging

BigQuery debugs your code as you construct it. In the composition window, debugging is indicated just below the query. Debugging is also available through the API with dryRun flag.

Valid queries have a green indicator that you can click to see the amount of data processed by the query. This feature gives you an opportunity to optimize your data before you run the query so that you can avoid unnecessary data processing.

Query Debugging - Success

Invalid queries have a red indicator that you can click to see information about the error, and find the line and column where the error is occurring. In the example below, the GROUP BY statement is blank, and the error is pinpointed.

Query Debugging - Error

 

Tips and best practices

Query multiple tables

Google Analytics 360 data is passed into a new table everyday. To query multiple tables at once, you can comma separate the table names, use the TABLE_DATE_RANGE table wildcard function, or use multiple comma-separated TABLE_DATE_RANGE functions, as in the following examples.

3 Days using comma separated table names
SELECT
date,
SUM (totals.visits) visits,
SUM (totals.pageviews) pageviews,
SUM (totals.transactions) transactions,
SUM (totals.transactionRevenue)/1000000 revenue
FROM [73156703.ga_sessions_20160327],[73156703.ga_sessions_20160328],[73156703.ga_sessions_20160329]
GROUP BY
date
ORDER BY
date ASC

 

Last 7 Days using TABLE_DATE_RANGE
SELECT
date,
SUM (totals.visits) visits,
SUM (totals.pageviews) pageviews,
SUM (totals.transactions) transactions,
SUM (totals.transactionRevenue)/1000000 revenue
FROM (TABLE_DATE_RANGE([73156703.ga_sessions_], DATE_ADD(CURRENT_TIMESTAMP(), -7, 'DAY'), DATE_ADD(CURRENT_TIMESTAMP(), -1, 'DAY')))
GROUP BY
date
ORDER BY
date ASC

 

Last 3 Months using TABLE_DATE_RANGE
SELECT
date,
SUM(totals.visits) visits,
SUM(totals.pageviews) pageviews,
SUM(totals.transactions) transactions,
SUM(totals.transactionRevenue)/1000000 revenue
FROM (TABLE_DATE_RANGE([73156703.ga_sessions_], DATE_ADD(CURRENT_TIMESTAMP(), -3, 'MONTH'), DATE_ADD(CURRENT_TIMESTAMP(), -1, 'DAY')))
GROUP BY
date
ORDER BY
date ASC

 

Last 1 Year using TABLE_DATE_RANGE
SELECT
date,
SUM(totals.visits) visits,
SUM(totals.pageviews) pageviews,
SUM(totals.transactions) transactions,
SUM(totals.transactionRevenue)/1000000 revenue
FROM (TABLE_DATE_RANGE([73156703.ga_sessions_], DATE_ADD(CURRENT_TIMESTAMP(), -1, 'YEAR'), DATE_ADD(CURRENT_TIMESTAMP(), -1, 'DAY')))
GROUP BY
date
ORDER BY
date ASC

 

Specific Date Range
SELECT
date,
SUM(totals.visits) visits,
SUM(totals.pageviews) pageviews,
SUM(totals.transactions) transactions,
SUM(totals.transactionRevenue)/1000000 revenue
FROM (TABLE_DATE_RANGE([73156703.ga_sessions_], TIMESTAMP('2016-01-01'), TIMESTAMP('2016-01-31')))
GROUP BY
date
ORDER BY
date ASC

 

Last 6 Days plus Today’s data (intraday) using multiple TABLE_DATE_RANGE
SELECT
date,
SUM(totals.visits) visits,
SUM(totals.pageviews) pageviews,
SUM(totals.transactions) transactions,
SUM(totals.transactionRevenue)/1000000 revenue
FROM (TABLE_DATE_RANGE([73156703.ga_sessions_], DATE_ADD(CURRENT_TIMESTAMP(), -6, 'DAY'), DATE_ADD(CURRENT_TIMESTAMP(), -1, 'DAY'))), (TABLE_DATE_RANGE([73156703.ga_sessions_intraday_], DATE_ADD(CURRENT_TIMESTAMP(), -1, 'DAY'), CURRENT_TIMESTAMP()))
GROUP BY
date
ORDER BY
date ASC

Basic query examples

This section explains how to construct basic queries using metrics and dimensions from example Analytics data.

Total [metric] per [dimension]?

Below is a skeleton script for the question: What is the total [Insert Metric] per [Insert Dimension]?

total [metric] per [dimension]
SELECT SUM ( 'Insert Metric Field Name' ) AS ‘Alias Name’, [ ‘Insert Dimension Field Name’ ]
FROM [ ‘Dataset Name’ ]
GROUP BY [ ‘Insert Dimension Field Name’ ]
ORDER BY [ ‘Insert Field Name’ ] DESC;
  1. In the first line, you are selecting all of your relevant metric and dimension fields. You use an aggregate function, SUM() followed by the GROUP BY command, which is used to sum all the numerical values in the metric field for each specific dimension value within the dimension field.
  2. The keyword AS is used so that you can assign a name to a field. You can see this in line 1 where the keyword AS is used to assign the SUM ( 'Insert Metric Field Name' ) field a name (‘Alias Name’). This is useful when you want to name aggregated functions or rename complex field names for convenience in order to make your queries more pedagogical.
  3. In line 4, the command ORDER BY [Insert Field Name] DESC orders the field in descending order of magnitude and can be applied to all queries. You can also use the ORDER BY [Insert Field Name] ASC command, which orders the field in ascending order of magnitude. You can add this to any of the queries, but you can only order by one field and it can be either a metric (order by value) or dimension (order alphabetical).

    Example: What is the total number of transactions generated per device browser on 10th September 2013?
total transactions per browser on 10th September 2013
SELECT device.browser, SUM ( totals.transactions ) AS total_transactions
FROM LondonCycleHelmet.ga_sessions_20130910
GROUP BY device.browser
ORDER BY total_transactions;

Average bounce rate per [ dimension ]?

The real bounce rate is defined as the percentage of visits with a single pageview. Below is a skeleton script for the question: What is the average bounce rate per [Insert Dimension]?

average bounce rate per [dimension]
SELECT [ ‘Insert Dimension Field Name’ ] , ( ( total_no_of_bounces / ‘Alias Name’ ) * 100 ) AS bounce_rate
FROM (
  SELECT [ ‘Insert Dimension Field Name’ ], COUNT ( [ ‘Insert Dimension Field Name’ ]) AS ‘Alias Name’, COUNT ( totals.bounces ) AS total_no_of_bounces
  FROM [ ‘Dataset Name’ ]
GROUP BY [ ‘Insert Dimension Field Name’ ]
)
GROUP BY [ ‘Insert Dimension Field Name’ ] , bounce_rate ;
  1. In the first line, you select the dimension that you want to analyze, but you also carry out a mathematical operation. The mathematical operation calculates the bounce-rate-metric field, which is the percentage of sessions with a single pageview for each unique value within the dimension field.
  2. Lines in gray are commands to produce the dataset you need for the required fields in the first line. In summary, it outputs a field that shows the total number of bounces per value within the selected dimension field.
  3. The first aggregate function - COUNT() - in the gray SELECT statement statement is counting the number of times a specific value in the dimension field appears and the total number of bounces associated with it.
  4. The WHERE line in the gray subquery is only selecting values of the dimension field that have only had one pageview from a user.
  5. The last line uses a GROUP BY statement to group the dimension-field values with their respective bounce rates.

Example: What was the real bounce rate per traffic source?

average bounce rate per traffic source on 10th September 2013
SELECT trafficSource.source , ( ( total_no_of_bounces / total_visits ) * 100 ) AS bounce_rate
FROM (
  SELECT trafficSource.source , COUNT ( trafficSource.source ) AS total_visits, COUNT ( totals.bounces ) AS total_no_of_bounces
  FROM LondonCycleHelmet.ga_sessions_20130910
  GROUP BY trafficSource.source
)
GROUP BY trafficSource.source, bounce_rate;

Average number of product pageviews by purchaser type (purchasers vs non-purchasers)

Below is the skeleton script for: What is the average number of product pageviews for purchasers?

average number of product pageviews for users that make a purchase
SELECT ( SUM(total_pagesviews_per_user) / COUNT(users) ) AS avg_pageviews_per_user
FROM (
  SELECT fullVisitorId AS users, SUM(totals.pageviews) AS total_pagesviews_per_user
  FROM [ ‘Dataset Name’ ]
  WHERE totals.transactions >=1
  GROUP BY users
) ;

Below is the skeleton script for: What is the average number of product pageviews for non-purchasers?

average number of product pageviews for users that do not make a purchase
SELECT ( SUM(total_pagesviews_per_user) / COUNT(users) ) AS avg_pageviews_per_user 
FROM (
  SELECT fullVisitorId AS users, SUM(totals.pageviews) AS total_pagesviews_per_user
  FROM [ ‘Dataset Name’ ]
  WHERE totals.transactions IS NULL
  GROUP BY users
);
  1. The mathematical operation in the first line is used to calculate the average number of pageviews per user (avg_pageviews_per_user). The operation finds the total pageviews using the aggregate function SUM(), and then divides by the total amount of users who either converted or did not convert, which is calculated by using the aggregate function COUNT().
  2. The dataset is created in the gray subquery (returning two fields).
    1. One is a unique visitor field created by the GROUP BY fullVisitorId statement in the last line
    2. The other field is the total pageviews per a user which is created using the SUM(totals.pageviews) function in the first line of the gray subquery.

Be aware of the rule applied in the WHERE statement of the gray subquery. The rule changes depending on whether we are talking about users who converted or did not convert. To find out how many users did convert, use WHERE totals.transactions >=1. To find out how many users did not convert, use WHERE totals.transactions IS NULL.

Here is an example of the following query What was the average number of product pageviews for users that made a purchase on 10th September 2013?

average number of product pageviews for users that make a purchase on 10th September 2013
SELECT ( SUM(total_pagesviews_per_user) / COUNT(users) ) AS avg_pageviews_per_user 
FROM (
  SELECT fullVisitorId AS users, SUM(totals.pageviews) AS total_pagesviews_per_user
  FROM LondonCycleHelmet.ga_sessions_20130910
  WHERE totals.transactions >=1
  GROUP BY users
);

Here is an example of the query What was the average number of product pageviews for users that did not make a purchase on 10th September 2013?

average number of product pageviews for users that do not make a purchase on 10th September 2013
SELECT ( SUM(total_pagesviews_per_user) / COUNT(users) ) AS avg_pageviews_per_user 
FROM (
  SELECT fullVisitorId AS users, SUM(totals.pageviews) AS total_pagesviews_per_user
  FROM LondonCycleHelmet.ga_sessions_20130910
  WHERE totals.transactions IS NULL
  GROUP BY users
);

Once the table grows, it's likely that you'll need to use a GROUP EACH BY for grouping on columns like users where the number of distinct users will be very large.

Average number of transactions per purchaser

Below is the skeleton script for the question: What is the average number of transactions per purchaser?

average number of transactions per user that makes a purchase
SELECT (SUM (total_transactions_per_user) / COUNT(fullVisitorId) ) AS avg_total_transactions_per_user
FROM (
  SELECT fullVisitorId, SUM(totals.transactions) AS total_transactions_per_user
  FROM [‘Dataset Name’ ]
  WHERE totals.transactions IS NOT NULL
  GROUP BY fullVisitorId
);
  1. The mathematical operation on the first line is used to calculate the average number of transactions per user (avg_total_transactions_per_user). The operation finds the total transactions carried-out by using the SUM() function and then dividing by the total number of users who made transactions, which is calculated by using the COUNT() function.
  2. The dataset is created in the gray subquery, and consists of two fields.
    1. One is a unique visitor field created by the GROUP BY fullVisitorId statement in the last line.
    2. The other is the total transactions carried out by each user field, which is created using the SUM(totals.transactions) function in the first line of the gray subquery.

Be aware of the rule applied in the WHERE statement of the gray subquery, which only chooses visitors who have made a transaction (WHERE totals.transactions IS NOT NULL).

Here is an example of the query What was the average total transactions per user that made a purchase on 10th September 2013?

average number of transactions per user that made a purchase on 10th September 2013
SELECT ( SUM (total_transactions_per_user) / COUNT(fullVisitorId) ) AS avg_total_transactions_per_user
FROM (
  SELECT fullVisitorId, SUM(totals.transactions) AS total_transactions_per_user
  FROM LondonCycleHelmet.ga_sessions_20130910
  WHERE totals.transactions IS NOT NULL
GROUP BY fullVisitorId
);

Average amount of money spent per session

Below is the skeleton script for the question: What is the average amount of money spent per visit?

average amount of money spent per session
SELECT ( SUM(total_transactionrevenue_per_user) / SUM(total_visits_per_user) ) AS avg_revenueSpend_by_user_per_visit
FROM (
  SELECT SUM(totals.visits) AS total_visits_per_user, SUM( totals.transactionRevenue ) AS total_transactionrevenue_per_user, fullVisitorId
  FROM [‘Dataset Name’ ]
  WHERE totals.visits>0
   AND totals.transactions>=1
   AND totals.transactionRevenue IS NOT NULL
  GROUP BY fullVisitorId
) ;
  1. They gray subquery produces the three data fields needed to carry out the mathematical operation in the first line, which calculates the average amount of money spent per session per user.
  2. In the subquery, you create three fields that show the the total amount of money spent per user (total_transactionrevenue_per_user), the total number of sessions per user (total_visits_per_user), and finally the associated visitor-ID field (fullVisitorId).
  3. These users must satisfy the three rules in the WHERE statementof the gray subquery (separated by AND statements). Each user must have initiated a session (WHERE totals.visits>0), made at least one transaction (AND totals.transactions>=1), and must have spent money (AND totals.transactionRevenue IS NOT NULL).

Here is an example of the query What is the average amount of money spent per session on 10th September 2013?

average amount of money spent per session on 10th September 2013
SELECT ( SUM(total_transactionrevenue_per_user) / SUM(total_visits_per_user) ) AS avg_revenue_by_user_per_visit
FROM (
  SELECT SUM(totals.visits) AS total_visits_per_user, SUM( totals.transactionRevenue ) AS total_transactionrevenue_per_user, fullVisitorId
  FROM LondonCycleHelmet.ga_sessions_20130910
  WHERE totals.visits>0
   AND totals.transactions>=1
   AND totals.transactionRevenue IS NOT NULL
  GROUP BY fullVisitorId
) ;

Sequence of hits

Below is the skeleton script for an analysis of page sequence paths.

sequence of pages viewed
SELECT fullVisitorId, visitId, visitNumber, hits.hitNumber, hits.page.pagePath
FROM [ ‘Dataset Name’ ]
WHERE hits.type=[ ‘type of hit’ ];
  1. On the first line, you identify the unique combination of visitor ID and visit IDs. Every hit will belong to a unique combination of these two dimensions.
  2. On the first line, you also list the elements of a page path.
  3. The FROM statement lists the data source(s).
  4. In the WHERE statement, you limit the query to specific types of hits to control what kinds of interactions you observe.

Here is an example of the query What is the sequence of pages viewed? (or: how to do pathing analysis)

sequence of pages viewed by users on 10th September 2013
SELECT fullVisitorId, visitId, visitNumber, hits.hitNumber, hits.page.pagePath
FROM LondonCycleHelmet.ga_sessions_20130910
WHERE hits.type="PAGE";

In this query, you limit the hit types to PAGES to avoid seeing event or transaction interactions. Each line of the output represents a pageview, and is shown by the default order of the fields in the SELECT statement.

Multiple custom dimensions at the hit or session level

custom dimension at the hit level
SELECT fullVisitorId, visitId, hits.hitNumber, hits.time,
MAX(IF(hits.customDimensions.index=1,
hits.customDimensions.value,
NULL)) WITHIN hits AS customDimension1,
FROM [tableID.ga_sessions_20150305]
LIMIT 100

 

custom dimension at the session level
SELECT fullVisitorId, visitId,
MAX(IF(customDimensions.index=2,
customDimensions.value,
NULL)) WITHIN RECORD AS customDimension2,
FROM [tableID.ga_sessions_20150305]
LIMIT 100

In each query:

The SELECT statement queries for the relevant dimension and metric fields.

The MAX function:

 

  • Returns the custom dimension as a new column. You can repeat the function to return multiple custom dimensions as new columns.
  • WITHIN hits and WITHIN RECORD evaluate the condition inside repeated fields in BigQuery.
  • The condition inside MAX is evaluated for each custom dimension, but for any that are not index=1 (hits) or index=2 (sessions), it returns NULL.
  • Returns the maximum value, which is the value of Custom Dimension 1 for hits or Custom Dimension 2 for sessions since all other values are NULL.

Advanced query examples

Now that you’re familiar with simple queries, you can construct queries using the advanced functions and features available in BIgQuery.

Products purchased by customers who purchased product A (Classic Ecommerce)

Below is a skeleton script for the question: What other products are purchased by customers who purchase product A?

products purchased by a customer who purchases product A (Classic Ecommerce)
SELECT hits.item.productName AS other_purchased_products, COUNT(hits.item.productName) AS quantity
FROM [‘Dataset Name’ ]
WHERE fullVisitorId IN (
  SELECT fullVisitorId
  FROM [‘Dataset Name’ ]
  WHERE hits.item.productName CONTAINS 'Product Item Name A'
   AND totals.transactions>=1
  GROUP BY fullVisitorId
)
 AND hits.item.productName IS NOT NULL
 AND hits.item.productName != 'Product Item Name A'
GROUP BY other_purchased_products
ORDER BY quantity DESC;
  1. In the first line, you select all of the other items purchased by a user, and the aggregate function COUNT() is used to calculate the quantity of each other item purchased. The result is then displayed in a field labelled quantity, with their associated item in the product field labelled as other_purchased_products.
  2. In the gray subquery, you select only the unique users (fullVisitorId) who have carried out transactions (totals.transactions>=1) and during a transaction purchased product A (WHERE hits.item.productName CONTAINS 'Product Item Name A').

The rules (WHERE and AND statements) in the top-level (green) query disregard values in hits.item.productNamem which are null and contain product A.

Here is an example of the query If a customer purchases Brighton Metallic Pens - Set of 4, what other product(s) did they purchase?

products purchased by a customer who purchases 'Brighton Metallic Pens (Set of 4)' on 24th June 2013
SELECT hits.item.productName AS other_purchased_products, COUNT(hits.item.productName) AS quantity
FROM [GoogleStore.ga_sessions_20130624]
WHERE fullVisitorId IN (
  SELECT fullVisitorId
  FROM [GoogleStore.ga_sessions_20130624]
  WHERE hits.item.productName CONTAINS 'Brighton Metallic Pens - Set of 4'
   AND totals.transactions>=1
  GROUP BY fullVisitorId
)
 AND hits.item.productName IS NOT NULL
 AND hits.item.productName !='Brighton Metallic Pens - Set of 4'
GROUP BY other_purchased_products
ORDER BY quantity DESC;

In Dremel/BigQuery, using WHERE expr IN triggers a JOIN, and size restrictions apply; specifically, the size of the right side of the JOIN (in this case the number of visitors) needs to be less than 8 MB. In Dremel, this is called a broadcast JOIN. When the size exceeds 8 MB, you need to trigger a shuffled JOIN, which can be done using the JOIN EACH syntax. Unfortunately, it can't be done using IN, but the same query can be rewritten with a JOIN.

Products purchased by customers who purchased product A (Enhanced Ecommerce)

This is the similar to the previous skeleton query but works for Enhanced Ecommerce. It also makes use of TABLE_DATE_RANGE to query data over multiple days.

products purchased by a customer who purchases product A (Enhanced Ecommerce)
SELECT hits.product.productSKU AS other_purchased_products, COUNT(hits.product.productSKU) AS quantity
FROM SELECT fullVisitorId, hits.product.productSKU, hits.eCommerceAction.action_type FROM TABLE_DATE_RANGE([73156703.ga_sessions_], TIMESTAMP('2015-07-01'), TIMESTAMP('2015-07-14')))
WHERE fullVisitorId IN (
  SELECT fullVisitorId
FROM TABLE_DATE_RANGE([73156703.ga_sessions_], TIMESTAMP('2015-07-01'), TIMESTAMP('2015-07-14'))
WHERE hits.product.productSKU CONTAINS '10 15103'
AND hits.eCommerceAction.action_type = '6'
GROUP BY fullVisitorId
)
AND hits.product.productSKU IS NOT NULL
AND hits.product.productSKU !='10 15103'
AND hits.eCommerceAction.action_type = '6'
GROUP BY other_purchased_products
ORDER BY quantity DESC;

Average number of user interactions before purchase

This is an example of a query of a JOIN() [...] ON command, which is only dependent on Analytics data.

Below is a skeleton script for the question: What is the average number of user interactions before a purchase?

number of user interactions before a purchase
SELECT one.hits.item.productSku AS ProductSku, ( sum_of_hit_number / total_hits ) AS avg_hit_number
FROM (
 SELECT hits.item.productSku, SUM(hits.hitNumber) AS sum_of_hit_number
 FROM [‘GA Dataset Name’ ]
 WHERE hits.item.productSku IS NOT NULL
  AND totals.transactions>=1
 GROUP BY hits.item.productSku
) AS ‘Alias_Name_1’
JOIN (
 SELECT hits.item.productSku, COUNT( fullVisitorId ) AS total_hits
 FROM [‘GA Dataset Name’ ]
 WHERE hits.item.productSku IS NOT NULL
  AND totals.transactions>=1
 GROUP BY hits.item.productSku
) AS ‘Alias_Name_2’
ON Alias_Name_1.hits.item.productSku = Alias_Name_2.hits.item.productSku;
  1. The first line carries out the main mathematical operation to find the average user interactions per product, and this query creates a join between two subqueries called ‘Alias_Name_1’ and ‘Alias_Name_2’.
  2. Alias_Name_1’ is used to produce a field that uses the aggregate function SUM() to sum all the hit numbers recorded for a product.
  3. Alias_Name_2’ is used to find the number of hits made by users per product, using the COUNT() function.
  4. The last line shows the common field (hits.item.productSku) shared between the two datasets on the join.

Here is an example of the query On 10th September 2013, what is the average number of user interactions before a purchase?

number of user interactions on 10th September 2013 before a purchase
SELECT one.hits.item.productSku AS ProductSku, ( sum_of_hit_number / total_hits ) AS avg_hit_number
FROM (
 SELECT hits.item.productSku, SUM(hits.hitNumber) AS sum_of_hit_number
 FROM [GoogleStore.ga_sessions_20130728]
 WHERE hits.item.productSku IS NOT NULL
  AND totals.transactions>=1
 GROUP BY hits.item.productSku 
) AS one
JOIN (
 SELECT hits.item.productSku, COUNT( fullVisitorId ) AS total_hits
 FROM [GoogleStore.ga_sessions_20130728]
 WHERE hits.item.productSku IS NOT NULL
  AND totals.transactions>=1
 GROUP BY hits.item.productSku
) AS two
ON one.hits.item.productSku = two.hits.item.productSku;

Percentage of stock sold per product

This is an example of a query which is not only dependent on Analytics data, but also on non-Analytics data. By combining both data sets, you can begin to understand user behavior at a more segmented level. You can import non-Analytics data into BigQuery, but keep in mind that this will contribute to your monthly data storage charge.

Below is a skeleton script for the question: What percentage of stock was sold per product?

percentage of stock sold per product
SELECT AnalyticsImport.product_data_20130728.productId, ((( one.quantity_sold ) / ( AnalyticsImport.product_data_20130728.productstock + one.quantity_sold )) * 100 ) AS percentage_of_stock_sold
FROM [ ‘Imported_DataSet’ ]
JOIN (
  SELECT hits.item.productSku, SUM( hits.item.itemQuantity ) AS quantity_sold
  FROM [‘GA Dataset’ ]
  WHERE hits.item.productSku IS NOT NULL
   AND totals.transactions>=1
  GROUP BY hits.item.productSku
) AS ‘Alias_Name’
ON Imported_DataSet.’productId_field’ = Alias_Name.hits.item.productSku;
  1. The first line outputs two fields: one field containing all the product IDs, and one that is a mathematical operation that shows the percentage of stock sold for that product ID.
  2. Since this query depends on two data sets, you need to use the JOIN() ... ON function. This command joins the rows from the two data sets based on the common field between them. In this case, the two data sets are [ ‘Imported_DataSet’ ] and ‘Alias_Name’.
  3. [ ‘Imported_DataSet’ ] is the non-Analytics data. This is the data set that contains the metric field for how much stock is left (Imported DataSet.’stock_left_field’) and the product-ID dimension field (Imported_DataSet.’productId_field’).
  4. ‘Alias_Name’ is the name assigned to the data returned by the gray subquery. This subquery is using Analytics data to find out the total quantity of items sold per product.
  5. The last line uses the ON statement to show the common field between the two data sets and where the two datasets are joined.

Many of the variables in this query have their dataset name attached to them as prefixes (e.g., Imported_DataSet.’productId_field’, Alias_Name.quantity_sold). This is to clarify what field you are selecting, and to make it explicit to which dataset it belongs.

Here is an example of the query What percentage of stock was sold per product on the 28th July 2013?

percentage of stock sold per product on 28th July 2013
SELECT AnalyticsImport.product_data_20130728.productId, ( ( ( one.quantity_sold ) / ( AnalyticsImport.product_data_20130728.productstock + one.quantity_sold ) ) * 100 ) AS percentage_of_stock_sold
FROM AnalyticsImport.product_data_20130728
JOIN (
  SELECT hits.item.productSku, SUM( hits.item.itemQuantity ) AS quantity_sold
  FROM GoogleStore.ga_sessions_20130728
  WHERE hits.item.productSku IS NOT NULL
   AND totals.transactions>=1
  GROUP BY hits.item.productSku
) AS one
ON AnalyticsImport.product_data_20130728.productId = one.hits.item.productSku
ORDER BY percentage_of_stock_sold DESC;

Profitability of each product

Below is a skeleton script for the question: What is the profitability of each product?

profit by product
SELECT Alias_Name.hits.item.productSku, ( Imported_DataSet.’product profit field’ * Alias_Name.quantity ) AS profit
FROM (
  SELECT Alias_Name.hits.item.productSku, Imported_DataSet.’product profit field’
  FROM [ ‘Imported Data Set’ ]
  JOIN (
    SELECT hits.item.productSku, SUM( hits.item.itemQuantity ) AS quantity
    FROM [ ‘GA Dataset Name’ ]
    WHERE hits.item.productSku IS NOT NULL
    GROUP BY hits.item.productSku
) AS ‘Alias_Name’
  ON Imported_DataSet.productId = Alias_Name.hits.item.productSku
);
  1. The first line contains the mathematical operation to calculate the total profit made on each product.
  2. The gray subquery uses non-Analytics data, which collects the data about how much profit is made when a product is sold.
  3. The red subquery is the Analytics data subquery, which will be joined with the non-Analytics data. It calculates the quantity of items sold per product.
  4. The last line uses the ON statement to clarify the field that the two data sets share. In this case, it is the product id number.

Here is an example of the query What was the profitability of each product on 28th July 2013?

profit by product on 28th July 2013
SELECT two.hits.item.productSku, ((AnalyticsImport.product_data_20130728.productprice-AnalyticsImport.product_data_20130728.productcost)*two.quantity) AS profit
FROM (
  SELECT two.hits.item.productSku, AnalyticsImport.product_data_20130728.productcost, two.quantity, AnalyticsImport.product_data_20130728.productprice
  FROM AnalyticsImport.product_data_20130728
  JOIN (
    SELECT hits.item.productSku, SUM( hits.item.itemQuantity ) AS quantity
    FROM GoogleStore.ga_sessions_20130728
    WHERE hits.item.productSku IS NOT NULL
    GROUP BY hits.item.productSku
) AS two
  ON AnalyticsImport.product_data_20130728.productId = two.hits.item.productSku
);

Profit is calculated by finding the difference between the price the product is sold at and the cost of manufacturing it. This information is stored on the non-GA dataset.

Real profitability of each product (taking into account refunds)

Below is a skeleton script for the question: What is the real profitability of each product?

real profit by product
SELECT Alias_Name.hits.item.productSku, (gross_profit-total_refund_revenue) AS real_profit
FROM (
  SELECT Alias_Name.hits.item.productSku, ( ( Imported_DataSet.productprice - Imported_DataSet.productcost ) * Alias_Name.quantity ) AS gross_profit, ( ( Imported_DataSet.refunddeliveryprice + Imported_DataSet.productprice ) * Imported_DataSet.refundquantity ) AS total_refund_revenue
  FROM (

    SELECT Alias_Name.hits.item.productSku, Imported_DataSet.productcost, Alias_Name.quantity, Imported_DataSet.productprice, Imported_DataSet.refunddeliveryprice, Imported_DataSet.refundquantity
    FROM [ ‘Imported DataSet Name’ ] AS 'Imported_DataSet'
    JOIN (
      
SELECT hits.item.productSku, SUM( hits.item.itemQuantity ) AS quantity
      FROM [‘GA Dataset Name’ ]
      WHERE hits.item.productSku IS NOT NULL
       AND totals.transactions >=1
      GROUP BY hits.item.productSku
) AS 'Alias_Name'
    ON Imported_DataSet.productId = Alias_Name.hits.item.productSku )
);
  1. This is a very similar query to What is the profitability of each product? The only differences are in the non-Analytics dataset in the gray subquery and in the mathematical operation calculating the real profit in the first line.
  2. In the non-Analytics dataset, you are also calculating the total amount of money spent on refunds (in the SELECT statement of the red subquery).
  3. Then you carry out a mathematical operation in line 1 to find the real profit by subtracting the revenue spent on refunds from your gross profit.

For more information on the query, please see the section on the profitability of each product.

Here is an example of the following query What was the real profitability of each product on the 28th July 2013?

real profit by product on 28th July 2013
SELECT two.hits.item.productSku, (gross_profit-total_refund_revenue) AS real_profit
FROM (
  SELECT two.hits.item.productSku, ( ( AnalyticsImport.product_data_20130728.productprice - AnalyticsImport.product_data_20130728.productcost ) * two.quantity ) AS gross_profit, ( ( AnalyticsImport.product_data_20130728.refunddeliveryprice + AnalyticsImport.product_data_20130728.productprice ) * AnalyticsImport.product_data_20130728.refundquantity ) AS total_refund_revenue
  FROM (

    SELECT two.hits.item.productSku, Analytics.product_data_20130728.productcost, two.quantity, AnalyticsImport.product_data_20130728.productprice, AnalyticsImport.product_data_20130728.refunddeliveryprice, AnalyticsImport.product_data_20130728.refundquantity
    FROM AnalyticsImport.product_data_20130728
    JOIN (

      SELECT hits.item.productSku, SUM( hits.item.itemQuantity ) AS quantity
      FROM GoogleStore.ga_sessions_20130728
      WHERE hits.item.productSku IS NOT NULL
       AND totals.transactions >=1
      GROUP BY hits.item.productSku
) AS two
    ON AnalyticsImport.product_data_20130728.productId = two.hits.item.productSku )
);

The real profit takes into account the profitability of a product after taking refunded products into consideration. To calculate the total refund revenue for a product:

total refund revenue for a product = ( the price of the product + the refund delivery price for the product ) * the quantity of products refunded

Was this article helpful?
How can we improve it?
Google Analytics training and support resources

Check out our comprehensive list to learn more about Analytics solutions.