Contents
Overview
The same Trains & Buses quality data that is acessible through TAC's dashboards is also available via BigQuery. Accessing this dataset through BigQuery has the following key advantages:
- More samples - Dashboards are limited to the last few days of samples, whereas BigQuery contains the last 90 days.
- Programmatic access allows you to set up your own alerts or explore with your own dashboards.
Access
To setup a Google Flights Quality data feed in Bigquery, please reach out to transport-help@google.com, expressing interest and mentioning the user emails that should have access to the data via Google Cloud Platform.
Please allow up to 1 week for us to set everything up. Once the data is ready, we will let you know, and you can access it through Analytics Hub with the following steps:
Prerequisite: You must have access to your Google Cloud project before beginning the following instructions. If you can not access your Google Cloud project, please contact your Google Cloud representative.
Step 1: Go to the google cloud console Analytics Hub and enable Analytics Hub (If not already enabled).
Step 2: Click on SEARCH LISTINGS
Step 3: Under Filters, select the Private checkbox. On the right panel you will see a private listing. Click on the private listing button.
Step 4: Click on ADD DATASET TO PROJECT. This will create a link for the transport quality dataset.
Step 5: Go to Google BigQuery and you will see a table named ft_transport_quality. This is a link to the table.
Step 6: Click on the ft_transport_quality table and then on the PREVIEW tab. You will see some sample data.
Step 7: You can also use SQL to select this data and can join with other tables using standard SQL Join.
SELECT * FROM `<project-name>.<dataset>.ft_transport_quality` LIMIT 1000;
NOTE: Replace '<project-name> & <dataset> value accordingly.
The Dataset
The data is updated hourly, and the whole dataset is replaced.
Data Model
Name |
Type |
Description |
timestamp |
Timestamp |
The time at which the sample was taken. |
partner_id |
String |
Your identifier. |
date |
String |
The day in which the sample was taken (America/Los_Angeles). |
is_valid_partner_link |
Bool |
Whether the link was valid. |
origin_country |
String |
The “ISO 3166-1 alpha-2“ country code for the trip’s origin. |
destination_country |
String |
The “ISO 3166-1 alpha-2” country code for the trip’s destination. |
origin_stop_id |
String |
The ‘origin’ stop upload via feed. |
destination_stop_id |
String |
The ‘destination’ stop upload via feed. |
user_country |
String |
The country code (2-letter ISO 3166-1 e.g 'US') of the user. |
route |
String |
Concatenation of origin names and destination names (e.g. “Barcelona-Sants -> Seville Santa Justa Train Station”). |
segments |
Integer |
The total number of segments (a segment represents the portion of an itinerary that is traveled on a specific vehicle). |
segments_per_slice |
Integer |
The maximum number of segments per slice in the whole solution. |
integration_id |
String |
Your integration identifier. |
partner_price |
String |
The price + currency found on the partner website. |
google_price |
String |
Sampled price + currency (from price cache or live API). |
partner_live_api_price |
String |
Price from the live partner API call. |
price_diff_usd |
Float |
The absolute value of the price difference in USD (always represented as positive). |
time_to_departure_hours |
Float |
The number of hours until departure. |
url |
String |
The link to the trip in Google Search. |
partner_url |
String |
The link to the trip on your website. |
price_diff_percentage |
Float |
The price difference percentage (always represented as positive number). |
environment |
String |
How the sample was taken (e.g. 'Transport Unpriced Crawler', 'Transport Prod Crawler','Transport Qual Crawler'). |
is_domestic |
Bool |
Whether this is a domestic trip (all segments belong to the same country). |
price_compare |
String |
A STRING containing either ‘Price Drop’, ‘Price Jump’ or ‘Equal’ or 'Close'. It can also contain the term ‘Error’. |
error_type |
String |
The class of errors that his particular error belongs to (e.g. SOLUTION_NOT_FOUND). |
error_message |
String |
The error explanation (e.g. “Shallow page returned”). |
ranking |
Integer |
A measure of recency of the sample by error type (lower ranking is more recent). |
departure_timestamp |
Integer | Timestamp of the departure. |
arrival_timestamp |
Integer | Timestamp of the arrival. |
departure_local_time_string |
String |
Local time of the departure (24h format, e.g “17:35”). |
arrival_local_time_string | String | Local time of the arrival (24h format, e.g “17:35”). |
departure_date |
String |
Departure date in format YYYY-MM-DD in the station timezone. |
arrival_date |
String |
Arrival date in format YYYY-MM-DD in the station timezone. |
origin_name |
String | The name of the trip’s origin (e.g. ‘Venice Mestre’). |
destination_name | String | The name of the trip’s destination (e.g. ‘Verona Porta Nuova’). |
shallow_partner_price |
String |
Price from the landing page (of a shallow link). |
live_partner_price |
String |
Price from the live partner API call. |
Metric Calculation
Below is an example on how to calculate metrics like 'itinerary not found' and 'price discrepancy'. Of course, you can also filter for specific dates, environments, or others. Also, please replace <namespace> with your own table name.
SELECT
environment,
1 - AVG(IF(is_valid_partner_link, 1, 0)) AS itinerary_not_found_percentage,
IF(
SUM(IF(is_valid_partner_link, 1, 0)) > 0,
SUM(IF(is_valid_partner_link AND price_diff_percentage > 0.02, 1, 0))
/ SUM(IF(is_valid_partner_link, 1, 0)),
0) AS price_discrepancy_percentage,
COUNT(*) AS samples
FROM `<namespace>.ft_transport_quality`
GROUP BY environment;