Quality Data in BigQuery

Contents

Overview 

The same Trains & Buses quality data that is acessible through TAC's dashboards is also available via BigQuery. Accessing this dataset through BigQuery has the following key advantages:

  • More samples - Dashboards are limited to the last few days of samples, whereas BigQuery contains the last 90 days.
  • Programmatic access allows you to set up your own alerts or explore with your own dashboards.

 

Access 

To setup a Google Flights Quality data feed in Bigquery, please reach out to transport-help@google.com, expressing interest and mentioning the user emails that should have access to the data via Google Cloud Platform.

 

Please allow up to 1 week for us to set everything up. Once the data is ready, we will let you know, and you can access it through Analytics Hub with the following steps:

 

Prerequisite: You must have access to your Google Cloud project before beginning the following instructions. If you can not access your Google Cloud project, please contact your Google Cloud representative.

 

Step 1: Go to the google cloud console Analytics Hub and enable Analytics Hub (If not already enabled).

 

Step 2: Click on SEARCH LISTINGS

 

Step 3: Under Filters, select the Private checkbox. On the right panel you will see a private listing. Click on the private listing button.

 

Step 4: Click on ADD DATASET TO PROJECT. This will create a link for the transport quality dataset.

 

Step 5: Go to Google BigQuery and you will see a table named ft_transport_quality. This is a link to the table.

 

Step 6: Click on the ft_transport_quality table and then on the PREVIEW tab.  You will see some sample data.

 

Step 7: You can also use SQL to select this data and can join with other tables using standard SQL Join.

 

SELECT * FROM `<project-name>.<dataset>.ft_transport_quality` LIMIT 1000;

NOTE: Replace '<project-name> & <dataset> value accordingly.

 

The Dataset 

The data is updated hourly, and the whole dataset is replaced.

Data Model 

Name

Type

Description

timestamp

Timestamp

The time at which the sample was taken.

partner_id

String

Your identifier.

date

String

The day in which the sample was taken (America/Los_Angeles).

is_valid_partner_link

Bool

Whether the link was valid.
origin_country

String

The “ISO 3166-1 alpha-2“  country code for the trip’s origin.

destination_country

String

The “ISO 3166-1 alpha-2” country code for the trip’s destination.

origin_stop_id

String

The ‘origin’ stop upload via feed.

destination_stop_id

String

The ‘destination’ stop upload via feed.

user_country

String

The country code (2-letter ISO 3166-1 e.g 'US') of the user.

route

String

Concatenation of origin names and destination names (e.g. “Barcelona-Sants -> Seville Santa Justa Train Station”).

segments

Integer

The total number of segments (a segment represents the portion of an itinerary that is traveled on a specific vehicle).

segments_per_slice

Integer

The maximum number of segments per slice in the whole solution.

integration_id

String

Your integration identifier.

partner_price

String

The price + currency found on the partner website.

google_price

String

Sampled price + currency (from price cache or live API).

partner_live_api_price

String

Price from the live partner API call.

price_diff_usd

Float

The absolute value of the price difference in USD (always represented as positive).

time_to_departure_hours

Float

The number of hours until departure.

url

String

The link to the trip in Google Search.

partner_url

String

The link to the trip on your website.

price_diff_percentage

Float

The price difference percentage (always represented as positive number).

environment

String

How the sample was taken (e.g. 'Transport Unpriced Crawler', 'Transport Prod Crawler','Transport Qual Crawler').

is_domestic

Bool

Whether this is a domestic trip (all segments belong to the same country).

price_compare

String

A STRING containing either ‘Price Drop’, ‘Price Jump’ or ‘Equal’ or 'Close'. It can also contain the term ‘Error’.

error_type

String

The class of errors that his particular error belongs to (e.g. SOLUTION_NOT_FOUND).

error_message

String

The error explanation (e.g. “Shallow page returned”).

ranking

Integer

A measure of recency of the sample by error type (lower ranking is more recent).

departure_timestamp

Integer Timestamp of the departure.

arrival_timestamp

Integer Timestamp of the arrival.
departure_local_time_string

String

Local time of the departure (24h format, e.g “17:35”).
arrival_local_time_string String Local time of the arrival (24h format, e.g “17:35”).

departure_date

String

Departure date in format YYYY-MM-DD in the station timezone.

arrival_date

String

Arrival date in format YYYY-MM-DD in the station timezone.

origin_name

String The name of the trip’s origin (e.g. ‘Venice Mestre’).
destination_name String The name of the trip’s destination (e.g. ‘Verona Porta Nuova’).

shallow_partner_price

String

Price from the landing page (of a shallow link).

live_partner_price

String

Price from the live partner API call.

 

Metric Calculation

Below is an example on how to calculate metrics like 'itinerary not found' and 'price discrepancy'. Of course, you can also filter for specific dates, environments, or others. Also, please replace <namespace> with your own table name.

 

SELECT

  environment,

  1 - AVG(IF(is_valid_partner_link, 1, 0)) AS itinerary_not_found_percentage,

  IF(

    SUM(IF(is_valid_partner_link, 1, 0)) > 0,

    SUM(IF(is_valid_partner_link AND price_diff_percentage > 0.02, 1, 0))

      / SUM(IF(is_valid_partner_link, 1, 0)),

    0) AS price_discrepancy_percentage,

  COUNT(*) AS samples

FROM `<namespace>.ft_transport_quality`

GROUP BY environment;

Was this helpful?

How can we improve it?
Google apps
Main menu
13871983174031545035
true
Search Help Center
true
true
true
true
true
5099912
false
false
false
false