Example queries for reporting logs in BigQuery

This feature is available with Cloud Identity Premium edition. Compare editions 

This article contains example queries for common reports you can get from BigQuery. These query examples assume legacy SQL. Replace api_project_name.dataset_name with your own project name and dataset name.

Learn more about querying BigQuery data.

For Gmail log fields and their meanings, go to Schema for Gmail activity logs in BigQuery.

Example queries

Accounts

Number of admin and delegated accounts, and number disabled, locked out, and suspended by date accounts

SELECT date,
accounts.num_locked_users,
accounts.num_disabled_accounts,
accounts.num_delegated_admins,
accounts.num_super_admins,
accounts.num_suspended_users,
accounts.num_users
FROM api_project_name.dataset_name.usage
WHERE accounts.num_users IS NOT NULL
ORDER BY date ASC;

Administrators

Most frequent events performed by an admin

SELECT count(*) as admin_actions, event_name
FROM api_project_name.dataset_name.activity
WHERE email IN (
  SELECT user_email
  FROM api_project_name.dataset_name.usage
  WHERE accounts.is_super_admin = TRUE
)
GROUP BY 2
ORDER BY 1 DESC;

Find the number of super admins in a given domain

SELECT COUNT(DISTINCT user_email) as number_of_super_admins, date
FROM api_project_name.dataset_name.usage
WHERE accounts.is_super_admin = TRUE
GROUP BY 2
ORDER BY 2 DESC;

Calendar

Standard SQL only

Ratio of daily active users to 30-day active users in Google Calendar. This example queries across multiple tables.

Daily active users

SELECT date, calendar.num_1day_active_users
FROM api_project_name.dataset_name.usage
WHERE calendar.num_1day_active_users IS NOT NULL
ORDER BY date DESC

30-day active users

SELECT date, calendar.num_30day_active_users
FROM api_project_name.dataset_name.usage
WHERE calendar.num_30day_active_users IS NOT NULL
ORDER BY date DESC;

Number of calendar events by type

SELECT COUNT(DISTINCT calendar.calendar_id) AS count, event_name
FROM api_project_name.dataset_name.activity
WHERE calendar.calendar_id IS NOT NULL
GROUP BY 2 ORDER BY 1 DESC;

Drive

Number of Google Drive items shared, grouped by sharing method

SELECT COUNT(DISTINCT drive.doc_id) AS count, drive.visibility
FROM api_project_name.dataset_name.activity
WHERE drive.doc_id IS NOT NULL
GROUP BY 2 ORDER BY 1 DESC;

File ID, title, owner and type. Files that were shared externally within the time window.

SELECT TIMESTAMP_MICROS(time_usec) AS date, drive.doc_id, drive.doc_title,
drive.owner, drive.doc_type
FROM api_project_name.dataset_name.activity
WHERE drive.visibility = "shared_externally"
ORDER BY 1 DESC
LIMIT 100;

Sharing permission changes and their result. Gives you the ability to understand what permission changes yielded the change in file visibility.

SELECT TIMESTAMP_MICROS(time_usec) AS date, drive.doc_title,
drive.visibility_change,drive.old_visibility, drive.visibility,
FROM api_project_name.dataset_name.activity
WHERE record_type = "drive"
AND drive.old_visibility IS NOT NULL
AND drive.old_visibility != "unknown";

Event types broken down by file type. Useful for adoption report by file type.

SELECT drive.doc_type, event_type, count(*) 
FROM  api_project_name.dataset_name.activity
WHERE record_type = "DRIVE"
GROUP by 1,2 ORDER BY 3 desc;

Event type and name for each shared drive

SELECT drive.shared_drive_id, event_type, event_name, record_type,
count(distinct drive.doc_id) AS count
FROM api_project_name.dataset_name.activity
WHERE record_type = "drive"
AND drive.shared_drive_id IS NOT NULL
GROUP BY 1,2,3,4 ORDER BY 5 DESC;

Information on users outside of your domain

SELECT email, event_name, count(*) AS count
FROM api_project_name.dataset_name.activity
WHERE email != ""
AND email NOT LIKE "%mydomain.com%"
GROUP BY 1,2 ORDER BY 3 DESC;

What and when permission changes have been granted to external users

SELECT drive.target_user, event_name, count(*) AS count
FROM api_project_name.dataset_name.activity
WHERE drive.target_user IS NOT NULL
AND drive.target_user NOT LIKE "%mydomain.com%"
GROUP BY 1,2 ORDER BY 3 DESC;

Information on storage monitoring

Useful for building reports on users consuming more than X drive storage, with a set threshold (defined with the AND accounts.drive_used_quota_in_mb > 0 clause).

This query can be defined as a scheduled query or, for example, can be called periodically using the API.

SELECT date,
user_email,
accounts.drive_used_quota_in_mb,
FROM api_project_name.dataset_name.usage
WHERE accounts.drive_used_quota_in_mb IS NOT NULL
AND accounts.drive_used_quota_in_mb > 0
AND user_email != ""
AND date = CAST(DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY) AS STRING)
ORDER BY 3,1 DESC;

Notes:

  • This value can be modified to match the filter the customer is setting. For example, over 15GB: AND accounts.drive_used_quota_in_mb > 15000
  • The date comparison with CAST(DATE_SUB(CURRENT_DATE(), INTERVAL x DAY) AS STRING) makes it possible to have a date comparison with the available format from the date value.
  • This query is also applicable to Gmail, where we can find a similar value: accounts.gmail_used_quota_in_mb

Gmail

Best practices for Gmail with BigQuery

  • Query only for the data you need. These examples have a limit of 1,000 matches but you can set your own limit.
  • Set a time frame for your queries. One day is a typical time frame.

Subject match
Message summary view for up to 1,000 records matching a specified subject

SELECT TIMESTAMP_MICROS(gmail.event_info.timestamp_usec) as timestamp,
     gmail.message_info.subject,
     gmail.message_info.source.address,
     gmail.message_info.rfc2822_message_id
FROM your_dataset_id.activity
WHERE gmail.message_info.subject LIKE "%test%"
LIMIT 1000

Recipient match
Count number of distinct messages for a specified recipient

SELECT COUNT(DISTINCT gmail.message_info.rfc2822_message_id)
FROM your_dataset_id.activity d
WHERE
  EXISTS(
   SELECT 1 FROM d.gmail.message_info.destination WHERE destination.address = "recipient@example.com")

Disposition and recipient match

Message summary view for up to 1,000 records matching both:

  • A specified disposition (Modify, Reject, Quarantine)
  • A specified recipient

SELECT TIMESTAMP_MICROS(gmail.event_info.timestamp_usec) as timestamp,
     gmail.message_info.subject,
     gmail.message_info.source.address as source,
     destination.address as destination,
     gmail.message_info.rfc2822_message_id
FROM your_dataset_id.activity d, d.gmail.message_info.destination
WHERE
     destination.address = "recipient@example.com" AND
     EXISTS(SELECT 1 FROM d.gmail.message_info.triggered_rule_info ri, ri.consequence
          WHERE consequence.action = 17)
LIMIT 1000

Rule description triggered
Message summary view for up to 1,000 records, which triggered specified rule description

SELECT TIMESTAMP_MICROS(gmail.event_info.timestamp_usec) as timestamp,
     gmail.message_info.subject,
     gmail.message_info.source.address as source,
     destination.address as destination,
     gmail.message_info.rfc2822_message_id
FROM your_dataset_id.activity d, d.gmail.message_info.destination
WHERE
    EXISTS(SELECT 1 FROM d.gmail.message_info.triggered_rule_info ri, ri.consequence
       WHERE consequence.reason LIKE '%description%')
LIMIT 1000

Marked as spam
Message summary view for up to 1,000 records: 

  • Marked as spam
  • For a specified recipient
  • For all reasons

SELECT TIMESTAMP_MICROS(gmail.event_info.timestamp_usec) as timestamp,
     gmail.message_info.subject,
     gmail.message_info.source.address as source,
     destination.address as destination,
     gmail.message_info.rfc2822_message_id
FROM your_dataset_id.activity d, d.gmail.message_info.destination
WHERE gmail.message_info.is_spam AND
          destination.address = "recipient@example.com"
LIMIT 1000

Encryption protocol—not encrypted
Message summary view by encryption protocol—not encrypted

SELECT TIMESTAMP_MICROS(gmail.event_info.timestamp_usec) as timestamp,
     gmail.message_info.subject,       
     gmail.message_info.source.address as source,
     destination.address as destination,
     gmail.message_info.rfc2822_message_id
FROM your_dataset_id.activity d, d.gmail.message_info.destination
WHERE gmail.message_info.connection_info.smtp_tls_state = 0
LIMIT 1000

Encryption protocol—TLS only
Message summary view by encryption protocol—TLS only

SELECT TIMESTAMP_MICROS(gmail.event_info.timestamp_usec) as timestamp,
     gmail.message_info.subject,
     gmail.message_info.source.address as source,
     destination.address as destination,
     gmail.message_info.rfc2822_message_id
FROM your_dataset_id.activity d, d.gmail.message_info.destination
WHERE gmail.message_info.connection_info.smtp_tls_state = 1
LIMIT 1000

Message ID match
Message detail view for a given message ID (include “<>” around the message ID)

SELECT TIMESTAMP_MICROS(gmail.event_info.timestamp_usec) as timestamp,
     gmail.event_info.success,
     gmail.event_info.elapsed_time_usec,
     gmail.message_info.subject,
     gmail.message_info.source.address as source,
     gmail.message_info.source.service as source_service,
     gmail.message_info.source.selector as source_selector,
     destination.address as destination,
     destination.service,
     destination.selector as destination_selector,
     gmail.message_info.rfc2822_message_id,
     gmail.message_info.payload_size,
     gmail.message_info.num_message_attachments,
     gmail.message_info.connection_info.smtp_tls_state,
     gmail.message_info.description
FROM your_dataset_id.activity d, d.gmail.message_info.destination
WHERE gmail.message_info.rfc2822_message_id = "<message id>"
LIMIT 1000

Disposition—Reject message
Reject message:

  • Which rule caused rejection?

SELECT TIMESTAMP_MICROS(gmail.event_info.timestamp_usec) as timestamp,
     gmail.message_info.subject,
     gmail.message_info.source.address as source,
     destination.address as destination,
     gmail.message_info.rfc2822_message_id,
     (SELECT ARRAY_AGG(consequence.reason)
     FROM d.gmail.message_info.triggered_rule_info ri, ri.consequence)
FROM your_dataset_id.activity d, d.gmail.message_info.destination
WHERE gmail.message_info.rfc2822_message_id = "<message id>" AND
     EXISTS(SELECT 1 FROM d.gmail.message_info.triggered_rule_info ri, ri.consequence
        WHERE consequence.action = 17)
LIMIT 1000

Disposition—Modify message
Modify message: 

  • Which rule caused the modification?
  • What modification subcategory (for example, headers or subject)?

SELECT TIMESTAMP_MICROS(gmail.event_info.timestamp_usec) as timestamp,
     gmail.message_info.subject,
     gmail.message_info.source.address as source,
     destination.address as destination,
     gmail.message_info.rfc2822_message_id,
     (SELECT ARRAY_AGG((consequence.action, consequence.reason))
     FROM d.gmail.message_info.triggered_rule_info ri, ri.consequence)
FROM your_dataset_id.activity d, d.gmail.message_info.destination
WHERE gmail.message_info.rfc2822_message_id = "<message id>" AND
   EXISTS(SELECT 1 FROM d.gmail.message_info.triggered_rule_info ri, ri.consequence
       WHERE consequence.action NOT IN (0, 17, 3))
LIMIT 1000

Quarantine message
Which rule quarantined a message?

SELECT TIMESTAMP_MICROS(gmail.event_info.timestamp_usec) as timestamp,
     gmail.message_info.subject,
     gmail.message_info.source.address as source,
     destination.address as destination,
     gmail.message_info.rfc2822_message_id,
     (SELECT ARRAY_AGG(consequence.reason)
     FROM d.gmail.message_info.triggered_rule_info ri, ri.consequence)
FROM your_dataset_id.activity d, d.gmail.message_info.destination
WHERE gmail.message_info.rfc2822_message_id = "<message id>" AND
     EXISTS(SELECT 1 FROM d.gmail.message_info.triggered_rule_info ri, ri.consequence
        WHERE consequence.action = 3)
LIMIT 1000

Compound queries
Count all messages caught by a specific rule (named "rule description") in the last 30 days:

SELECT
  COUNT(gmail.message_info.rfc2822_message_id) AS message_cnt
FROM
  `your_dataset_id.activity`,
UNNEST (gmail.message_info.triggered_rule_info) AS triggered_rule
WHERE
  _PARTITIONTIME >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND triggered_rule.rule_name LIKE "rule description"

List all messages that were received without TLS encryption in the last day:

SELECT gmail.message_info.subject,
    gmail.message_info.rfc2822_message_id
FROM `your_dataset_id.activity`
WHERE
    _PARTITIONTIME >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY) AND
    gmail.message_info.connection_info.smtp_tls_state = 0

List the top 10 domains my account exchanged mail within the last 30 days:

SELECT
  COUNT(DISTINCT gmail.message_info.rfc2822_message_id) as message_cnt,
  IF(gmail.message_info.is_policy_check_for_sender, 
       REGEXP_EXTRACT(gmail.message_info.source.address , "(@.*)"),
       REGEXP_EXTRACT(destination.address , "(@.*)")) AS domain
FROM `your_dataset_id.activity` d, d.gmail.message_info.destination
WHERE 
    _PARTITIONTIME >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY domain
ORDER BY message_cnt desc
LIMIT 10

Ratio of daily active users to 30-day active users in Gmail

Daily active users:

SELECT date,
gmail.num_1day_active_users
FROM api_project_name.dataset_name.usage
WHERE gmail.num_1day_active_users > 0
ORDER BY 1 DESC;

7-day active users:

SELECT date,
gmail.num_7day_active_users
FROM api_project_name.dataset_name.usage
WHERE gmail.num_7day_active_users > 0
ORDER BY 1 DESC;

30-day active users:

SELECT date,
gmail.num_30day_active_users
FROM api_project_name.dataset_name.usage
WHERE gmail.num_30day_active_users > 0
ORDER BY 1 DESC;

Most recent 100 Gmail log events with at least one classification label associated with the email message

SELECT
  resource_details[OFFSET(0)].id AS MESSAGE_ID,
  gmail.message_info.subject AS SUBJECT,
  gmail.event_info.mail_event_type AS MAIL_EVENT_TYPE,
  gmail.message_info.source.address AS SENDER,
  resource_details[OFFSET(0)].applied_labels AS LABELS
FROM workspace_audit_logs.activity
WHERE gmail.event_info.mail_event_type > 0 and ARRAY_LENGTH(resource_details) > 0
ORDER by time_usec desc
LIMIT 100;

All available log events for a specific email message

SELECT
  gmail.event_info,
  gmail.message_info,
  resource_details
FROM workspace_audit_logs.activity
WHERE gmail.message_info.rfc2822_message_id = "<XYZ>"
ORDER by time_usec desc;

Groups

Google Group membership changes and user behavior

SELECT TIMESTAMP_MICROS(time_usec) AS date,
  event_name,
  admin.group_email,
  event_type,
  email,
  record_type,
  admin.user_email,
  admin.new_value,
  admin.old_value,
  admin.setting_name
FROM project_name.dataset_name.activity
WHERE `admin`.group_email IS NOT NULL
AND CONCAT(TIMESTAMP_MICROS(time_usec)) LIKE "%YYYY-MM-DD%"
ORDER BY 1 DESC
LIMIT
  1000

If you wish to have a YYYY-MM-DD timestamp, the first SELECT statement element can be replaced with:
EXTRACT(DATE FROM TIMESTAMP_MICROS(time_usec)) AS date,

The dates can be filtered in the WHERE clause in either of the following ways:

SELECT TIMESTAMP_MICROS(time_usec) AS date,
  event_name,
  admin.group_email,
  event_type,
  email,
  record_type,
  admin.user_email,
  admin.new_value,
  admin.old_value,
  admin.setting_name
FROM project_name.dataset_name.activity
WHERE `admin`.group_email IS NOT NULL
AND EXTRACT(DATE FROM TIMESTAMP_MICROS(time_usec)) > "2020-06-30"
AND EXTRACT(DATE FROM TIMESTAMP_MICROS(time_usec)) < "2020-08-31"
ORDER BY 1 DESC
LIMIT
  1000

SELECT TIMESTAMP_MICROS(time_usec) AS date,
  event_name,
  admin.group_email,
  event_type,
  email,
  record_type,
  admin.user_email,
  admin.new_value,
  admin.old_value,
  admin.setting_name
FROM project_name.dataset_name.activity
WHERE `admin`.group_email IS NOT NULL
AND TIMESTAMP_MICROS(time_usec) > TIMESTAMP("2020-07-21")
AND TIMESTAMP_MICROS(time_usec) < TIMESTAMP("2020-07-23")
ORDER BY 1 DESC
LIMIT
  1000

Google Meet

Number of video calls and total call minutes by date

SELECT date, meet.num_calls, meet.total_call_minutes
FROM `api_project_name.dataset_name.usage`
WHERE meet.num_calls IS NOT NULL
ORDER BY date ASC

Daily active users

SELECT date, meet.num_1day_active_users
FROM `api_project_name.dataset_name.usage`
WHERE meet.num_1day_active_users IS NOT NULL
ORDER BY date DESC

30-day active users

SELECT date, meet.num_30day_active_users
FROM `api_project_name.dataset_name.usage`
WHERE meet.num_30day_active_users IS NOT NULL
ORDER BY date DESC

Rules

Triggered DLP rules by name, matched application, and actions

SELECT TIMESTAMP_MICROS(time_usec) AS date, rules.rule_name, rules.application,
rules.resource_title, rules.actions, rules.resource_owner_email,
rules.data_source, rules.matched_trigger
FROM api_project_name.dataset_name.activity
WHERE rules.rule_name IS NOT NULL
ORDER BY 1 DESC LIMIT 1000;

Tokens

Number of times a third-party app has been enabled to access Google Drive

SELECT token.client_id, scope, token.app_name, count(*) AS count
FROM api_project_name.dataset_name.activity
LEFT JOIN UNNEST(token.scope) AS scope
WHERE scope LIKE "%drive%"
GROUP BY 1,2,3 ORDER BY 4 DESC;

Admin console sign-in attempts

Detailed information on failed sign-ins to the Google Admin console

SELECT TIMESTAMP_MICROS(time_usec) AS date, email, ip_address,
event_name, login.login_type, login.login_failure_type
FROM api_project_name.dataset_name.activity
WHERE login.login_type IS NOT NULL
AND login.login_failure_type IS NOT NULL
AND event_type = "login"
ORDER BY date DESC;

Schema for usage tables

The schema can change. You can see an updated and complete list of parameters and fields in the Reports API documentation

Filter by date

You can filter by date when querying either the activity or the usage tables. Both have distinct formats when presenting the date:

  • The activity table stores the timestamps in Unix microseconds. This is an integer value (a number) that can be converted to a date with the TIMESTAMP_MICROS() function.
  • The usage table displays its date values with a date format, so this conversion isn’t necessary.

For either table, you can choose to filter by a specific date (or date range) using one of the following methods.

The activity table

To filter by a specific date with the Unix Micros (activity table) structure, you can define the WHERE clause and the TIMESTAMP() function to perform a simple comparison with the greater than (>) and lesser than (<) operators:

SELECT TIMESTAMP_MICROS(time_usec) as date, record_type
FROM api_project_name.dataset_name.activity
WHERE TIMESTAMP_MICROS(time_usec) > TIMESTAMP("2020-07-01")
AND TIMESTAMP_MICROS(time_usec) < TIMESTAMP("2020-07-07")
ORDER BY 1 DESC LIMIT 1000

The concept here is placing limits in the input time_usec value by comparing its return value from the function TIMESTAMP_MICROS() against the return value of the TIMESTAMP() function with a date added as a string-type parameter. This follows the standards in Timestamp functions in Standard SQL, and uses simple comparison operators (>) and (<), along with the AND extension of the WHERE clause to close a time window in particular.

The usage table

SELECT date, meet.num_calls,
FROM api_project_name.dataset_name.usage
WHERE meet.num_calls IS NOT NULL
AND TIMESTAMP(date) > TIMESTAMP("2020-07-01")
AND TIMESTAMP(date) < TIMESTAMP("2020-07-07")
ORDER BY date DESC;

We can pass the string-type date value present in the table into the TIMESTAMP() function and use the comparison operators (>) and (<) the same way as in the first example.

 

Filter by domain aliases and subdomains: Exclude and include

To exclude or include certain domains from your query results,  apply a filter for the email address on the WHERE clause, using wildcards (%) to filter the domains.

How you use the AND or the OR statement depends on whether you’re filtering out (excluding) or only including certain results.

Exclude certain domains from results

WHERE email NOT LIKE ("%@sub.%")
AND email NOT LIKE ("%@test.%")

Only include certain domains in results

WHERE email LIKE ("%@sub.%")
OR email LIKE ("%@test.%")

Rules audit logs

Use this query to track users' attempts to share sensitive data

SELECT TIMESTAMP_MICROS(time_usec) AS Date,
rules.resource_owner_email AS User,
rules.rule_name AS ruleName,
rules.rule_type AS ruleType,
rules.rule_resource_name AS ruleResourceName,
rules.resource_id AS resourceId,
rules.resource_title AS resourceTitle,
rules.resource_type AS resourceType,
rules.resource_owner_email AS resourceOwner,
CAST(recipients AS STRING) AS Recipients,
rules.data_source AS dataSource,
rules.actor_ip_address AS actorIpAddress,
rules.severity AS severity,
rules.scan_type AS scanType,
rules.matched_trigger AS matchedTriggers,
detect.display_name AS matchedDetectorsName,
detect.detector_id AS matchedDetectorsId,
detect.detector_type AS matchedDetectorsType,
triggers.action_type AS triggeredActions,
suppressors.action_type AS suppressedActions,
FROM api_project_name.dataset_name.activity
LEFT JOIN UNNEST(rules.resource_recipients) as recipients
LEFT JOIN UNNEST(rules.matched_detectors) as detect
LEFT JOIN UNNEST(rules.triggered_actions) as triggers
LEFT JOIN UNNEST(rules.suppressed_actions) as suppressors
WHERE rules.rule_name IS NOT NULL
AND triggers.action_type != "ALERT"
ORDER BY 1 DESC
LIMIT 1000;

Related topic

Was this helpful?

How can we improve it?
Search
Clear search
Close search
Google apps
Main menu
11591562978533937750