Review Vault export files

After you use Google Vault to search for the data you want, you can export it for further analysis (learn how). An export contains the following information:

  • A comprehensive copy of the data that matched your search criteria.
  • The metadata you need to link the exported data to individual users in your organization.
  • The corroborating information required to prove that the exported data matches the data stored on Google’s servers.

Learn how to work with exports for the following services:

Gmail, Chat, and Groups exports

Export contents
Information File name Description
Message contents export_name-N.zip

Zip files of PST or mbox files. These files contain the contents and details of the exported messages. For Google Chat messages, these details include when a message was edited by the sender or when a message was deleted.

You might have multiple files if the export includes messages from more than one account or if the file size exceeds 1 GB for PST files or 10 GB for mbox files. The file names end with an incrementing number (the N in export_name-N.zip) to distinguish the files.

Reviewing messages

After you extract the zip file, the way you can review and process the messages depends on the type of file:

  • PST—Microsoft Outlook or some litigation support systems.

  • mbox—Mozilla Thunderbird, a text editor, or some litigation support systems that include email conversion tools for mbox files.

Note: Google does not provide technical support for configuring third-party products. GOOGLE ACCEPTS NO RESPONSIBILITY FOR THIRD-PARTY PRODUCTS. Consult the product’s website for the latest configuration and support information.

Group membership information export_name-group-membership.csv

A CSV file that lists the following information for each group member:

  • the member's email address
  • the email address of the group
  • when the user became a member of the group
  • the member's role: MEMBER for a group member, MANAGER for a group manager, or OWNER for a group owner
  • the type of account: USER for an individual user account or GROUP for a group email address
Message metadata export_name-metadata.xml

An XML file that contains message metadata as it exists on Google servers. Open this file in a text editor and use it to connect message metadata with the message contents from the mbox file.

Note: PST file contents can’t be correlated with the XML file metadata.

Accounts and message count export_name-results-count.csv A CSV file that lists the accounts of message owners included in the export and the number of messages owned by each account.
Error reports

error.csv

export_name-account-exceptions.csv (Gmail exports)

export_name-failed-group-membership-lookups.csv (Groups exports)

Error reports are included only if the export encounters errors.

  • error.csv—Lists errors retrieving messages. Learn more
  • export_name-account-exceptions.csv—Lists Gmail accounts that were searched but not all matching messages were exported
  • export_name-failed-group-membership-lookups.csv—Lists group email addresses that were searched but not all members were returned
File checksums File checksums The file lists the message digest 5 (MD5) hash values for all files in the export.
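As an illustration of how the MD5 values might be used, the following Python sketch computes the MD5 hash of a downloaded file so that it can be compared with the value listed in the checksum file. The helper names md5_of_file and matches_checksum are hypothetical, and the sketch assumes the checksum file records hashes as hexadecimal strings, which this article doesn't state.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large export files don't fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Compare a local file with a hash value copied from the checksum file.

    Assumption: the expected value is a hexadecimal string.
    """
    return md5_of_file(path) == expected_md5.strip().lower()
```

A mismatch suggests that the file changed or was corrupted after export, so it's worth downloading the file again before relying on it.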
Review messages in an email client

You can review Gmail and Chat messages in Microsoft Outlook (PST) or Mozilla Thunderbird (mbox). This method is useful for viewing HTML messages and attachments that a text editor can't display.

PST and mbox files contain the details of the exported messages. The metadata file reflects the message metadata as recorded by Google. You can correlate mbox content and message metadata to provide a link between the messages stored on Google servers and the data you’ve exported from Vault.

Note: The labels used in Gmail to classify messages aren't converted into mailbox folders. When you open a PST or mbox file in an email client, all messages appear in a single folder.

To review exported messages in an email client:

  1. Import and review messages in your email application.
  2. For messages that are important to the matter, view the headers:
    • Outlook—See the Microsoft documentation about how to view message headers for your version.
    • Thunderbird—Click View > Headers > All to display the headers for each message.
  3. In Thunderbird, the headers for each message include a Message ID. To correlate messages with the data stored on Google's servers, compare the Message IDs with the metadata file.
Review mbox files in a text editor

An mbox file is a standard format for storing messages. It contains all the details for the exported messages, including message text and any attachments. The metadata file reflects the message metadata as recorded by Google. Together these files provide a link between the messages stored on Google servers and the data you’ve exported from Vault.

After you export, you use the message parameters from the metadata file to identify corresponding messages in the mbox file. To get started, open the metadata file in a text editor and look for the FileName parameter; for example:

<ExternalFile FileName='1463030154355209614-d7f2c19a-73f3-40e4-a17a-130b90c37aac.mbox'

This parameter includes a unique identifier and it corresponds to a similar entry in the mbox file called the From_ line. The From_ line contains the same identifier, along with the date and time (in UTC) that the message was received by Google; for example:

From 1463030154355209614-d7f2c19a-73f3-40e4-a17a-130b90c37aac.mbox@xxx Wed Mar 19 06:38:02 2014

The From_ line is the first entry for each message included in the mbox file. When you get to a new From_ line, you're looking at a different message.
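The From_ lines can also be read programmatically. The following Python sketch uses the standard library mailbox module to pull the identifier out of each From_ line. The function names are hypothetical, and the sketch assumes that the identifier is everything before the last "@" in the first token of the From_ line, as in the example above.

```python
import mailbox


def from_line_identifier(from_line):
    """Extract the identifier from a From_ line.

    Accepts the line with or without its leading 'From ' marker.
    """
    if from_line.startswith("From "):
        from_line = from_line[len("From "):]
    parts = from_line.split()
    if not parts:
        return ""
    # The first token looks like '<identifier>@xxx'; drop the '@xxx' part.
    return parts[0].rsplit("@", 1)[0]


def index_mbox_by_identifier(mbox_path):
    """Map each message's From_ identifier to its Subject header."""
    index = {}
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            # get_from() returns the From_ line without the leading 'From '.
            identifier = from_line_identifier(message.get_from())
            index[identifier] = message.get("Subject", "")
    finally:
        box.close()
    return index
```

The keys of the returned dictionary can then be compared with the FileName values in the metadata file.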

Message parameters in the metadata file

The metadata file contains the following information:

Included for Gmail and Groups messages

  • #From—The email account of the sender
  • #To—The email accounts of all recipients
  • #CC—The email accounts of all Cc'd recipients
  • #BCC—The email accounts of all Bcc'd recipients
  • #Subject—The message subject
  • #DateSent—The timestamp for when the message was sent
  • #DateReceived—The timestamp for when the message was received

Included for classic Hangouts and Chat messages

  • #SubjectAtStart—(classic Hangouts only) The subject of the conversation when the first message was sent
  • #SubjectAtEnd—(classic Hangouts only) The subject of the conversation when the last message was sent
  • #DateFirstMessageSent—The timestamp for when the first message in a conversation was sent
  • #DateLastMessageSent—The timestamp for when the last message in a conversation was sent
  • #DateFirstMessageReceived—The timestamp for when the first message in a conversation was received
  • #DateLastMessageReceived—The timestamp for when the last message in a conversation was received

Included for all messages (Gmail, Groups, and Chat)

  • Labels—Any labels applied by Gmail or Chat, such as ^INBOX, ^TRASH, and ^DELETED. Also shows any labels applied to the message by the user.
  • FileName—The message identifier. Correlate this value with the message ID shown in your exported PST or mbox file.
  • FileSize—The size of the message in bytes.
  • Hash—The MD5 hash of the message.

Included for Chat messages (not classic Hangouts)

  • RoomID–The room or DM identifier that the message belongs to.
  • Participants–The email addresses of all users who participated in the conversation.
  • roomName–The name of the room or a comma-separated list of accounts that participated in a DM.
  • conversationType–The message location: a room or a DM.

Query parameters for the entire export

  • UserQuery—The query submitted by the Vault user that retrieved the messages included in this export.
  • TimeZone—The time zone used for date-based searches.
  • Custodians—The email addresses of the users whose accounts were searched. If you searched for content rather than individual user accounts, there are no custodians listed here.
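To correlate the metadata file with an mbox file, it can help to collect every FileName value first. The following Python sketch does this with the standard library XML parser. It relies only on the ExternalFile element and its FileName attribute, which appear in the example earlier in this section. Reading FileSize and Hash as attributes of the same element is an assumption, as are the absence of an XML namespace and the function names.

```python
import xml.etree.ElementTree as ET


def metadata_by_filename(metadata_xml):
    """Map each ExternalFile's FileName to its FileSize and Hash values.

    metadata_xml is a path or an open file object. Attributes that are
    absent from an element come back as None.
    """
    entries = {}
    for element in ET.parse(metadata_xml).iter("ExternalFile"):
        file_name = element.get("FileName")
        if file_name:
            entries[file_name] = {
                # Assumption: these are attributes of ExternalFile.
                "FileSize": element.get("FileSize"),
                "Hash": element.get("Hash"),
            }
    return entries


def missing_from_mbox(entries, mbox_identifiers):
    """Return FileName values that have no matching From_ identifier."""
    return sorted(set(entries) - set(mbox_identifiers))
```

An empty list from missing_from_mbox means that every message described in the metadata file has a From_ line with the same identifier in the mbox file.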

Drive exports

Export contents
Information File name Description
Files export_name_N.zip

Contains all the files found by your search. Vault exports up to 10 GB of data in a single compressed file. If you export more than 10 GB of data, Vault creates multiple files.

Exported files are named with the original name of the file followed by an underscore ("_") and the Drive file ID.

Exported Google files are converted as follows:

  • Google Docs to DOCX
  • Google Sheets to XLSX
  • Google Forms to ZIP (HTML and CSV)
  • Google Slides to PPTX
  • Google Drawings to PDF
File metadata export_name-metadata.xml

Contains metadata, including:

  • Document IDs (Note: These IDs are not the Drive file IDs. They correspond to values in the CSV file.)
  • User email addresses
  • Created and modified dates for each file
  • Document types and titles

Learn more

Accounts and doc IDs export_name-custodian-docid.csv Lists user accounts with their associated document IDs. Use this information to determine which users have access to the exported files.
Error reports

error.csv

export_name-incomplete-accounts.csv

Error reports are included only if the export encounters errors.

  • error.csv—Lists errors retrieving files and the file metadata. Learn more
  • export_name-incomplete-accounts.csv—Lists accounts that were searched but not all matching files were exported
File checksums File checksums The file lists the message digest 5 (MD5) hash values for all files in the export.
File parameters in the metadata file

The metadata file included with your export captures the following metadata:

Included with each file

  • #Author—The email address of the person who owns the file in Drive. For a shared drive file, it shows the shared drive name.
  • Collaborators—The accounts and groups that have direct permission to edit the file or add comments. Also includes users with indirect access to the file if you chose this option during export.
  • Viewers—The accounts and groups that have direct permission to view the file. Also includes users with indirect access to the file if you chose this option during export.
  • Others—The accounts from your query that have indirect access to the file if you opted to exclude access level information during export. May also include users for whom Vault couldn't determine permission levels at the time of export.
  • #DateCreated—The date a Google file was created in Drive. For non-Google files, this indicates when the file was uploaded to Drive.
  • #DateModified—The date the file was last modified.
  • #Title—The filename as assigned by the user. Because some operating systems can't expand zip files with extremely long filenames, Vault truncates the filename at 128 characters during export. The value shown by the #Title tag isn't truncated.
  • DocumentType—The file type for Google files. Possible values are DOCUMENT, SPREADSHEET, PRESENTATION, FORM, and DRAWING.
  • SharedDriveID—The identifier of the shared drive that contains the file (if applicable).
  • SourceHash–A unique hash value for each version of a file. Can be used to deduplicate file exports and verify the exported file is an exact copy of the source file. Supported by Google Docs, Sheets, and Slides files only.
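As a sketch of the deduplication that SourceHash makes possible, the following Python function keeps one record per hash value. It assumes that you have already read the metadata into a list of dictionaries with a SourceHash key. That record shape and the function name are hypothetical.

```python
def deduplicate_by_source_hash(records):
    """Keep the first record seen for each SourceHash value.

    Records without a SourceHash (for example, file types that don't
    support it) are always kept, because they can't be compared.
    """
    seen = set()
    unique = []
    for record in records:
        source_hash = record.get("SourceHash")
        if not source_hash:
            unique.append(record)
        elif source_hash not in seen:
            seen.add(source_hash)
            unique.append(record)
    return unique
```

Because the hash is unique to each version of a file, two records with the same value refer to the same version, and different versions of one file are kept as separate records.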

Query parameters for the entire export

  • UserQuery—The query submitted by the Vault user that retrieved the files included in this export.
  • TimeZone—The time zone used for date-based searches.
  • Custodians—The email addresses of the users whose accounts were searched. If you searched for content rather than individual user accounts, there are no custodians listed here.
Exporting access-level information for users with indirect access to files

When you export files from Drive, the metadata file may include information about users in your organization who have indirect access to, and have opened, a file that matches your search criteria.

A user can have indirect access when a file or folder containing a file is:

  • Shared with a group the user belongs to
  • Shared with the domain
  • Shared publicly

During export, you can choose the information you want to include in the metadata output:

  • In the export dialog, check the box to have Vault determine the permission level for users in your domain who have indirect access to files. Each of these users is included in one of these categories when you open the metadata file:

    • Collaborators—Users who have indirect permission to edit or add comments to a file.
    • Viewers—Users who have indirect permission to view a file.
    • Others—In some circumstances, Vault can't determine the type of access a user has at the time of export. This can happen, for example, if a file was shared with a group, and the user was later removed from the group.

    Vault takes time to determine what permissions these users have, so this option might increase the time it takes to prepare your files for download.

  • In the export dialog, leave the box unchecked (default) to exclude access-level information for users in your domain with indirect access to files. These users are listed as Others in the metadata file.

Google Voice exports

Export contents
Information File name Description
Voice data files export_name-N.zip A zip file is generated for each account and contains PST or mbox files of text conversations, call logs, voicemail MP3 audio files, and voicemail transcriptions.
File metadata export_name-metadata.xml An XML file that contains metadata as it exists on Google servers.
File checksums File checksums A checksum file with message digest 5 (MD5) hash values for all files included in the export.
Error report

error.csv

Error reports are included only if the export encounters errors. Learn more

Note: Unlike other services, Voice exports don’t include a count file.

Voice data parameters in the metadata file

The metadata file contains the following information:

Information about each file

  • DocID—A unique identifier for the file.
  • #Author—The email address of the account that owns the file in Drive.
  • #DateFirstMessageSent—For text conversations, the date the first message was sent. Note: this and the following 3 fields are identical in entries for voicemails and call logs.
  • #DateLastMessageSent—For text conversations, the date the last message was sent.
  • #DateFirstMessageReceived—For text conversations, the date the first message was received.
  • #DateLastMessageReceived—For text conversations, the date the last message was received.
  • ConversationType—The data type:
    • TEXT_MESSAGE—A text message.
    • VOICEMAIL—A voicemail.
    • INCOMING_CALL—A call log of an incoming call.
    • OUTGOING_CALL—A call log of an outgoing call.
    • MISSED_CALL—A call log of an unanswered incoming call.
  • ParticipantPhoneNumbers—The phone numbers of the participants.
  • OwnerPhoneNumbers—The value might include multiple phone numbers when the user's number changed.
  • Labels—Any labels on the conversation. For example, deleted conversations have the DELETED label.
  • ExternalFile FileName—The file identifier, which correlates to the Subject in the PST or mbox file.

Query parameters for the entire export

  • UserQuery—The query submitted by the Vault admin.
  • TimeZone—The time zone of the query
  • Custodians—The email addresses of the accounts that were searched.

Error reports (error.csv)

When Vault is unable to export data from a service, Vault generates an error report (error.csv). The report lists the items with export errors along with more details and metadata.

Vault reports two types of errors:

  • Transient errors—A backend server was unable to retrieve the email message or file. The item should be available for export when you search for it later.
  • Non-transient errors—Any error that's not explicitly labeled as transient is the result of an issue that cannot be corrected. Typically these errors occur when a message attachment or file was deleted, isn't supported for export, or can't be converted to the requested format.

To determine if the problem is transient or non-transient, open the CSV file with Google Sheets or another spreadsheet app and find the Error Description column (Note: error descriptions aren't available for Voice exports). If the error is transient, learn how to recover from transient errors.
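If the report is long, the same check can be scripted. The following Python sketch splits the rows of error.csv into transient and non-transient groups based on the Error Description column. The exact wording that Vault uses in that column isn't given in this article, so the test for the word "transient" is an assumption, and the function names are hypothetical.

```python
import csv


def is_transient(description):
    """Return True when an error description is labeled as transient."""
    text = (description or "").lower()
    # Guard against wording such as 'non-transient' matching by accident.
    if "non-transient" in text or "nontransient" in text:
        return False
    return "transient" in text


def split_errors(error_csv_path):
    """Split error.csv rows into (transient, non_transient) lists of dicts."""
    transient, non_transient = [], []
    # utf-8-sig reads files with or without a byte order mark.
    with open(error_csv_path, newline="", encoding="utf-8-sig") as handle:
        for row in csv.DictReader(handle):
            description = ""
            for column, value in row.items():
                # Match the column name case-insensitively.
                if isinstance(column, str) and column.strip().lower() == "error description":
                    description = value or ""
            target = transient if is_transient(description) else non_transient
            target.append(row)
    return transient, non_transient
```

Rows without an error description, such as those in Voice exports, end up in the non-transient list and need a manual review.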

Error report contents

Error report contents for Gmail and Groups

The error report contains the following fields for each message. Fields are blank if the data isn't available or applicable for a message.

Field Description
Document ID A unique identifier for the file
Document type The document type. Value is mail.
File type The file type. Value is mail.
Attachments count The number of attachments to the message
Attachment names The file names of the attachments
Subject The message subject
Size The message size
From The sender's email account
To The email accounts of all recipients
Cc The email accounts of all Cc'd recipients
Sent time The timestamp for when the message was sent
Source account The account that was included in the search query
Error description A description of the error
RFC 822 Message-ID A unique identifier for a message that's added by mail servers. Example: rfc822msgid:AANLkTilQ5MWSp7-iE6SKepvOl-Spjupgr1NZTiLGu16Z@mail.solarmora.com

Error report contents for Chat

The error report contains the following fields for each message. Fields are blank if the data isn't available or applicable for a message.

Field Description
Document ID A unique identifier for the file
Filename The document type. Value is mail.
Conversation Type The type of message. Value is mail.
Room Name The name of the room
Error description A description of the error

Error report contents for Drive files
The error report contains the following fields for each file. Fields are blank if the data isn't available or applicable for a file.
Field Description
Document ID A unique identifier for the file
Document type Indicates the file type for Google files. Possible values are DOCUMENT, SPREADSHEET, PRESENTATION, FORM, and DRAWING.
File type The file format, such as PDF or XLSX
Title The filename as assigned by the user
Size The size of the file
Creator The email address of the person who owns the file in Drive. For a shared drive file, it shows the shared drive name.
Collaborators The accounts and groups that have direct permission to edit the file or add comments. Also includes users with indirect access to the file if you chose this option during export.
Viewers The accounts and groups that have direct permission to view the file. Also includes users with indirect access to the file if you chose this option during export.
Others The accounts from your query that have indirect access to the file if you opted to exclude access level information during export. May also include users for whom Vault couldn't determine permission levels at the time of export.
Creation time The date a Google file was created in Drive. For non-Google files, this indicates when the file was uploaded to Drive.
Last modified time The date the file was last modified
Error description A description of the error
Drive Document ID A unique identifier for a file in Drive

 

Error report contents for Voice data

The error report lists the accounts that were searched but for which not all matching files were exported.

Field Description
Account The email address of the account for which some data wasn’t exported.
Failed Conversation Count The number of conversations that weren’t exported. If the number is unknown, the value is Unknown Failure Count.
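When you process the Voice error report in a script, the Failed Conversation Count field needs special handling because it can hold either a number or the text Unknown Failure Count. A minimal Python sketch, with a hypothetical function name:

```python
def parse_failed_count(value):
    """Return the failed conversation count as an int, or None when unknown."""
    text = value.strip()
    if text.lower() == "unknown failure count":
        return None
    return int(text)
```

Returning None rather than 0 keeps accounts with an unknown number of failures distinct from accounts with no failures.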

Recover from transient errors

You can use message and file details to search for and export the data that wasn't exported due to transient errors:

  • If the error report includes messages with transient errors, use each message’s RFC 822 identifier to find those specific messages when you search again. The format of the search term is rfc822msgid:identifier.
  • If the error report includes Drive files with transient errors, use each file's title to find those specific files when you search again. The format of the search term is title:"title-of-file".
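The two search-term formats above can be generated from the error report values. The following Python sketch builds them. The function names are hypothetical, and stripping angle brackets from a Message-ID is an assumption based on the example identifier in the Gmail error report, which has none. The sketch doesn't escape quotation marks inside a title because this article doesn't say how Vault expects them to be written.

```python
def message_search_term(message_id):
    """Build 'rfc822msgid:identifier' from an RFC 822 Message-ID value."""
    prefix = "rfc822msgid:"
    identifier = message_id.strip()
    # The error report value may already carry the prefix.
    if identifier.lower().startswith(prefix):
        identifier = identifier[len(prefix):]
    # Assumption: angle brackets around the ID are not part of the term.
    return prefix + identifier.strip("<>")


def file_search_term(title):
    """Build 'title:"title-of-file"' from a Drive file title."""
    return 'title:"{}"'.format(title.strip())
```

For example, message_search_term("<abc@mail.example.com>") returns rfc822msgid:abc@mail.example.com, which you can paste into the search terms field when you run the search again.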