
How Vault exports work

After Vault has located the messages or files you need, you're ready to export them for further analysis. The export functionality of Google Vault is designed to:

  • Provide you with a comprehensive copy of all the data that matches your search criteria.
  • Provide you with the metadata you need to link the exported data to individual users in your domain.
  • Provide you with the corroborating information required to prove that the exported data matches the data stored on Google’s servers.

Mail, chat, and Groups exports

When you export mail or chat messages from Vault, you can download the following items:

  • A compressed PST or mbox file—Contains details and contents of the exported messages. After extracting the .zip file, you can open files in:

    • PST—Microsoft Outlook. 

    • mbox—Mozilla Thunderbird or a text editor. 

    • Some litigation support systems. Some of these systems can open PST files, or they include email conversion tools for mbox files.

    Vault exports up to 10 GB of messages in a single file. If you export more than 10 GB of mail and chat data, Vault creates multiple files.

  • An XML file—Contains message metadata as it exists on Google servers. Open this file in a text editor and use it to connect message metadata with the message contents from the PST or mbox file.
  • A CSV file—Contains the addresses of message owners included in the export, along with the number of messages owned by each user.
  • If there are errors, an error report is also included.
  • A checksum file—Contains message digest 5 (MD5) hash values for the preceding files.

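One way to use the checksum file is to recompute each downloaded file's MD5 value and compare it with the value Vault listed. The following is a minimal sketch using Python's standard `hashlib` module. It assumes you read the expected value out of the checksum file yourself; the function names are hypothetical, and the sketch doesn't assume any particular layout for the checksum file.

```python
import hashlib
from pathlib import Path


def md5_hex(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 digest of a file as a lowercase hex string.

    Reads the file in 1 MiB chunks so that multi-gigabyte export
    files never have to fit in memory at once.
    """
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: Path, expected_md5: str) -> bool:
    """Compare a downloaded file with the MD5 value listed for it.

    `expected_md5` is copied or parsed from the export's checksum file
    (hypothetical input; the file's layout is not assumed here).
    """
    return md5_hex(path) == expected_md5.strip().lower()
```

Call `matches_checksum` once for each downloaded file. A `False` result suggests the file was altered or incompletely downloaded and should be fetched again.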
Review messages in an email client

You can review mail and chat messages in Microsoft Outlook (PST) or Mozilla Thunderbird (mbox). This method is useful for viewing HTML messages and attachments that a text editor can't display.

PST and mbox files contain all of the details for the exported mail and chat messages. The Vault XML file reflects the message metadata as recorded by Google. Together these files provide a link between the messages stored on Google servers and the data you’ve exported from Vault.

  1. Import and review messages in your email application. 
  2. For messages that are important to the matter, view the headers:
    • Outlook—Varies depending on the version you're using. See the Microsoft documentation for viewing message headers.  
    • Thunderbird—Click View > Headers > All to display all of the headers for each message.
  3. Each header includes a Message ID. Compare Message IDs with metadata in the XML file to correlate messages with the data stored on Google's servers.
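If you have many messages to check, the comparison in step 3 can be scripted. The following minimal sketch uses Python's standard `email.parser.HeaderParser` to read the Message-ID header from a raw message (for example, one saved from your email client as an .eml file). It assumes you have already collected the identifiers from the XML file into a set; which XML field you take them from is left to you, and the helper names are hypothetical.

```python
from email.parser import HeaderParser
from typing import Iterable, List, Optional, Set


def message_id(raw_message: str) -> Optional[str]:
    """Return a message's Message-ID header without the angle brackets."""
    headers = HeaderParser().parsestr(raw_message)  # parses headers only
    value = headers.get("Message-ID")  # header lookup is case-insensitive
    if value is None:
        return None
    return value.strip().strip("<>")


def unmatched_ids(raw_messages: Iterable[str], xml_ids: Set[str]) -> List[str]:
    """List Message-IDs seen in the client that are missing from `xml_ids`."""
    missing = []
    for raw in raw_messages:
        mid = message_id(raw)
        if mid is not None and mid not in xml_ids:
            missing.append(mid)
    return missing
```

An empty result from `unmatched_ids` means every message you reviewed has a counterpart in the set of identifiers taken from the XML file.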
Review mbox files in a text editor

An mbox file is a standard format for storing messages. It contains all of the details for the exported messages, including message text and any attachments. The Vault XML file reflects the message metadata as recorded by Google. Together these files provide a link between the messages stored on Google servers and the data you’ve exported from Vault.

After you export, use the message parameters from the Vault XML file to identify corresponding messages in the mbox file. To get started, open the XML file in a text editor and look for the FileName parameter; for example:

<ExternalFile FileName='1463030154355209614-d7f2c19a-73f3-40e4-a17a-130b90c37aac.mbox'

This parameter contains a unique identifier that corresponds to a similar entry in the mbox file, called the From_ line. The From_ line contains the same identifier, along with the date and time (in UTC) that Google received the message; for example:

From 1463030154355209614-d7f2c19a-73f3-40e4-a17a-130b90c37aac.mbox@xxx Wed Mar 19 06:38:02 2014

The From_ line is the first entry for each message included in the mbox file. When you get to a new From_ line, you're looking at a different message.
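Because each From_ line starts a new message and carries the same identifier as the FileName value, you can index an mbox file by identifier. The following is a minimal sketch, assuming From_ lines look like the example above (identifier, then `@` and a host, then a date such as `Wed Mar 19 06:38:02 2014`). The regular expression and function name are illustrative, not part of Vault.

```python
import re
from datetime import datetime, timezone
from typing import Dict, Iterable

# "From <identifier>@<host> <date>", as in the example above.
FROM_LINE = re.compile(r"^From (?P<ident>[^@\s]+)@\S*\s+(?P<date>.+?)\s*$")


def index_from_lines(lines: Iterable[str]) -> Dict[str, datetime]:
    """Map each message identifier in an mbox file to its received time (UTC)."""
    index: Dict[str, datetime] = {}
    for line in lines:
        match = FROM_LINE.match(line)
        if match is None:
            continue
        try:
            received = datetime.strptime(match.group("date"), "%a %b %d %H:%M:%S %Y")
        except ValueError:
            continue  # a body line that merely looks like a From_ line
        index[match.group("ident")] = received.replace(tzinfo=timezone.utc)
    return index
```

To use it, pass an open mbox file (for example, `open(path, encoding="utf-8", errors="replace")`) and look up each FileName value from the XML file in the returned dict. Python's standard `mailbox.mbox` class is an alternative; each of its messages exposes the From_ line through `get_from()`.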

Email and chat parameters in the Vault XML file

The Vault XML file included with your export captures the following metadata:

Included with each mail message

  • #From
  • #To
  • #CC
  • #BCC
  • #Subject
  • #DateSent
  • #DateReceived

Included with each chat message

  • #SubjectAtStart
  • #SubjectAtEnd
  • #DateFirstMessageSent
  • #DateLastMessageSent
  • #DateFirstMessageReceived
  • #DateLastMessageReceived

Included with both mail and chat messages

  • Labels—Shows labels applied by Gmail, such as ^INBOX, ^TRASH, and ^DELETED. Also shows any labels applied to the message by the user.
  • FileName—Shows the message identifier. Correlate this with the message ID shown in your exported PST or mbox file. 
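To work with these parameters programmatically, you can load the XML file into a dict per message. The following minimal sketch uses Python's standard `xml.etree.ElementTree`. It assumes each message is a `Document` element containing `Tag` elements with `TagName` and `TagValue` attributes, plus an `ExternalFile` element with a `FileName` attribute, and no XML namespace. That layout is an assumption: inspect your own export and adjust the element names if they differ.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def document_tags(xml_text: str) -> List[Dict[str, str]]:
    """Collect the tags for each message into a dict keyed by tag name.

    Assumes <Document> elements holding <Tag TagName=... TagValue=.../>
    children and an <ExternalFile FileName=.../> element, with no XML
    namespace. Adjust the element names if your export differs.
    """
    root = ET.fromstring(xml_text)
    documents = []
    for doc in root.iter("Document"):
        tags = {
            tag.get("TagName"): tag.get("TagValue", "")
            for tag in doc.iter("Tag")
            if tag.get("TagName")
        }
        # FileName links this entry to the matching From_ line in the mbox file.
        for external in doc.iter("ExternalFile"):
            tags["FileName"] = external.get("FileName", "")
        documents.append(tags)
    return documents
```

Each returned dict can then be filtered on values such as `#From` or `Labels`, and its `FileName` value can be joined against an index of From_ lines.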

Query parameters for the entire export

  • UserQuery—Shows the query submitted by the Vault user that retrieved the messages included in this export.
  • TimeZone—Shows the time zone used for date-based searches.
  • Custodians—Shows the email addresses of the users whose accounts were searched. If you searched for content rather than individual user accounts, there are no custodians listed here.

Drive exports

When you use Vault to export files from Drive, you can download the following files:

  • A compressed file—Contains all of the files found by your search. Vault exports up to 10 GB of data in a single compressed file. If you export more than 10 GB of data, Vault creates multiple files. 
  • An XML file—Contains metadata, including:
    • Document IDs
    • User email addresses
    • Created and modified dates for each file
    • Document types and titles
  • A CSV file—Maps document IDs to user accounts. Use this information to determine which users have access to the exported files.
  • If there are errors, an error report is also included.
  • A checksum file—Contains MD5 values for the preceding files.
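To see which users are associated with each exported file, you can group the CSV rows by document ID. The following minimal sketch uses Python's standard `csv` module. The column names `DocID` and `Account` are hypothetical placeholders, and the sketch assumes one row per document/user pair. Check the header row of your own CSV file and pass the real column names.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Set


def users_by_document(
    csv_text: str,
    id_column: str = "DocID",      # hypothetical header name
    user_column: str = "Account",  # hypothetical header name
) -> Dict[str, Set[str]]:
    """Group the user accounts in the export CSV by document ID."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row[id_column]].add(row[user_column])
    return dict(grouped)
```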

Exported files are converted as follows:

Drive file type       Exported format
Google Docs           .docx
Google Sheets         .xlsx
Google Forms          .zip (.html and .csv)
Google Slides         .pptx
Google Drawings       .pdf
Non-Google files      No format change

File parameters in the Vault XML file

The XML file included with your export captures the following metadata:

Included with each file

  • #Author—Shows the email address of the person who owns the file in Drive. For a Team Drive file, it shows the Team Drive name.
  • Collaborators—Shows the accounts and groups that have direct permission to edit the file or add comments. Also includes users with indirect access to the file if you chose this option during export.
  • Viewers—Shows the accounts and groups that have direct permission to view the file. Also includes users with indirect access to the file if you chose this option during export.
  • Others—Shows the accounts from your query that have indirect access to the file if you opted to exclude access level information during export. May also include users for whom Vault couldn't determine permission levels at the time of export.
  • #DateCreated—Shows the date the file was created. For files created outside of G Suite, this date is recorded by the creator’s computer. It doesn't change when the file is uploaded to Drive.
  • #DateModified—Shows the date the file was last modified. For files modified outside of G Suite, this date is recorded by the modifier’s computer. It does not change when the file is uploaded to Drive.
  • #Title—Shows the file name as assigned by the user. Because some operating systems can't expand zip files with extremely long file names, Vault truncates the file name at 128 characters during export. The value shown by the #Title tag isn't truncated.
  • DocumentType—Indicates the file type for Google files. Possible values are DOCUMENT, SPREADSHEET, PRESENTATION, FORM, and DRAWING.
  • TeamDriveID—Shows the identifier of the Team Drive that contains the file (if applicable).
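When you reconcile exported files with their XML entries, two details above matter: the DocumentType value determines the converted format, and titles longer than 128 characters produce file names that differ from #Title. The following minimal sketch encodes both. The mapping restates the conversion table in this article. The helper names are hypothetical, and the sketch only flags long titles; it doesn't reproduce exactly how Vault builds the truncated name.

```python
from typing import Optional

# Export formats from the conversion table, keyed by DocumentType.
EXPORT_EXTENSION = {
    "DOCUMENT": ".docx",
    "SPREADSHEET": ".xlsx",
    "PRESENTATION": ".pptx",
    "FORM": ".zip",  # contains .html and .csv files
    "DRAWING": ".pdf",
}

TITLE_LIMIT = 128  # Vault truncates exported file names at 128 characters


def expected_extension(document_type: Optional[str]) -> Optional[str]:
    """Return the converted extension, or None for non-Google files (not converted)."""
    if not document_type:
        return None
    return EXPORT_EXTENSION.get(document_type.upper())


def name_may_be_truncated(title: str) -> bool:
    """True if the exported file name may be shorter than the #Title value."""
    return len(title) > TITLE_LIMIT
```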

Query parameters for the entire export

  • UserQuery—Shows the query submitted by the Vault user that retrieved the files included in this export.
  • TimeZone—Shows the time zone used for date-based searches.
  • Custodians—Shows the email addresses of the users whose accounts were searched. If you searched for content rather than individual user accounts, there are no custodians listed here.
Exporting access-level information for users with indirect access to files

When you export files from Drive, Vault may include metadata for users in your domain who have indirect access to, and have opened, a file that matches your search criteria.

A user can have indirect access when a file or folder containing a file is:

  • Shared with a group the user belongs to
  • Shared with the domain 
  • Shared publicly

During export, you have the option to choose the kind of information you want included in the metadata output:

  • In the export dialog, check the box to have Vault determine the permission level for users in your domain who have indirect access to files. Each of these users is included in one of these categories when you open the XML file:

    • Collaborators—Users who have indirect permission to edit or add comments to a file.

    • Viewers—Users who have indirect permission to view a file.

    • Others—In some circumstances, Vault can't determine the type of access a user has at the time of export. This can happen, for example, if a file was shared with a group, and the user was later removed from the group.

    Vault needs additional time to determine what permissions these users have, so this option may increase the time it takes to prepare your files for download.

  • In the export dialog, leave the box unchecked (default) to exclude access-level information for users in your domain with indirect access to files. These users are listed as Others in the XML file. 
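The rules above determine which XML list a user appears in. The following minimal sketch models them as a decision function. It illustrates the behavior described in this article and is not Vault code. It assumes the user already qualifies for the export metadata (for example, an indirect-access user who has opened the file). `Access` and `xml_category` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class Access(Enum):
    EDIT_OR_COMMENT = "edit or comment"
    VIEW = "view"


def xml_category(
    access: Optional[Access],
    indirect: bool,
    include_indirect_levels: bool,
) -> str:
    """Return the XML list (Collaborators, Viewers, or Others) a user lands in.

    access: the user's permission level, or None if Vault couldn't determine it.
    indirect: True if access comes only through a group, domain, or public share.
    include_indirect_levels: whether the export-dialog box was checked.
    """
    if indirect and not include_indirect_levels:
        return "Others"  # default: no access-level lookup for indirect users
    if access is Access.EDIT_OR_COMMENT:
        return "Collaborators"
    if access is Access.VIEW:
        return "Viewers"
    return "Others"  # level couldn't be determined at export time
```

For example, a user who can view a file only because it was shared with the domain is listed under Viewers when the box is checked and under Others when it isn't.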

Error reports

Occasionally, Vault can't export an email message from Gmail or a file from Drive. When this happens, Vault generates an error report: a .csv file that lists the items with export errors, along with additional details and metadata. There are two types of errors:

  • Transient errors—A backend server was unable to retrieve the email or file. The item should be available for export when you search for it later.
  • Non-transient errors—Any error that's not explicitly labeled as transient is the result of an issue that cannot be corrected. Typically this occurs when a message attachment or file has been deleted, is not supported for export, or cannot be converted to the requested format.

To determine if the problem is transient or non-transient, open the .csv file with Google Sheets (or a similar spreadsheet application) and find the Error Description column.

If the error report includes email messages with transient errors, use each message’s RFC 822 identifier to find those specific messages when you search again. The format of the search operator is rfc822msgid:identifier.
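If the report lists many messages, you can build the retry searches from the CSV file. The following minimal sketch uses Python's standard `csv` module. It reads the Error Description column named above, treats a row as transient only if its description says so (and doesn't say "non-transient"), and emits one `rfc822msgid:` query per identifier. The identifier column name `RFC822 Message ID` is a hypothetical placeholder, and the keyword check is a simple heuristic; adjust both to match your report.

```python
import csv
import io
from typing import List


def is_transient(description: str) -> bool:
    """Heuristic: the description explicitly labels the error as transient."""
    text = description.lower()
    return "transient" in text and "non-transient" not in text and "not transient" not in text


def retry_queries(
    csv_text: str,
    description_column: str = "Error Description",
    id_column: str = "RFC822 Message ID",  # hypothetical header name
) -> List[str]:
    """Build one rfc822msgid: search per message with a transient error."""
    queries = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if is_transient(row.get(description_column) or ""):
            msg_id = (row.get(id_column) or "").strip()
            if msg_id:
                queries.append(f"rfc822msgid:{msg_id}")
    return queries
```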

Ready to begin?

Export search results
