After you use Google Vault to search for the data you want, you can export it for further analysis (learn how). An export contains the following information:
- A comprehensive copy of the data that matched your search criteria.
- The metadata you need to link the exported data to individual users in your organization.
- The corroborating information required to prove that the exported data matches the data stored on Google’s servers.
Gmail, Chat, and Groups exports
Export contents
Information | File name | Description |
---|---|---|
Message contents | export_name-N.zip |
Zip files of PST or mbox files. These files contain the contents and details of the exported messages. For Google Chat messages, these details include when the sender edited or deleted a message. Learn about options for reviewing PST and mbox files. You might get many zip files in the following scenarios:
The file names end with an increment to distinguish the files. |
Google Groups membership information | export_name-group-membership.csv |
A CSV file that lists the following information for each group member:
|
Message metadata | export_name-metadata.xml | An XML file that contains message metadata as it exists on Google servers. Open this file in a text editor and use it to connect message metadata with the message contents from the mbox file. Note: PST file contents can’t be correlated with the XML file metadata. |
Accounts and message count | export_name-results-count.csv | A CSV file that lists the accounts of message owners included in the export and the number of messages owned by each account. |
Error reports | error.csv, export_name-account-exceptions.csv (Gmail exports), export_name-failed-group-membership-lookups.csv (Groups exports) | Error reports are included only if the export encounters errors. |
File checksums | File checksums | The file lists the message digest 5 (MD5) hash values for all files in the export. |
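The checksum values let you confirm that a downloaded export file matches what Vault produced. Here is a minimal Python sketch, assuming you have already read the expected MD5 value for a file out of the checksum file. The layout of that file isn't described here, so the `expected_md5` argument and both function names are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large export zip files don't need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Compare a file's MD5 digest with the value listed for it (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

A mismatch suggests the file was changed or truncated after export, so download it again before you review it.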
The metadata file contains the following information:
Included for Gmail and Groups messages
- #From: The email account of the sender
- #To: The email accounts of all recipients
- #CC: The email accounts of all Cc'd recipients
- #BCC: The email accounts of all Bcc'd recipients
- #Subject: The message subject
- #DateSent: The timestamp for when the message was sent
- #DateReceived: The timestamp for when the message was received
Included for classic Hangouts and Chat messages
- #SubjectAtStart: (classic Hangouts only) The subject of the conversation when the first message was sent
- #SubjectAtEnd: (classic Hangouts only) The subject of the conversation when the last message was sent
- #DateFirstMessageSent: The timestamp for when the first message in a conversation was sent
- #DateLastMessageSent: The timestamp for when the last message in a conversation was sent
- #DateFirstMessageReceived: The timestamp for when the first message in a conversation was received
- #DateLastMessageReceived: The timestamp for when the last message in a conversation was received
Included for all messages (Gmail, Groups, and Chat)
- Labels: Any labels applied by Gmail or Chat, such as ^INBOX, ^TRASH, and ^DELETED. Also shows any labels applied to the message by the user.
- FileName: A message identifier. Use this value to correlate metadata with the corresponding message in an email client or a text editor.
- FileSize: The size of the message in bytes.
- Hash: The MD5 hash of the message.
Included for Chat messages (not classic Hangouts)
- RoomID: The room, group chat, or DM identifier that the message belongs to.
- Participants: The email addresses of all users who participated in the conversation.
- RoomName: The value depends on the type of message:
  - For rooms, the name of the room.
  - For group chats created after early December 2020, Group chat.
  - For group chats created before early December 2020 and DMs, a comma-separated list of accounts that participated.
- ConversationType: The message type:
  - For a group chat created after early December 2020 or a room, the value is Room.
  - For a group chat created before December 2020, the value is Group Direct Message.
  - For a DM, the value is 1:1 Direct Message.
Query parameters for the entire export
- UserQuery: The query submitted by the Vault user that retrieved the messages included in this export.
- TimeZone: The time zone used for date-based searches.
- Custodians: The email addresses of the users whose accounts were searched. If you searched for content rather than individual user accounts, there are no custodians listed here.
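To work with these fields in a script rather than a text editor, you can load the metadata file with an XML parser. The Python sketch below is a rough illustration only. It assumes each message is a `Document` element whose fields are `Tag` elements with `TagName` and `TagValue` attributes, and that `FileName`, `FileSize`, and `Hash` are attributes of an `ExternalFile` element. Only the `ExternalFile FileName` attribute is confirmed by this article, so check every other element and attribute name against your own export.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the Root/Document/Tag layout is an assumption.
SAMPLE = """<Root>
  <Document DocID="doc-1">
    <Tags>
      <Tag TagName="#From" TagValue="sender@example.com"/>
      <Tag TagName="#Subject" TagValue="Quarterly report"/>
      <Tag TagName="Labels" TagValue="^INBOX"/>
    </Tags>
    <Files>
      <File>
        <ExternalFile FileName="doc-1.mbox" FileSize="2048" Hash="abc123"/>
      </File>
    </Files>
  </Document>
</Root>"""


def read_documents(xml_text):
    """Return one dict per Document: its tag values plus the ExternalFile attributes."""
    root = ET.fromstring(xml_text)
    records = []
    for doc in root.iter("Document"):
        # Map each tag name (for example "#From") to its value.
        record = {tag.get("TagName"): tag.get("TagValue") for tag in doc.iter("Tag")}
        external = doc.find(".//ExternalFile")
        if external is not None:
            record.update(external.attrib)
        records.append(record)
    return records
```

With the sample above, `read_documents(SAMPLE)[0]["FileName"]` gives the identifier you would look up in the mbox file.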
Drive exports
Export contents
Information | File name | Description |
---|---|---|
Files | export_name_N.zip |
Contains all the files found by your search. Vault exports up to 10 GB of data in a single compressed file. If you export more than 10 GB of data, Vault creates multiple files. Exported files are named with the original name of the file followed by an underscore ("_") and the Drive file ID. Exported Google files are converted as follows:
|
File metadata | export_name-metadata.xml |
Contains metadata, including:
|
Accounts and doc IDs | export_name-custodian-docid.csv | Lists user accounts with their associated document IDs. Use this information to determine which users have access to the exported files. |
Error reports | error.csv, export_name-incomplete-accounts.csv | Error reports are included only if the export encounters errors. |
File checksums | File checksums | The file lists the message digest 5 (MD5) hash values for all files in the export. |
The metadata file included with your export captures the following metadata:
Included with each file
- #Author: The email address of the person who owns the file in Drive. For a shared drive file, it shows the shared drive name.
- Collaborators: The accounts and groups that have direct permission to edit the file or add comments. Also includes users with indirect access to the file if you chose this option during export.
- Viewers: The accounts and groups that have direct permission to view the file. Also includes users with indirect access to the file if you chose this option during export.
- Others: The accounts from your query that have indirect access to the file if you opted to exclude access level information during export. May also include users for whom Vault couldn't determine permission levels at the time of export.
- #DateCreated: The date a Google file was created in Drive. For non-Google files, the date the file was uploaded to Drive.
- #DateModified: The date the file was last modified.
- #Title: The filename as assigned by the user. Because some operating systems can't expand zip files with extremely long filenames, Vault truncates the filename at 128 characters during export. The value shown by the #Title tag isn't truncated.
- DocumentType: The file type for Google files. Possible values are DOCUMENT, SPREADSHEET, PRESENTATION, FORM, and DRAWING.
- SharedDriveID: The identifier of the shared drive that contains the file (if applicable).
- SourceHash: A unique hash value for each version of a file. Can be used to deduplicate file exports and verify the exported file is an exact copy of the source file. Supported by Google Docs, Sheets, and Slides files only.
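Because SourceHash is unique for each version of a file, it can serve as a deduplication key. Here is a minimal Python sketch, assuming the metadata has already been parsed into dictionaries keyed by field name. The record layout and the function name are illustrative, not a Vault format.

```python
def deduplicate_by_source_hash(records):
    """Keep the first record seen for each SourceHash value.

    Records without a SourceHash are kept as-is, because the field is
    supported only for Google Docs, Sheets, and Slides files.
    """
    seen = set()
    unique = []
    for record in records:
        source_hash = record.get("SourceHash")
        if source_hash:
            if source_hash in seen:
                continue  # This file version was already kept.
            seen.add(source_hash)
        unique.append(record)
    return unique
```

The first occurrence is kept so that the result preserves the order of the metadata file.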
Query parameters for the entire export
- UserQuery: The query submitted by the Vault user that retrieved the files included in this export.
- TimeZone: The time zone used for date-based searches.
- Custodians: The email addresses of the users whose accounts were searched. If you searched for content rather than individual user accounts, there are no custodians listed here.
When you export files from Drive, the metadata file may include information about users in your organization who have indirect access to, and have opened, a file that matches your search criteria.
A user can have indirect access when a file or folder containing a file is:
- Shared with a group the user belongs to
- Shared with the domain
- Shared publicly
During export, you can choose the information you want to include in the metadata output:
- In the export dialog, check the box to have Vault determine the permission level for users in your domain who have indirect access to files. Each of these users is included in one of these categories when you open the metadata file:
  - Collaborators: Users who have indirect permission to edit or add comments to a file.
  - Viewers: Users who have indirect permission to view a file.
  - Others: In some circumstances, Vault can't determine the type of access a user has at the time of export. This can happen, for example, if a file was shared with a group, and the user was later removed from the group.
  Vault takes time to determine what permissions these users have, so this option might increase the time it takes to prepare your files for download.
- In the export dialog, leave the box unchecked (default) to exclude access-level information for users in your domain with indirect access to files. These users are listed as Others in the metadata file.
Google Voice exports
Export contents
Information | File name | Description |
---|---|---|
Voice data files | export_name-N.zip | A zip file is generated for each account and contains PST or mbox files of text conversations, call logs, voicemail MP3 audio files, and voicemail transcriptions. |
File metadata | export_name-metadata.xml | An XML file that contains metadata as it exists on Google servers. |
File checksums | File checksums | A checksum file with message digest 5 (MD5) hash values for all files included in the export. |
Error report | error.csv | Error reports are included only if the export encounters errors. |
Note: Unlike other services, Voice exports don’t include a count file.
The metadata file contains the following information:
Information about each file
- DocID: A unique identifier for the file.
- #Author: The email address of the account that owns the file in Drive.
- #DateFirstMessageSent: For text conversations, the date the first message was sent. Note: this and the following 3 fields are identical in entries for voicemails and call logs.
- #DateLastMessageSent: For text conversations, the date the last message was sent.
- #DateFirstMessageReceived: For text conversations, the date the first message was received.
- #DateLastMessageReceived: For text conversations, the date the last message was received.
- ConversationType: The data type:
  - TEXT_MESSAGE: A text message.
  - VOICEMAIL: A voicemail.
  - INCOMING_CALL: A call log of an incoming call.
  - OUTGOING_CALL: A call log of an outgoing call.
  - MISSED_CALL: A call log of an unanswered incoming call.
- ParticipantPhoneNumbers: The phone numbers of the participants.
- OwnerPhoneNumbers: The value might include multiple phone numbers when the user's number changed.
- Labels: Any labels on the conversation. For example, deleted conversations have the DELETED label.
- ExternalFile FileName: The file identifier, which correlates to the Subject in the PST or mbox file.
Query parameters for the entire export
- UserQuery: The query submitted by the Vault admin.
- TimeZone: The time zone of the query.
- Custodians: The email addresses of the accounts that were searched.
Review exported messages
After you extract the zip file for a Gmail or Chat export, the way you can review and process the messages depends on the type of file:
- PST: Microsoft Outlook or some litigation support systems.
- mbox: Mozilla Thunderbird, a text editor, or some litigation support systems that include email conversion tools for mbox files.
Note: Google does not provide technical support for third-party products. GOOGLE ACCEPTS NO RESPONSIBILITY FOR THIRD-PARTY PRODUCTS. Consult the product’s website for the latest configuration and support information.
Review messages in an email client
You can review Gmail and Chat messages in Microsoft Outlook (PST) or Mozilla Thunderbird (mbox). This method is useful for viewing HTML messages and attachments that a text editor can't display.
PST and mbox files contain the details of the exported messages. The metadata file reflects the message metadata as recorded by Google. You can correlate mbox content and message metadata to provide a link between the messages stored on Google servers and the data you export from Vault.
To review exported messages in an email client:
- Import and review messages in your email application.
- For messages that are important to the matter, view the headers:
  - Outlook: Review the Microsoft documentation about how to view message headers for your version.
  - Thunderbird: Click View > Headers > All to display the headers for each message.
- In Thunderbird, each header includes a Message ID. To correlate messages with the data stored on Google's servers, compare the Message IDs with the metadata file.
An mbox file is a standard format for storing messages. It contains all the details for the exported messages, including message text and any attachments. The metadata file reflects the message metadata as recorded by Google. Together these files provide a link between the messages stored on Google servers and the data you’ve exported from Vault.
After you export, you use the message parameters from the metadata file to identify corresponding messages in the mbox file. To get started, open the metadata file in a text editor and find the FileName parameter; for example:
<ExternalFile FileName='1463030154355209614-d7f2c19a-73f3-40e4-a17a-130b90c37aac.mbox'
This parameter includes a unique identifier, and it corresponds to a similar entry in the mbox file called the From_ line. The From_ line contains the same identifier, along with the date and time (in UTC) that the message was received by Google; for example:
From 1463030154355209614-d7f2c19a-73f3-40e4-a17a-130b90c37aac.mbox@xxx Wed Mar 19 06:38:02 2014
The From_ line is the first entry for each message included in the mbox file. When you get to a new From_ line, you're looking at a different message.
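The correlation described above can be sketched in Python. This is an illustration under simple assumptions, not a complete mbox parser: it treats every line that starts with `From ` as a message boundary and takes the text before `@` as the identifier, following the example line above. Both function names are hypothetical.

```python
def from_line_identifiers(mbox_text):
    """Return the identifier from each From_ line, in file order."""
    identifiers = []
    for line in mbox_text.splitlines():
        # A From_ line starts with "From " (with a space), unlike a "From:" header.
        if line.startswith("From "):
            parts = line.split()
            if len(parts) > 1:
                identifiers.append(parts[1].split("@", 1)[0])
    return identifiers


def missing_from_mbox(metadata_file_names, mbox_text):
    """List FileName values from the metadata that have no From_ line in the mbox."""
    present = set(from_line_identifiers(mbox_text))
    return [name for name in metadata_file_names if name not in present]
```

An empty result from `missing_from_mbox` means every FileName value you passed in has a matching From_ line in that mbox file.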
Error reports
When Vault is unable to export data from a service, Vault generates an error report. The report lists the items with export errors along with more details and metadata.
Vault reports two types of errors:
- Transient errors—A backend server was unable to retrieve the email message or file. The item should be available for export when you search for it later.
- Non-transient errors—Any error that's not explicitly labeled as transient is the result of an issue that cannot be corrected. Typically these errors occur when a message attachment or file was deleted, isn't supported for export, or can't be converted to the requested format.
To determine if the problem is transient or non-transient, open the CSV file with Google Sheets or another spreadsheet app and find the Error Description column (Note: error descriptions aren't available for Voice exports).
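If an export produces many errors, you can sort them with a script instead of a spreadsheet. The Python sketch below assumes the CSV has a header row with an error description column, and that transient errors contain the word "transient" in that column. Both are assumptions to verify against your own report, and the function name is hypothetical.

```python
import csv
import io
import re

# Matches "transient" but not "non-transient" or "nontransient".
TRANSIENT = re.compile(r"(?<!non-)\btransient\b", re.IGNORECASE)


def split_errors(csv_text):
    """Split error-report rows into (transient, non_transient) lists of dicts."""
    transient, non_transient = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Match the column name case-insensitively; the exact header is an assumption.
        description = next(
            (value or "" for key, value in row.items()
             if key and key.strip().lower() == "error description"),
            "",
        )
        if TRANSIENT.search(description):
            transient.append(row)
        else:
            non_transient.append(row)
    return transient, non_transient
```

Rows with a blank description fall into the non-transient list, which follows the rule that only errors explicitly labeled as transient are treated as transient.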
Recover from transient errors
You can use message and file details to search for and export the data that wasn't exported due to transient errors:
- If the error report includes messages with transient errors, use each message’s RFC 822 identifier to find those specific messages when you search again. The format of the search term is rfc822msgid:identifier.
- If the error report includes Drive files with transient errors, use each file's title to find those specific files when you search again. The format of the search term is title:"title-of-file".
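The two search-term formats above are simple enough to generate from the error report values. Here is a minimal Python sketch with hypothetical helper names. It passes identifiers and titles through unchanged apart from trimming whitespace, because escaping rules for special characters aren't covered here.

```python
def message_search_term(rfc822_message_id):
    """Build a search term for one message from its RFC 822 Message-ID."""
    return f"rfc822msgid:{rfc822_message_id.strip()}"


def file_search_term(title):
    """Build a search term for one Drive file from its title."""
    return f'title:"{title.strip()}"'
```

For example, `file_search_term("Q3 budget")` produces `title:"Q3 budget"`, which you can paste into the search terms field when you search again.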
Error report contents
Error report contents for Gmail and Groups
The error report contains the following fields for each message. Fields are blank if the data isn't available or applicable for a message.
Field | Description |
---|---|
Document ID | A unique identifier for the file |
Document type | The document type. Value is mail. |
File type | The file type. Value is |
Attachments count | The number of attachments to the message |
Attachment names | The file names of the attachments |
Subject | The message subject |
Size | The message size |
From | The sender's email account |
To | The email accounts of all recipients |
Cc | The email accounts of all Cc'd recipients |
Sent time | The timestamp for when the message was sent |
Source account | The account that was included in the search query |
Error description | A description of the error |
RFC 822 Message-ID | A unique identifier for a message that's added by mail servers Example: |
The error report contains the following fields for each message. Fields are blank if the data isn't available or applicable for a message.
Field | Description |
---|---|
Document ID | A unique identifier for the file |
Filename | The document type. Value is mail. |
Conversation Type | The type of message. Value is |
Room Name | The name of the room |
Error description | A description of the error |
The error report contains the following fields for each file. Fields are blank if the data isn't available or applicable for a file.
Field | Description |
---|---|
Document ID | A unique identifier for the file |
Document type | Indicates the file type for Google files. Possible values are DOCUMENT, SPREADSHEET, PRESENTATION, FORM, and DRAWING. |
File type | The file format, such as PDF or XLSX |
Title | The filename as assigned by the user |
Size | The size of the file |
Creator | The email address of the person who owns the file in Drive. For a shared drive file, it shows the shared drive name. |
Collaborators | The accounts and groups that have direct permission to edit the file or add comments. Also includes users with indirect access to the file if you chose this option during export. |
Viewers | The accounts and groups that have direct permission to view the file. Also includes users with indirect access to the file if you chose this option during export. |
Others | The accounts from your query that have indirect access to the file if you opted to exclude access level information during export. May also include users for whom Vault couldn't determine permission levels at the time of export. |
Creation time | The date a Google file was created in Drive. For non-Google files, this indicates when the file was uploaded to Drive. |
Last modified time | The date the file was last modified |
Error description | A description of the error |
Drive Document ID | A unique identifier for a file in Drive |
The error report lists the accounts that were searched but for which not all matching files were exported.
Field | Description |
---|---|
Account | The email address of the account that some data wasn’t exported for. |
Failed Conversation Count | The number of conversations that weren’t exported. If the number is unknown, the value is Unknown Failure Count. |