After you use Google Vault to search for the data you want, you can export a copy of that data and download it for further analysis. An export contains the following information:
- A comprehensive copy of the data that matched your search criteria.
- The metadata you need to link the exported data to individual users in your organization.
- The corroborating information required to prove that the exported data matches the data stored on Google’s servers.
Gmail (new export)
You can now export Gmail messages using an improved system in Vault. The files exported by this system are a little different from the classic format.
If you exported in the classic format, go to Gmail (classic export).
Export contents
How Chat messages are organized in an export file
To provide context for Chat messages, your search results, preview, and export files include messages sent in the same conversation or thread as the matching messages as follows:
Message organization
In the export file, messages are grouped by conversation, thread, or Chat space.
If your export includes messages from several conversations, the conversations are in reverse chronological order based on the most recent matching message. For example, if the last matching message in one conversation was sent at 1 pm, and the last matching message in another conversation was sent at 8 pm, messages from the conversation with the 8 pm message are listed first in the export file, then the messages from the conversation with the 1 pm message.
Within each conversation or Chat space, messages are in chronological order.
Note: If you review your export in an email client, messages are in chronological order and aren’t grouped by conversation.
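The ordering rules above can be sketched in a few lines of Python. This is only an illustration, not Vault's implementation: the `sent` and `matched` keys are hypothetical names, and each conversation is assumed to contain at least one matching message.

```python
from datetime import datetime


def order_conversations(conversations):
    """Order conversations as described above.

    `conversations` is a list of conversations; each conversation is a list
    of dicts with hypothetical keys "sent" (datetime) and "matched" (bool).
    """
    def latest_match(messages):
        # Time of the most recent message that matched the search.
        return max(m["sent"] for m in messages if m["matched"])

    # Conversations: reverse chronological by most recent matching message.
    ordered = sorted(conversations, key=latest_match, reverse=True)
    # Messages inside each conversation: chronological.
    return [sorted(messages, key=lambda m: m["sent"]) for messages in ordered]


conversations = [
    [
        {"sent": datetime(2024, 1, 1, 13), "matched": True},   # 1 pm match
        {"sent": datetime(2024, 1, 1, 12), "matched": False},  # context
    ],
    [{"sent": datetime(2024, 1, 1, 20), "matched": True}],     # 8 pm match
]
ordered = order_conversations(conversations)
```

With this input, the conversation containing the 8 pm match comes first, followed by the 1 pm conversation with its messages in chronological order.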
Reducing duplicate messages
When many messages in the same conversation or thread match your search, there can be overlap in the messages provided for context. To avoid exporting duplicate messages, Vault evaluates the overlap and groups messages accordingly.
For example, a conversation has 2 messages that match the search query, one sent at 9 am on Monday and the other sent at 3 pm on Monday. The matching messages have the following context windows:
- 9 am message: 9 pm Sunday to 9 pm Monday
- 3 pm message: 3 am Monday to 3 am Tuesday
The context windows overlap between 3 am and 9 pm Monday. When the messages are exported, only one series of messages is returned for the conversation and it includes all messages sent between 9 pm Sunday and 3 am Tuesday.
If another message in the conversation matches but was sent more than 24 hours after the previous matching message, then a second message grouping is created and returned in the export. Continuing the previous example, if another matching message was sent at 11 am Thursday, its context window is from 11 pm Wednesday to 11 pm Thursday. This window doesn’t overlap with the previous window, which ended at 3 am Tuesday, so the messages in this window are returned in a second group.
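The grouping described above behaves like a standard interval merge. The following sketch assumes, based on the example, that each matching message gets a context window of 12 hours before and 12 hours after its send time; the function name and the dates are illustrative only.

```python
from datetime import datetime, timedelta

# Assumption taken from the example above: 12 hours on each side of a match.
HALF_WINDOW = timedelta(hours=12)


def group_context_windows(match_times):
    """Merge overlapping context windows into (start, end) groups."""
    groups = []
    for sent in sorted(match_times):
        start, end = sent - HALF_WINDOW, sent + HALF_WINDOW
        if groups and start <= groups[-1][1]:
            # Overlaps the previous group: extend it instead of duplicating.
            groups[-1] = (groups[-1][0], max(groups[-1][1], end))
        else:
            groups.append((start, end))
    return groups


matches = [
    datetime(2024, 1, 1, 9),   # Monday 9 am
    datetime(2024, 1, 1, 15),  # Monday 3 pm
    datetime(2024, 1, 4, 11),  # Thursday 11 am
]
groups = group_context_windows(matches)
for start, end in groups:
    print(start, "to", end)
```

For these three matches the sketch produces two groups: 9 pm Sunday to 3 am Tuesday, and 11 pm Wednesday to 11 pm Thursday, which agrees with the example in the text.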
Information | File name | Description |
---|---|---|
Message contents | export_name-N.zip | PST or mbox files with the contents and details of the exported messages. Learn about options for reviewing PST and mbox files. The message files are named export_name-account-randomstring.mbox or export_name-account-randomstring.pst, where account is the full email address of the custodian (the account that sent or received the message). If the messages for an account exceed 1 GB for PST files or 10 GB for mbox files, they're exported in zip files. The files within are appended with a 6-character randomized string. The zip files are numbered sequentially. For example, if there are more than 10 GB of mbox files in the export, my-export could contain the following files for user1@example.com: my-export-1.zip and my-export-2.zip. If you're exporting PST files, the extension of the additional files is .pst. To decrypt client-side encrypted messages in mbox files, use the decrypter utility (beta). To view client-side encrypted emails in the PST format, import each user's p7m file into Microsoft Outlook. |
Message metadata | export_name-metadata.csv | A CSV file that contains message metadata as it exists on Google servers. Open this file in a spreadsheet editor and use it to connect message metadata with the message contents from the mbox file. Note: PST file contents can’t be correlated with the XML file metadata. |
Accounts and message count | export_name-result-counts.csv | A CSV file that lists the accounts of message owners included in the export, the number of messages owned by each account, and how many messages were successfully exported or had errors. |
Error report | export_name-errors.xml | An XML file that lists errors retrieving messages. It's always part of the export, even when no errors occurred. |
Messages that didn't convert to PST | export_name-conversion_errors-N.zip | When you export in PST format, this file contains any messages that weren't converted to PST. Each message is a separate EML file. You get multiple zip files when there are more than 10 GB of messages. |
File checksums | File checksums | A file that lists the message digest 5 (MD5) hash values for all files in the export. |
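To check downloaded files against the listed MD5 values, you can recompute each hash locally. The sketch below assumes you have already copied the expected hashes into a dict yourself; the layout of the checksum file isn't described here, so the `expected` mapping and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(expected, export_dir):
    """Return file names whose computed MD5 differs from the expected value.

    `expected` is a hypothetical dict of file name -> MD5 hex string that
    you build yourself from the checksum file.
    """
    export_dir = Path(export_dir)
    return [
        name
        for name, want in expected.items()
        if md5_of_file(export_dir / name) != want.lower()
    ]
```

An empty result from `find_mismatches` means every listed file hashed to the expected value.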
The metadata CSV file lists the following information for each message. The value is blank if the information isn't available or doesn't apply to a message.
Column | Description | Note |
---|---|---|
Rfc822MessageId | A message ID that is the same for the receiver's and sender's messages. Use this value to correlate metadata with the message in an mbox export. | |
GmailMessageId | A unique message ID. Use this value to manage specific messages with the Gmail API. | |
Account | The account that had the message in their Inbox. For example, user1@example.com received a message sent to groupA@example.com because they're a member of the group. If a search returns that message because it was in user1's Inbox, then the value of | |
From | The sender account. | |
To | The recipient account. Multiple recipients are comma-separated and the list is in double quotes. | Gmail only |
CC | Accounts in the cc: field. | Gmail only |
BCC | Accounts in the bcc: field. | Gmail only |
Subject | The message subject. | Gmail only |
Labels | Labels applied to the message by Gmail or the user. | Gmail only |
DateSent | The message send date in UTC (yyyy-MM-dd'T'HH:mm:ssZZZZ). | Gmail only |
DateReceived | The message received date (yyyy-MM-dd'T'HH:mm:ssZZZZ). | Gmail only |
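As a rough illustration of correlating the metadata CSV with an mbox file through Rfc822MessageId, the sketch below uses Python's standard `csv` and `mailbox` modules. It assumes the CSV value corresponds to the message's Message-ID header and that the two may differ only by surrounding angle brackets; both points are assumptions, and the function names are hypothetical.

```python
import csv
import mailbox
from collections import defaultdict


def normalize_id(value):
    """Strip whitespace and angle brackets so IDs compare equal."""
    return (value or "").strip().strip("<>")


def index_metadata(csv_path):
    """Map each Rfc822MessageId to the list of metadata rows that carry it.

    Several rows can share an ID because the sender's and receiver's
    copies of a message have the same Rfc822MessageId.
    """
    rows_by_id = defaultdict(list)
    # utf-8-sig tolerates a byte-order mark if one is present (assumption).
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        for row in csv.DictReader(handle):
            rows_by_id[normalize_id(row["Rfc822MessageId"])].append(row)
    return rows_by_id


def pair_messages(mbox_path, csv_path):
    """Yield (metadata_rows, message) for every message in the mbox."""
    metadata = index_metadata(csv_path)
    box = mailbox.mbox(mbox_path)
    try:
        for message in box:
            key = normalize_id(message["Message-ID"])
            yield metadata.get(key, []), message
    finally:
        box.close()
```

Messages that yield an empty list have no matching metadata row under these assumptions and may need a manual check.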
The count CSV file contains a list of the searched accounts and the number of messages in the export associated with each account.
The first row is for Totals, which lists the total exported and errored messages for all emails in the export. Results are sorted in descending order of the number of successfully exported messages for that email address.
Note about counting
- If a message matches the export query but fails to convert to PST format, it's counted as a success in this file. You can review the messages that didn't convert in the export_name-conversion_errors-N.zip file.
Column | Description |
---|---|
Email | The email address of the sender or recipient. |
AccountStatus | Whether messages for the email account were successfully exported. Value can be the following: |
SuccessCount | The number of messages successfully exported. Equivalent to the count value in the classic count file. |
MessageErrorCount | The number of messages that aren’t included in the export. These messages are identified in the error CSV file. |
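A short sketch of reading the count file to find accounts with errored messages follows. It assumes the summary row carries the label Totals in the Email column and that the header names match the column names above; the sample rows are made up, and AccountStatus is left blank because its possible values aren't listed here.

```python
import csv
import io


def accounts_with_errors(count_csv_text):
    """Return (email, error_count) pairs for accounts with errored messages."""
    reader = csv.DictReader(io.StringIO(count_csv_text))
    flagged = []
    for row in reader:
        # Assumption: the summary row is labeled "Totals" in the Email column.
        if row["Email"].strip().lower() == "totals":
            continue
        errors = int(row["MessageErrorCount"] or 0)
        if errors > 0:
            flagged.append((row["Email"], errors))
    return flagged


# Made-up sample data for illustration only.
sample = """Email,AccountStatus,SuccessCount,MessageErrorCount
Totals,,15,2
user1@example.com,,10,0
user2@example.com,,5,2
"""
print(accounts_with_errors(sample))  # [('user2@example.com', 2)]
```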
Gmail (classic export), Chat, and Groups exports
Export contents
Information | File name | Description |
---|---|---|
Message contents | export_name-N.zip | Zip files of PST or mbox files. These files contain the contents and details of the exported messages. For Google Chat messages, these details include when the sender edited or deleted a message. Learn about options for reviewing PST and mbox files. You get multiple zip files in the following scenarios: The file names end with an increment to distinguish the files. To decrypt client-side encrypted messages in mbox files, use the decrypter utility (beta). To view client-side encrypted emails in the PST format, import each user's p7m file into Microsoft Outlook. |
Google Groups membership information | export_name-group-membership.csv | A CSV file that lists the following information for each group member: |
Message metadata | export_name-metadata.xml, export_name-metadata.csv | Note: PST file contents can’t be correlated with the XML file metadata. |
Accounts and message count | export_name-results-count.csv | A CSV file that lists the accounts of message owners included in the export and the number of messages owned by each account. |
Error reports | error.csv, export_name-account-exceptions.csv (Gmail exports), export_name-failed-group-membership-lookups.csv (Groups exports) | Error reports are included only if the export encounters errors. |
File checksums | File checksums | The file lists the message digest 5 (MD5) hash values for all files in the export. |
The metadata file contains the following information:
Included for Gmail and Groups messages
- #From: The email account of the sender
- #To: The email accounts of all recipients
- #CC: The email accounts of all Cc'd recipients
- #BCC: The email accounts of all Bcc'd recipients
- #Subject: The message subject
- #DateSent: The timestamp for when the message was sent
- #DateReceived: The timestamp for when the message was received
Included for Chat messages
- #DateFirstMessageSent: The timestamp for when the first message in a conversation was sent
- #DateLastMessageSent: The timestamp for when the last message in a conversation was sent
- #DateFirstMessageReceived: The timestamp for when the first message in a conversation was received
- #DateLastMessageReceived: The timestamp for when the last message in a conversation was received
Included for all messages (Gmail, Groups, and Chat)
- Labels: Any labels applied by Gmail or Chat, such as ^INBOX, ^TRASH, and ^DELETED. Also shows any labels applied to the message by the user.
- FileName: A message identifier. Use this value to correlate metadata with the corresponding message in an email client or a text editor.
- FileSize: The size of the message in bytes.
- Hash: The MD5 hash of the message.
Included for Chat messages
- RoomID: The space, group chat, or DM identifier that the message belongs to.
- Participants: The email addresses of all users who participated in the conversation.
- RoomName: The value depends on the type of message:
  - For Chat spaces, the name of the space.
  - For group conversations created after early December 2020, Group chat.
  - For group conversations created before early December 2020 and DMs, a comma-separated list of accounts that participated.
- ConversationType: The message type:
  - For a group chat created after early December 2020 or a space, the value is Room.
  - For a group chat created before December 2020, the value is Group Direct Message.
  - For a DM, the value is 1:1 Direct Message.
Query parameters for the entire export
- UserQuery: The query submitted by the Vault user that retrieved the messages included in this export.
- TimeZone: The time zone used for date-based searches.
- Custodians: The email addresses of the users whose accounts were searched. If you searched for content rather than individual user accounts, there are no custodians listed here.
The metadata CSV file lists the following information for Gmail messages. The value is blank if the information isn't available or doesn't apply to a message.
Note: This file doesn't include metadata for Groups or Chat messages.
Column | Description | Note |
---|---|---|
Rfc822MessageId | A message ID that is the same for the receiver's and sender's messages. Use this value to correlate metadata with the message in an mbox export. | |
GmailMessageId | A unique message ID. Use this value to manage specific messages with the Gmail API. | |
Account | The account that had the message in their Inbox. For example, user1@example.com received a message sent to groupA@example.com because they're a member of the group. If a search returns that message because it was in user1's Inbox, then the value of | |
From | The sender account. | |
To | The recipient account. Multiple recipients are comma-separated and the list is in double quotes. | Gmail only |
CC | Accounts in the cc: field. | Gmail only |
BCC | Accounts in the bcc: field. | Gmail only |
Subject | The message subject. | Gmail only |
Labels | Labels applied to the message by Gmail or the user. | Gmail only |
DateSent | The message send date in UTC (yyyy-MM-dd'T'HH:mm:ssZZZZ). | Gmail only |
DateReceived | The message received date (yyyy-MM-dd'T'HH:mm:ssZZZZ). | Gmail only |
Drive exports
Export contents
Information | File name | Description |
---|---|---|
Files | export_name_N.zip | Contains all the files and sites found by your search. Vault exports up to 10 GB of data in a single compressed file. If you export more than 10 GB of data, Vault creates multiple files. Exported files are named with the original name of the file followed by an underscore ("_") and the Drive file ID. Exported Google files are converted as follows: Note: When you export client-side encrypted files, the files remain encrypted and the filenames end with |
File metadata | export_name-metadata.xml | Contains metadata, including: |
Accounts and doc IDs | export_name-custodian-docid.csv | Lists user accounts with their associated document IDs. Use this information to determine which users have access to the exported files. |
Error reports | error.csv, export_name-incomplete-accounts.csv | Error reports are included only if the export encounters errors. |
File checksums | File checksums | The file lists the message digest 5 (MD5) hash values for all files in the export. |
The metadata file included with your export captures the following metadata:
Included with each file
- DocID: A unique identifier for the file. For sites exports, the value is the page ID.
- #Author: The email address of the person who owns the file in Drive. For a shared drive file, it shows the shared drive name.
- Collaborators: The accounts and groups that have direct permission to edit the file or add comments. Also includes users with indirect access to the file if you chose this option during export.
- Viewers: The accounts and groups that have direct permission to view the file. Also includes users with indirect access to the file if you chose this option during export.
- #DateCreated: The date a Google file was created in Drive. For non-Google files, usually the date the file was uploaded to Drive. Learn more about timestamps for uploaded files.
- #DateModified: The date the file was last modified. Learn more about timestamps for uploaded files.
- #Title: The filename as assigned by the user. Because some operating systems can't expand zip files with extremely long filenames, Vault truncates the filename at 128 characters during export. The value shown by the #Title tag isn't truncated.
- DocumentType: The file type for Google files. Possible values are:
  - DOCUMENT: A document created in Google Docs.
  - SPREADSHEET: A spreadsheet created in Google Sheets.
  - PRESENTATION: A presentation created in Google Slides.
  - FORM: A form created in Google Forms.
  - DRAWING: A drawing created in Google Drawings.
  - SITES_PAGE: A page from a site created in new Google Sites.
- Others: The accounts from your query that have indirect access to the file if you opted to exclude access level information during export. May also include users for whom Vault couldn't determine permission levels at the time of export.
- SitesTitle: For sites, the name of the page.
- PublishedURL: For sites, the web address of the published page. Value is empty for unpublished sites.
- DocParentID: For sites, a unique identifier for the site the page is part of.
- SharedDriveID: The identifier of the shared drive that contains the file (if applicable).
- SourceHash: A unique hash value for each version of a file. Can be used to deduplicate file exports and verify the exported file is an exact copy of the source file. Supported by Google Docs, Sheets, and Slides files only.
- FileName: The file name. Use this value to correlate the metadata with the file in the export ZIP file.
- FileSize: The size of the file in bytes.
- Hash: The MD5 hash of the file.
- ClientSideEncrypted: Indicates the file was encrypted with Google Workspace Client-side encryption. Files that aren't client-side encrypted don't include the ClientSideEncrypted tag.
- Reviews: A section that lists metadata for file approvals. Not included when no approvals were requested on the file. For each approval request, a Review section contains the following information:
  - ApprovalId: A unique identifier for the review.
  - CreatedAt: The time when approval was requested.
  - ModifiedAt: The last time the approval status changed.
  - Approvers: A comma-separated list of the approvers’ emails.
  - ApprovalStatus: The status of the approval. Possible values are:
    - IN_PROGRESS: Approval requested.
    - APPROVED: All approvers approved the file.
    - DECLINED: An approver declined the request to approve the file.
    - CANCELLED: An approver rejected the file.
Query parameters for the entire export
- UserQuery: The query submitted by the Vault user that retrieved the files included in this export.
- TimeZone: The time zone used for date-based searches.
- Custodians: The email addresses of the users whose accounts were searched. If you searched for content rather than individual user accounts, there are no custodians listed here.
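Because SourceHash identifies each version of a file, it can serve as a deduplication key once the metadata has been parsed. The sketch below assumes the per-file metadata is already loaded into a list of dicts keyed by field name; how you parse the XML is left out, and the sample values are invented.

```python
def dedupe_by_source_hash(records):
    """Keep the first record per SourceHash; keep every record without one.

    `records` is a hypothetical list of dicts, one per exported file, built
    from the metadata file. Files without a SourceHash (the field is only
    supported for Docs, Sheets, and Slides) are never treated as duplicates.
    """
    seen = set()
    unique = []
    for record in records:
        source_hash = record.get("SourceHash")
        if source_hash and source_hash in seen:
            continue  # Same file version already kept.
        if source_hash:
            seen.add(source_hash)
        unique.append(record)
    return unique


# Invented sample values for illustration only.
records = [
    {"FileName": "plan_1.docx", "SourceHash": "aaa"},
    {"FileName": "plan_2.docx", "SourceHash": "aaa"},
    {"FileName": "photo.png"},
]
unique = dedupe_by_source_hash(records)
```

Here `unique` keeps plan_1.docx and photo.png and drops the second copy of the same file version.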
When you export files from Drive, the metadata file may include information about users in your organization who have indirect access to, and have opened, a file that matches your search criteria.
A user can have indirect access when a file or folder containing a file is:
- Shared with a group the user belongs to
- Shared with the domain
- Shared publicly
During export, you can choose the information you want to include in the metadata output:
- In the export dialog, check the box to have Vault determine the permission level for users in your domain who have indirect access to files. Each of these users is included in one of these categories when you open the metadata file:
  - Collaborators: Users who have indirect permission to edit or add comments to a file.
  - Viewers: Users who have indirect permission to view a file.
  - Others: In some circumstances, Vault can't determine the type of access a user has at the time of export. This can happen, for example, if a file was shared with a group, and the user was later removed from the group.

  Vault takes time to determine what permissions these users have, so this option can increase the time it takes to prepare your files for download.
- In the export dialog, leave the box unchecked (default) to exclude access-level information for users in your domain with indirect access to files. These users are listed as Others in the metadata file.
Google Voice exports
Export contents
Information | File name | Description |
---|---|---|
Voice data files | export_name-N.zip | A zip file is generated for each account and contains PST or mbox files of text conversations, call logs, voicemail MP3 audio files, and voicemail transcriptions. |
File metadata | export_name-metadata.xml | An XML file that contains metadata as it exists on Google servers. |
File checksums | File checksums | A checksum file with message digest 5 (MD5) hash values for all files included in the export. |
Error report | error.csv | Error reports are included only if the export encounters errors. |
Note: Unlike other services, Voice exports don’t include a count file.
The metadata file contains the following information:
Information about each file
- DocID: A unique identifier for the file.
- #Author: The email address of the account that owns the file in Drive.
- #DateFirstMessageSent: For text conversations, the date the first message was sent. Note: this and the following 3 fields are identical in entries for voicemails and call logs.
- #DateLastMessageSent: For text conversations, the date the last message was sent.
- #DateFirstMessageReceived: For text conversations, the date the first message was received.
- #DateLastMessageReceived: For text conversations, the date the last message was received.
- ConversationType: The data type:
  - TEXT_MESSAGE: A text message.
  - VOICEMAIL: A voicemail.
  - INCOMING_CALL: A call log of an incoming call.
  - OUTGOING_CALL: A call log of an outgoing call.
  - MISSED_CALL: A call log of an unanswered incoming call.
- ParticipantPhoneNumbers: The phone numbers of the participants.
- OwnerPhoneNumbers: The value includes multiple phone numbers when the user's number changed.
- Labels: Any labels on the conversation. For example, deleted conversations have the DELETED label.
- ExternalFile FileName: The file identifier, which correlates to the Subject in the PST or mbox file.
Query parameters for the entire export
- UserQuery: The query submitted by the Vault admin.
- TimeZone: The time zone of the query.
- Custodians: The email addresses of the accounts that were searched.
Error reports
When Vault is unable to export data from a service, Vault generates an error report. The report lists the items with export errors along with more details and metadata.
Vault reports two types of errors:
- Transient errors—A backend server was unable to retrieve the email message or file. The item should be available for export when you search for it later.
- Non-transient errors—Any error that's not explicitly labeled as transient is the result of an issue that cannot be corrected. Typically these errors occur when a message attachment or file was deleted, isn't supported for export, or can't be converted to the requested format.
To determine if the problem is transient or non-transient, open the CSV file with Google Sheets or another spreadsheet app and find the Error Description column (Note: error descriptions aren't available for Voice exports).
Recover from transient errors
You can use message and file details to search for and export the data that wasn't exported due to transient errors:
- If the error report includes messages with transient errors, use each message’s RFC 822 identifier to find those specific messages when you search again. The format of the search term is rfc822msgid:identifier.
- If the error report includes Drive files with transient errors, use each file's title to find those specific files when you search again. The format of the search term is title:"title-of-file".
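The two search-term formats above can be generated from an error report with a short script. This is a sketch under several assumptions: the CSV headers are taken to be "Error description", "RFC 822 Message-ID", and "Title"; a row is treated as transient only when its description contains the word "transient" without a "non-" prefix; and the sample rows and their descriptions are invented.

```python
import csv
import io


def retry_search_terms(error_csv_text):
    """Build Vault search terms for items that failed with transient errors."""
    terms = []
    for row in csv.DictReader(io.StringIO(error_csv_text)):
        description = (row.get("Error description") or "").lower()
        # Only errors explicitly labeled as transient are worth retrying.
        if "transient" not in description or "non-transient" in description:
            continue
        message_id = (row.get("RFC 822 Message-ID") or "").strip()
        title = (row.get("Title") or "").strip()
        if message_id:
            terms.append(f"rfc822msgid:{message_id}")
        elif title:
            terms.append(f'title:"{title}"')
    return terms


# Invented rows; real error descriptions and headers may differ.
sample = """Title,Error description,RFC 822 Message-ID
,Transient error retrieving message,<abc@example.com>
Budget,Transient error,
Old,File was deleted,
"""
print(retry_search_terms(sample))
```

Each returned term can then be used in a new search to retry the export for that message or file.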
Error report contents
Error report contents for Gmail (new export)
Summary section
The error report contains the following data for the entire export in a Summary section.
Field | Description |
---|---|
AccountErrorsCount | The number of accounts for which Vault was unable to retrieve any messages for export. |
PartialAccountErrorsCount | The number of accounts for which Vault was unable to retrieve all messages for export. |
MessageErrorsCount | The number of messages that Vault was unable to retrieve completely from Gmail. For these messages, Vault retrieves the metadata but not all the message content. |
Account | The account that had the message in their Inbox. For example, user1@example.com received a message sent to groupA@example.com because they're a member of the group. If a search returns that message because it was in user1's Inbox, then the value of |
Count | The number of errored messages associated with a specific account. |
PSTConversionErrorsCount | The number of messages that weren’t converted to PST. |
Lists of errors
After the Summary section, the export contains metadata for the accounts and messages that had errors. Values aren't reported if the data isn't available or applicable for a message.
Field | Description |
---|---|
AccountErrors | A list of the users whose messages couldn't be searched. Each entry includes values for Account and Reason. |
Reason | For accounts for which Vault was unable to retrieve any messages for export, the error returned by Gmail. |
PartialAccountErrors | A list of users whose messages were only partially searched. |
MessageErrors | The metadata for messages that couldn't be exported. Fields are the same as in the metadata file. |
PSTConversionErrors | For PST-formatted exports, a list of the Account and Rfc822MessageId values for messages that weren't converted to PST. These messages are available in their original EML format in the export_name-conversion-errors-N.zip file included in the export. |
The error report contains the following fields for each message. Fields are blank if the data isn't available or applicable for a message.
Field | Description |
---|---|
Document ID | A unique identifier for the file |
Document type | The document type. Value is mail. |
File type | The file type. Value is |
Attachments count | The number of attachments to the message |
Attachment names | The file names of the attachments |
Subject | The message subject |
Size | The message size |
From | The sender's email account |
To | The email accounts of all recipients |
Cc | The email accounts of all Cc'd recipients |
Sent time | The timestamp for when the message was sent |
Source account | The account that was included in the search query |
Error description | A description of the error |
RFC 822 Message-ID | A unique identifier for a message that's added by mail servers. Example: |
The error report contains the following fields for each message. Fields are blank if the data isn't available or applicable for a message.
Field | Description |
---|---|
Document ID | A unique identifier for the file |
Filename | The document type. Value is mail. |
Conversation Type | The type of message. Value is |
space Name | The name of the space |
Error description | A description of the error |
The error report contains the following fields for each file. Fields are blank if the data isn't available or applicable for a file.
Field | Description |
---|---|
Document ID | A unique identifier for the file |
Document type | Indicates the file type for Google files. Possible values are DOCUMENT, SPREADSHEET, PRESENTATION, FORM, DRAWING, and SITES_PAGE. |
File type | The file format, such as PDF or XLSX |
Title | The filename as assigned by the user |
Size | The size of the file |
Creator | The email address of the person who owns the file in Drive. For a shared drive file, it shows the shared drive name. |
Collaborators | The accounts and groups that have direct permission to edit the file or add comments. Also includes users with indirect access to the file if you chose this option during export. |
Viewers | The accounts and groups that have direct permission to view the file. Also includes users with indirect access to the file if you chose this option during export. |
Others | The accounts from your query that have indirect access to the file if you opted to exclude access level information during export. May also include users for whom Vault couldn't determine permission levels at the time of export. |
Creation time | The date a Google file was created in Drive. For non-Google files, this indicates when the file was uploaded to Drive. |
Last modified time | The date the file was last modified |
Error description | A description of the error |
Drive Document ID | A unique identifier for a file in Drive |
The error report lists accounts that were searched but for which not all matching data was exported.
Field | Description |
---|---|
Account | The email address of the account for which some data wasn’t exported. |
Failed Conversation Count | The number of conversations that weren’t exported. If the number is unknown, the value is Unknown Failure Count. |