Label Google Drive files automatically using AI classification

Supported add-ons for this feature: Gemini Enterprise and AI Security.

Important: AI classification exited beta on April 9, 2024. It will take 2-3 weeks for the beta label to be removed in Admin console.

The AI classification feature uses artificial intelligence (AI) to automatically label your organization’s sensitive content. After an initial training period during which the AI model learns your organization's criteria for sensitive content, AI classification can automatically apply labels to both new and existing files in Drive.

Here's how to get started using AI classification:

1) Set up training: To get started, you create the classification label, which the AI model will automatically apply to files once training is done. You also create the training label, a label that's nearly identical to the classification label.

2) Train the model: During the training period, typically about a week, your designated labelers (users at your organization who can evaluate sensitive files) begin classifying Drive files with the training label. From their examples, the model learns to classify sensitive files in the same way.

3) Turn on automatic classification: Once the model is trained (after about a week), you're prompted to turn on automatic classification. You can monitor how many files are classified, and how accurately, on an ongoing basis.

For exact details on each phase, see the linked sections below.

Before you begin

If you’re not familiar with Drive labels, see Manage Drive labels for details on how they work and how to create them.
Turn labels on for your organization:
1. Sign in to your Google Admin console.
  Sign in using your administrator account (does not end in @gmail.com).
2. In the Admin console, go to Menu > Apps > Google Workspace > Drive and Docs.
3. Click Labels.
4. Turn labels on.
5. Click Save.
For best results, create a configuration group for your designated labelers, separate from the rest of your organization. For instructions, see Customize service settings with configuration groups.

Set up training

Create the classification label

The classification label is the label the AI model will automatically apply to your sensitive Drive files, after the model is trained. We recommend using a badged label, which shows prominently on documents. For more information on badged labels, see Get started as a Drive labels admin.

A badged label displays next to the title of a file

When used as a classification label, a badged label must meet these requirements:

Have only one field, of the Options list field type
Have a minimum of 2, and a maximum of 4 options
Be published

If you have an existing badged label that meets these requirements, you can use it as a classification label. Otherwise, follow these steps and choose the badged label option.

Create the training label

We recommend that you create the training label during label selection (next step), when you can create it automatically. This guarantees the training label will match the classification label in all the required ways.

If you choose to create the training label before label selection:

Make sure the label meets the required label criteria.
Identify the training label with the word 'training' to make it easier for your trusted labelers to recognize the label and apply it during the training period.
Add a description field to the training label to further help trusted labelers understand its purpose.

Select labels and enable training

Sign in to your Google Admin console.
Sign in using your administrator account (does not end in @gmail.com).
In the Admin console, go to Menu > Security > Access and data control > Data classification.
In AI classification for Google Drive, click Set up training.
Under Select classification label, click Select Label.
Select the badged label you created in Create the classification label, above.
Under Select training label, click Create training label.
This automatically creates a training label with the same attributes as your Classification label.
To make sure the new label is available to your designated labelers, click Update label permissions. This opens the label in Edit mode in label manager, in a separate tab.
Note: You can also set label permissions later. But it’s important that only your labelers have access to the training label.
Click Permissions > Edit, then grant the Can apply labels and set values permission to the configuration group that contains your labelers.
Click Save and close the label manager tab.
After selecting both the classification label and training label, the Enable training button is enabled.
Click Enable training.
Important: If you get an error message when you try to enable training, it means your classification label and training label don’t match. Review the label requirements below and make sure your labels meet all requirements, then enable training.

After you enable training, the Data classification page shows your selected Training label and Classification label.

The Classification label shows Not ready. After training is done, the label status changes to Ready.
Auto apply status shows Off for everyone. Once the Classification label status is Ready, you can then change the Auto apply status to On.

Next, your designated labelers need to start applying the Training label to your sensitive files.

Train the model

To successfully train the AI model, your designated labelers should label at least 100 files per option. For example, if your label has 3 options, it should be applied to at least 300 files in total. The AI model checks training every 1-2 weeks and shows Ready once it has 100 or more examples for each label option. Learn more about high-quality examples.
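The arithmetic above can be sketched in a few lines of Python. This is only an illustration for planning labeling work: the option names and counts are made up, and the helper functions are hypothetical, not part of any Google tool or API.

```python
MIN_FILES_PER_OPTION = 100  # minimum number of examples per label option


def minimum_total(option_count):
    """Smallest total number of labeled files for a label with this many options."""
    return option_count * MIN_FILES_PER_OPTION


def files_still_needed(labeled_counts):
    """Map each option to how many more labeled files it needs to reach the minimum."""
    return {
        option: max(0, MIN_FILES_PER_OPTION - count)
        for option, count in labeled_counts.items()
    }


# Hypothetical progress for a label with 3 options.
counts = {"Confidential": 120, "Restricted": 100, "Internal": 64}

print(minimum_total(len(counts)))   # 300
print(files_still_needed(counts))   # {'Confidential': 0, 'Restricted': 0, 'Internal': 36}
```

In this example, Internal still needs 36 more labeled files before every option meets the minimum.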

During the training period, you can check how many files have been labeled and how the model's accuracy is improving.

Note: Training files have a 1 million total limit.

To check progress during the training period:

In your Admin console, go to Security > Data classification.
Click View model details.
- Under Training label, Training files shows the number of files that have been labeled for each option.
- Each label option has a Score. This shows the model’s rating of its own accuracy in applying that option to files. Learn more about accuracy scores.

Turn on auto-apply of labels

After the AI model is trained to achieve a high level of accuracy, you’re ready to choose label options and turn on auto-applying of labels. Follow these steps:

In your Admin console, go to Security > Data classification.
In AI Classification, verify that the Classification label shows a status of Ready.
Click View model details.
Under Classification label, check the boxes for the label options you want to allow the AI model to auto-apply.
Click Turn on auto-apply.
Search for and select the organizational unit or group whose users you want labels to be applied automatically for. For example, if you select the group Finance, you can then select the labels to be configured for Finance.
Click On - Label is auto-applied.
Options for how the label is applied are listed under the On option.
Click Save.
On the Data classification main page, the Auto-apply status for the rule changes to On.

Monitor AI classification label events in the Drive log

You can get specific details on how AI classification is labeling files by looking at events recorded in the Drive log.

Go to Security > Data classification.
In AI classification for Google Drive, click View model details.
Click View logs.
The Security Investigation Tool opens in a new tab, showing search results for the Drive log for two AI Classification-related events: Label applied and Label field value changed.
Click on the event Description to get additional details, such as:
- Name and type of the document that was labeled
- Label field value assigned to the document (for example Confidential or Restricted, if those are your label options).

Turn off auto-apply of labels

You can turn off auto-apply of all labels, or turn off specific options.

Go to Security > Data classification.
In AI classification for Google Drive, click View model details.
- Under Classification label, uncheck Allow in the Auto-apply column to pause auto-apply for that option.
- To completely pause auto-apply, uncheck all options.

Turn off auto-apply completely for specific organizational units or groups

Use this option if you want to turn auto-apply completely off for content owned by users in specific organizational units or groups.

Go to Security > Data classification.
In AI classification for Google Drive, click View model details.
Click Manage auto-apply.
Click an organizational unit or group at left to select it.
In Manage AI auto-apply, click OFF.

Reset the model

At some point you may need to reset the model, for example to start another test or because model accuracy is not improving. If you need to reset the model, note the following:

If you replace labels, you’ll need to go through a new training process before the new classification label can be turned on and applied automatically to files.
Previously applied training labels will remain on files, but won’t impact the new training process.

Go to Security > Data classification.
In AI classification for Google Drive, click View model details.
On the AI model details page, under Actions at right, click Reset model.
The Reset model dialog lists the effects of resetting the model.
To continue, click Reset model.
AI classification is reset to its initial state. To restart, click Set up training and pick new classification and training labels.

FAQ

What are the requirements for the training label and classification label?

The classification label and the training label must both meet these criteria:

Have only one field, of the Options list field type.
Have a minimum of 2, and a maximum of 4 options.
The 2-4 options must be in the same order in each label. If the classification label has options in this order:
- 1. Option 1
- 2. Option 2
- 3. Option 3
The training label options can’t be ordered as follows:
- 1. Option 2
- 2. Option 1
- 3. Option 3
Both labels must be published.
The labels should have different access permissions. The training label should be available only to designated labelers who can be trusted to train the model. The classification label can have a broader access setting.
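As a rough pre-check before you click Enable training, the required criteria above can be written as a small Python sketch. The `Label` class is a hypothetical, simplified stand-in for a Drive label, not a type from any Google API, and the sketch covers only the required criteria (field type, option count, option order, published state), not the recommendation about access permissions.

```python
from dataclasses import dataclass


@dataclass
class Label:
    """Hypothetical, simplified stand-in for a Drive label (not an API type)."""
    name: str
    field_types: list   # one entry per field, e.g. ["options_list"]
    options: list       # option names, in display order
    published: bool


def label_problems(label):
    """Return the requirements that a single label fails to meet."""
    problems = []
    if label.field_types != ["options_list"]:
        problems.append(f"{label.name}: needs exactly one field, of the Options list type")
    if not 2 <= len(label.options) <= 4:
        problems.append(f"{label.name}: needs 2 to 4 options")
    if not label.published:
        problems.append(f"{label.name}: must be published")
    return problems


def pair_problems(classification, training):
    """Check each label on its own, then check that the options line up."""
    problems = label_problems(classification) + label_problems(training)
    if classification.options != training.options:
        problems.append("options must match, in the same order, in both labels")
    return problems


classification = Label("Sensitivity", ["options_list"],
                       ["Option 1", "Option 2", "Option 3"], True)
training = Label("Sensitivity (training)", ["options_list"],
                 ["Option 2", "Option 1", "Option 3"], True)
print(pair_problems(classification, training))
```

Here the training label swaps the first two options, so the check reports an order mismatch, which is the kind of difference that would cause the error when enabling training.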

Can I use the classification label as the training label?

No. The classification label and the training label must be two separate labels. The label you choose as your classification label will not display as a selectable choice for the training label.

What are good files for the model to train on?

For best results in training the model, have your trusted labelers follow these guidelines when choosing training files:

Each file must have a minimum of ~500 text characters.
Select files that best represent actual content that your users create, share, and use in your organization.
Select roughly the same number of files per label option, with a minimum of 100 files for each option. This helps the model gain a comprehensive understanding of your data and improve scores.
Include a representative variety of files for each option type. For example, don't label 100 resumes as your total set of example files for Top Secret, if contracts are also a common Top Secret file type in your organization.
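Labelers who keep a list of candidate files could screen it against the first and third guidelines with a sketch like the one below. It assumes you already know each file's approximate text length. The input format, the function names, and the idea of an imbalance ratio are all hypothetical, since the guidelines only say "roughly the same" number of files per option.

```python
from collections import Counter

MIN_TEXT_CHARS = 500  # approximate minimum text length per file


def usable_counts(candidates):
    """Count candidates per option, skipping files with too little text.

    candidates: iterable of (option_name, text_character_count) pairs.
    """
    return Counter(
        option for option, chars in candidates if chars >= MIN_TEXT_CHARS
    )


def imbalance(counts):
    """Ratio of the largest option count to the smallest (1.0 means balanced)."""
    if not counts:
        return 1.0
    return max(counts.values()) / min(counts.values())


# Hypothetical candidate list: (option, number of text characters).
candidates = [
    ("Confidential", 800),
    ("Confidential", 120),   # too short, skipped
    ("Confidential", 2400),
    ("Internal", 650),
]

counts = usable_counts(candidates)
print(dict(counts))        # {'Confidential': 2, 'Internal': 1}
print(imbalance(counts))   # 2.0
```

A ratio well above 1.0 suggests adding examples for the underrepresented option before relying on the scores.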

Does AI classification only work for labeling sensitive content?

Sensitive content is the primary focus for AI classification, but any label with up to 4 options can be trained for automatic labeling.

How are scores calculated?

During training, the AI model uses 75% of the input data to train itself on how to label files, and reserves 25% to periodically test its own performance. In other words, for 25% of the labeled files, the model analyzes those files as if it didn't know what label had been applied. It then makes its own label choice, and compares that choice with the actual label applied by the designated labeler. The scores show the proportion of the reserved files to which the model assigned the correct label.
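The general technique described here is holdout evaluation. The Python sketch below illustrates the idea only and is not Google's implementation: the random split, the `predict` callable, and the choice to group results by the labeler's option are assumptions made for the example.

```python
import random
from collections import defaultdict


def split_examples(examples, holdout_fraction=0.25, seed=0):
    """Shuffle labeled examples and reserve a fraction for testing."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_fraction))
    return shuffled[:cut], shuffled[cut:]


def scores_per_option(holdout, predict):
    """For each option, the share of reserved files the model labeled correctly.

    holdout: list of (text, option_applied_by_labeler) pairs.
    predict: hypothetical callable that returns the model's option for a text.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for text, option in holdout:
        total[option] += 1
        if predict(text) == option:
            correct[option] += 1
    return {option: correct[option] / total[option] for option in total}


# 400 hypothetical labeled files: 300 are used for training, 100 are reserved.
examples = [(f"file {i}", "Confidential" if i % 2 else "Internal") for i in range(400)]
train, holdout = split_examples(examples)
print(len(train), len(holdout))  # 300 100
```

After training on `train`, calling `scores_per_option(holdout, predict)` with the trained model's prediction function would give one score per label option.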

How are the AI-labeled files totals computed?

To be included in the file totals under AI-labeled files, a file must be automatically labeled by AI classification and not subsequently edited or accepted by a user. For example, if your model auto-labels a file as Confidential, and no further action is taken on that label (such as a user accepting, changing, or removing the label), that file is counted in the AI-labeled total. If a user accepts the label, or changes the Confidential label to Internal after the initial AI classification, that file is no longer included in the AI-labeled files totals.
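The counting rule can be summarized in a short Python sketch. The record fields below are hypothetical and chosen only to mirror the rule as described; they don't correspond to any export or API format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LabeledFile:
    """Hypothetical record of a file's label history."""
    name: str
    ai_applied: bool              # label was auto-applied by AI classification
    user_action: Optional[str]    # "accepted", "changed", "removed", or None


def ai_labeled_total(files):
    """Count files that were auto-labeled and not touched by a user afterward."""
    return sum(1 for f in files if f.ai_applied and f.user_action is None)


files = [
    LabeledFile("plan.docx", ai_applied=True, user_action=None),         # counted
    LabeledFile("budget.xlsx", ai_applied=True, user_action="accepted"),  # not counted
    LabeledFile("offer.pdf", ai_applied=True, user_action="changed"),     # not counted
    LabeledFile("notes.txt", ai_applied=False, user_action=None),         # labeled manually
]
print(ai_labeled_total(files))  # 1
```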

Where can I learn more about how AI classification works?

You can learn more about Google's approach to data classification and how AI classification works for Drive in the Google Workspace AI Classification whitepaper.
