Custom glossary format

Google Translator Toolkit supports bulk upload of terminology for use in your translations. Store this terminology in a comma-separated values (CSV) file encoded in UTF-8, with the following rows and columns:

header row

  • locale columns: Each locale header cell must contain a valid locale code (see Locales below). The glossary requires at least one locale column.
  • part-of-speech column (optional): When included, the header cell of this column must contain exactly the value pos.
  • description column (optional): When included, the header cell of this column must contain exactly the value description.

data rows

  • terminology columns: Terminology in each row should correspond to the locale in the header. At least one term is required for a row to be valid.
  • part-of-speech column: The part-of-speech is an informational field that indicates the sense in which the terms in the row should be used. Sample parts-of-speech include adjective, adverb, noun, and verb.
  • description column: The description should provide any notes for the translator, including the meaning of the terms in the row.  
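The header and data-row rules above can be checked mechanically before upload. The following is a minimal sketch, not the toolkit's own validation: the function name validate_glossary and the known_locales argument are hypothetical, and the caller is assumed to supply the set of supported locale codes. It also skips blank rows, following the special cases listed later in this article.

```python
def validate_glossary(rows, known_locales):
    """Return a list of problems found in a parsed glossary (empty if none).

    rows is a list of rows, each a list of cell strings; rows[0] is the header.
    known_locales is a caller-supplied set of locale codes (an assumption here).
    """
    if not rows:
        return ["missing header row"]

    header = [cell.strip() for cell in rows[0]]
    problems = []

    # Any header cell that is not 'pos' or 'description' is treated as a locale column.
    locale_cols = [i for i, name in enumerate(header)
                   if name not in ("pos", "description")]
    if not locale_cols:
        problems.append("header needs at least one locale column")
    for i in locale_cols:
        if header[i] not in known_locales:
            problems.append(
                f"header column {i + 1}: unknown locale code {header[i]!r}")

    for line_no, row in enumerate(rows[1:], start=2):
        if not any(cell.strip() for cell in row):
            continue  # blank rows are ignored
        # A row is valid only if at least one locale column holds a term.
        terms = [row[i] for i in locale_cols if i < len(row) and row[i].strip()]
        if not terms:
            problems.append(f"row {line_no}: at least one term is required")

    return problems


example = [
    ["en", "fr", "pos", "description"],
    ["cat", "chat", "noun", "A small domestic animal"],
    ["", "", "", ""],                 # blank row, skipped
    ["", "", "noun", "No term given"],  # invalid: description without a term
]
print(validate_glossary(example, {"en", "fr"}))
```

The codes en and fr in the usage example are placeholders; substitute codes from the supported locale list.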

For example, a valid glossary is as follows: 

sample CSV
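A glossary of this shape can also be generated programmatically. The sketch below is illustrative only: the locale codes en and fr, the terms, and the function names are assumptions made for the example, and the cells deliberately contain no commas or quotes so that no escaping is needed.

```python
# Hypothetical sample data: "en" and "fr" stand in for valid locale codes.
HEADER = ["en", "fr", "pos", "description"]
ROWS = [
    ["account", "compte", "noun", "A registered user profile"],
    ["sign in", "se connecter", "verb", "To log in to a service"],
    ["free", "gratuit", "adjective", "Costing nothing"],
]


def build_glossary_csv(header, rows):
    """Join cells with commas, one row per line.

    Assumes no cell contains a comma or a double quote.
    """
    lines = [",".join(header)]
    lines += [",".join(row) for row in rows]
    return "\n".join(lines) + "\n"


def save_glossary(path, header, rows):
    """Write the glossary to disk with an explicit UTF-8 encoding."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(build_glossary_csv(header, rows))


csv_text = build_glossary_csv(HEADER, ROWS)
print(csv_text)
```

Calling save_glossary("glossary.csv", HEADER, ROWS) would write the same text to a file; passing encoding="utf-8" explicitly avoids depending on the platform default.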

Special cases

  • Blank rows are ignored.
  • At least one term must be provided if a part-of-speech or description is included.
  • A cell that contains a comma must be enclosed in double quotes ("). For example, to add the term hello, world!, enter "hello, world!". A double quote inside a quoted cell is escaped with a backslash (\). For example, She said, "hello." should be entered as "She said, \"hello.\""
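The quoting rule can be sketched as a small helper. This is an illustration under stated assumptions, not the toolkit's parser: the name escape_cell is hypothetical, a cell is wrapped in double quotes whenever it contains a comma or a double quote, and backslashes or line breaks inside a cell are not handled because this article does not specify them.

```python
def escape_cell(text):
    """Quote a cell if needed, escaping inner double quotes with a backslash."""
    if "," in text or '"' in text:
        return '"' + text.replace('"', '\\"') + '"'
    return text


for term in ["hello", "hello, world!", 'She said, "hello."']:
    print(escape_cell(term))
```

Note that many spreadsheet programs escape an inner quote by doubling it ("") instead of using a backslash, so exported files with quoted terms may need checking against the rule above.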

Locales

For a list of supported term locales, see the help article on locales.

UTF-8 format

UTF-8 is a character encoding that can represent the characters of nearly every writing system. Other character encodings, such as ASCII or Windows-1252, cover only a few languages, whereas UTF-8 covers languages from Chinese to Russian to Spanish to Hindi to Arabic. To support this wide range of languages, Google Translator Toolkit accepts glossary uploads only in UTF-8.
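The difference between these encodings can be seen by trying to encode the same terms in each. The sketch below uses Python's built-in codecs; the helper name can_encode and the sample terms are chosen for illustration only.

```python
def can_encode(text, encoding):
    """Return True if every character in text is representable in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Sample terms in English, Spanish, Russian, Chinese, Hindi, and Arabic.
samples = ["hello", "señor", "привет", "你好", "नमस्ते", "مرحبا"]

for term in samples:
    support = {enc: can_encode(term, enc) for enc in ("ascii", "cp1252", "utf-8")}
    print(term, support)
```

Here cp1252 is Python's codec name for Windows-1252. Only the utf-8 column succeeds for every sample term.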

To create a CSV glossary in UTF-8 format, use a text editor or spreadsheet program that can save files as UTF-8. Here are a few ways to create one.

In Google Docs

  1. Create your glossary as a spreadsheet in Google Docs.
  2. Click File > Export > .csv Sheet only.
  3. Depending on your browser, there are different ways to save this CSV to your computer.
    • Google Chrome
      1. Right-click on the page and choose Save as...
      2. In the 'Save As' box, save the file to your desktop with a .csv extension.
    • Mozilla Firefox
      1. Click File > Save Page As....
      2. In the 'Save As' dialog, save the file to your desktop with a .csv extension.
    • Internet Explorer
      1. Click File > Save As....
      2. In the 'Save As' dialog, set the type to Text File (*.txt) and the encoding to Unicode (UTF-8). Do not give the file a .csv extension; doing so saves the page as HTML.
      3. When the file is saved on your computer, change the extension to .csv.

In OpenOffice Calc

  1. Create your glossary in Calc.
  2. Click File > Save As....
  3. In the 'Save As' dialog, save as type 'Text CSV (.csv)' and check the Edit filter settings box.
  4. If you're prompted to save the document regardless of formatting, click Yes to save in Text CSV file format.
  5. In the 'Export of text files' dialog, choose the Unicode (UTF-8) character set, a comma as the Field delimiter, and a double quote as the Text delimiter. Make sure the Save cell content as shown box is checked and the Fixed column width box is unchecked.
  6. Click OK to save the file.
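
Whichever program you use, it can be worth confirming that the saved file really is UTF-8 before uploading it. One possible check, sketched here with hypothetical helper names, is to read the raw bytes and attempt to decode them as UTF-8.

```python
def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def file_is_utf8(path):
    """Read a file in binary mode and check that its contents are valid UTF-8."""
    with open(path, "rb") as f:
        return is_valid_utf8(f.read())


# The same term saved in two encodings: only the UTF-8 bytes pass the check.
print(is_valid_utf8("señor".encode("utf-8")))
print(is_valid_utf8("señor".encode("cp1252")))
```

A file containing only ASCII characters also passes, because ASCII is a subset of UTF-8.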