Translation & transcription glossary

  • ASR: Automatic Speech Recognition. YouTube uses automatic speech recognition to add automatic captions to videos (available in English, Dutch, French, German, Italian, Japanese, Korean, Portuguese, Russian and Spanish). ASR is not available for all videos.
  • Automatic caption: Caption track created by Automatic Speech Recognition.
  • Caption: Used to refer to both same-language transcriptions and translated subtitles that show as text in a video. By default, 'caption' refers to same-language transcriptions. 
  • Closed caption: Closed captions represent the audio in a video as text. This content is primarily for hard-of-hearing and deaf viewers. Content includes a transcription of the spoken words, as well as sound cues, such as '[music playing]' or '[laughter]'. Closed captions can also identify the speaker, either with a label, such as 'Mike: Hey there!', or by positioning the text on the screen.
  • Contribute: To create or edit metadata translations or a new caption track that's published to a video.
  • Contribution: A new or edited metadata translation, subtitle or closed caption that is reviewed and published to a video.
  • Contributor: A volunteer who has submitted new subtitle content, closed caption content or metadata translation; or who has edited or reviewed other contributors' content.
  • Creator: Video uploader/owner.
  • Submit: To send a completed or partially written track for review to be published to a video.
  • Submission: The complete or partially written translation or transcription that is sent for review to be published to a video.
  • Subtitles: Text tracks that accompany a video in a language different from the one spoken in that video. This content is primarily for foreign-language viewers. Content is a translation of spoken words and written text, shown at the bottom of or below the video ('sub' titles).
  • Set timings: When a user submits a transcript, we use our sync server to automatically align the transcript with the video, creating a timed caption track.
  • Transcript: Unformatted (and un-timed) text that's transcribed verbatim from the video.
  • Translation: Title, description or subtitle that's created by translating existing metadata, subtitles or closed captions.
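
The closed caption conventions in the glossary above (sound cues in square brackets, a speaker label followed by a colon) can be illustrated with a short sketch. This is not how YouTube processes captions. The names CaptionLine and parse_caption_line and both patterns are hypothetical, inferred only from the examples '[music playing]', '[laughter]' and 'Mike: Hey there!'.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical patterns, inferred only from the glossary's examples.
SOUND_CUE = re.compile(r"\[([^\[\]]+)\]")           # e.g. [music playing]
SPEAKER = re.compile(r"^\s*([^:\[\]]+):\s+(.*)$")   # e.g. Mike: Hey there!


@dataclass
class CaptionLine:
    """One line of closed caption text, split into its parts."""
    speaker: Optional[str]
    text: str
    sound_cues: List[str] = field(default_factory=list)


def parse_caption_line(line: str) -> CaptionLine:
    """Separate sound cues and an optional speaker label from spoken text."""
    # Collect every bracketed sound cue, then remove the cues from the line.
    cues = SOUND_CUE.findall(line)
    spoken = SOUND_CUE.sub("", line).strip()

    # A leading "Name: " is treated as a speaker label (an assumption).
    match = SPEAKER.match(spoken)
    if match:
        return CaptionLine(match.group(1).strip(), match.group(2).strip(), cues)
    return CaptionLine(None, spoken, cues)


if __name__ == "__main__":
    for raw in ["Mike: Hey there!", "[music playing]", "Mike: [laughter] Hey there!"]:
        print(parse_caption_line(raw))
```

The sketch covers only text-based speaker labels. Speaker identification by on-screen positioning, which the glossary also mentions, depends on the caption file format and is not modeled here.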