Translation & transcription glossary

ASR: Automatic Speech Recognition. YouTube uses automatic speech recognition to add automatic captions to videos. The feature is available in Arabic, Bengali, Bulgarian, Czech, Danish, Dutch, English, French, Farsi, Filipino, Finnish, Georgian, German, Greek, Gujarati, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Kannada, Korean, Latvian, Lithuanian, Malayalam, Marathi, Norwegian, Polish, Portuguese, Punjabi, Romanian, Russian, Slovak, Spanish, Swedish, Tamil, Telugu, Thai, Turkish, Ukrainian, and Vietnamese. ASR is not available for all videos.
Automatic caption: Caption track created by Automatic Speech Recognition.
Caption: Used to refer to both same-language transcriptions and translated subtitles that show as text in a video. By default, "caption" refers to same-language transcriptions.
Closed caption: Closed captions render a video's audio as text. This content is primarily for deaf and hard-of-hearing viewers. It includes a transcription of the spoken words and sound cues, such as "[music playing]" or "[laughter]." Closed captions can also identify the speaker, either with a label such as "Mike: Hey there!" or through positioning on the screen.
Contribute: To create or edit metadata translations or a new caption track that's published to a video.
Contribution: A new or edited metadata translation, subtitle, or closed caption that is reviewed and published to a video.
Contributor: A volunteer who has submitted new subtitle content, closed caption content, or metadata translation; or who has edited or reviewed other contributors’ content.
Creator: Video uploader/owner.
Submit: To send a completed or partially written track for review to be published to a video.
Submission: The complete or partially written translation or transcription that is sent for review to be published to a video.
Subtitles: Text tracks that accompany a video in a language different from the one spoken in that video. This content is primarily for foreign-language viewers. It is a translation of the spoken words and on-screen written text, shown at the bottom of or below the video ("sub" titles).
Set timings: When someone submits a transcript, we use our sync server to automatically align the transcript with the video, creating a timed caption track.
Transcript: Unformatted (and untimed) text that's transcribed verbatim from the video.
Translation: Title, description, or subtitle that's created by translating existing metadata, subtitles, or closed captions.
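
The difference between a transcript (untimed text) and a timed caption track can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Cue` class, the helper names, and the timing values are hypothetical, and the sketch does not show how the sync server aligns text with audio. It only pairs transcript lines with made-up timings and writes them out in WebVTT, a common caption file format.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One timed caption: text plus the interval during which it is shown."""
    start: float  # seconds from the start of the video
    end: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues: list[Cue]) -> str:
    """Serialize timed cues as a WebVTT document."""
    blocks = ["WEBVTT"]
    for cue in cues:
        timing = f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}"
        blocks.append(f"{timing}\n{cue.text}")
    return "\n\n".join(blocks) + "\n"


# A transcript is just text, with no timing information.
transcript = ["Hey there!", "[music playing]"]

# Hypothetical timings, standing in for what an alignment step would produce.
timings = [(1.0, 2.5), (2.5, 6.0)]

# Pairing each line with a time interval yields a timed caption track.
caption_track = [Cue(s, e, t) for (s, e), t in zip(timings, transcript)]
print(to_webvtt(caption_track))
```

In this sketch the transcript alone cannot be displayed in sync with the video; only once each line carries a start and end time does it become a caption track.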
