Subtitles and Captions: Apples & Oranges or Tomato/Tomahto?

Our last post discussed the difference between our Timestamps upgrade and our Subtitles & Captions upgrade, and what you get with each.

We explained that, on our end, Subtitles & Captions are all one product; you order the upgrade, and we give you the whole range of subtitle and caption formats so you can download whichever one you need.

But what, in fact, is the difference between subtitles and captions? Once you have all those downloadable files, which do you choose? Let’s take a closer look.


What’s the difference between subtitles and captions?

Subtitles are generally meant to be read while you also listen to the audio. They use normal punctuation and consist only of the words that are spoken.

Captions impart more information, such as speaker labels, laughter, applause, and other non-verbals. The idea is that captions can be used effectively with the volume turned off. Captions contain minimal punctuation.

If your primary goal is captioning, we suggest a verbatim transcript.

What do I do next?

Once your transcript is complete, you will need to select your subtitle or caption format, download the corresponding file, and sync it to your video.

If your video is on YouTube, download the transcript in either SRT or VTT format, then follow these instructions and select the “Upload a file” option to sync it to your video.

If your video is on Facebook, download the SRT format, then follow these instructions to sync.

How do you create these files from my transcript?

Your transcripts are always produced by human transcribers and editors, but we do harness the power of automation when creating subtitles and captions.

Your completed transcript goes through a speech recognition process that matches every word in the script with the corresponding spot in the audio. We then export the results into the various subtitle and caption formats so you can pick the one that best meets your needs.
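
To give a feel for what the export step involves, here is a rough sketch in Python. It is an illustration only, not our actual pipeline: the `Cue` type and the function names are made up for this example. The timestamp layouts themselves come from the formats: SRT numbers each cue and uses a comma before the milliseconds, while VTT starts with a `WEBVTT` header and uses a period.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One block of on-screen text (hypothetical type for this sketch)."""
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio
    text: str


def format_timestamp(seconds: float, separator: str) -> str:
    """Render seconds as HH:MM:SS<separator>mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{ms:03d}"


def to_srt(cues):
    """SRT: numbered cues, comma before the milliseconds."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        start = format_timestamp(cue.start, ",")
        end = format_timestamp(cue.end, ",")
        blocks.append(f"{index}\n{start} --> {end}\n{cue.text}\n")
    return "\n".join(blocks)


def to_vtt(cues):
    """VTT: a WEBVTT header line, period before the milliseconds."""
    blocks = ["WEBVTT\n"]
    for cue in cues:
        start = format_timestamp(cue.start, ".")
        end = format_timestamp(cue.end, ".")
        blocks.append(f"{start} --> {end}\n{cue.text}\n")
    return "\n".join(blocks)
```

Calling `to_srt([Cue(0.0, 1.5, "Hello")])` gives a block starting with `1`, followed by `00:00:00,000 --> 00:00:01,500` and the text. The same cue passed to `to_vtt` produces the `WEBVTT` header and `00:00:00.000 --> 00:00:01.500`.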

We try to fit between 32 and 40 characters on a line for SRT and VTT, and between 28 and 30 on each of two or three lines for most other formats. We try to show no more than 3.5 seconds of speech on screen at a time, and we don’t let text hang around more than 2 seconds after it’s been spoken.
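
The rules above can be sketched as a small grouping routine. This is a simplified illustration under several assumptions, not our production code: it takes word timings as `(text, start, end)` tuples that are in order and do not overlap, it puts a single line in each cue, and it only enforces the upper limits (40 characters, 3.5 seconds of speech, 2 seconds of linger). The constant and function names are invented for the example.

```python
MAX_LINE_CHARS = 40       # upper end of the 32-40 range for SRT/VTT
MAX_CUE_SECONDS = 3.5     # most speech shown on screen at once
MAX_LINGER_SECONDS = 2.0  # longest a cue stays up after its last word


def build_cues(words):
    """Group (text, start, end) word timings into (start, end, line) cues."""
    groups = []
    current = []
    for word in words:
        text, start, end = word
        if current:
            line = " ".join(w[0] for w in current) + " " + text
            too_long = len(line) > MAX_LINE_CHARS
            too_slow = end - current[0][1] > MAX_CUE_SECONDS
            if too_long or too_slow:
                # Adding this word would break a limit: close the cue.
                groups.append(current)
                current = []
        current.append(word)
    if current:
        groups.append(current)

    cues = []
    for i, group in enumerate(groups):
        start = group[0][1]
        last_spoken = group[-1][2]
        # Keep the cue up until the next one starts, but never more than
        # MAX_LINGER_SECONDS past the last spoken word.
        end = last_spoken + MAX_LINGER_SECONDS
        if i + 1 < len(groups):
            end = min(end, groups[i + 1][0][1])
        cues.append((start, end, " ".join(w[0] for w in group)))
    return cues
```

For example, two words spoken in the first second followed by a ten-second pause produce a first cue that disappears 2 seconds after its last word rather than lingering through the silence. A single word longer than 40 characters would still get a cue of its own in this sketch.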