


How to rip DVD subtitles with vobsub2srt

The vobsub2srt program reads a pair of subtitles.sub and subtitles.idx files (the VobSub format used for DVD subtitles), runs OCR (with the Tesseract engine) on the images contained in the .sub file and writes a subtitles.srt file with the subtitle text and the timing information taken from the .idx file.
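The timing comes from the .idx file, which is plain text. Each subtitle image in the .sub file has a line of the form `timestamp: HH:MM:SS:mmm, filepos: ...`. As an illustration only (this is not how vobsub2srt is implemented), the following sketch lists the start times in the `HH:MM:SS,mmm` notation used by .srt files. The function name `idx_start_times` is hypothetical, and the line format is assumed from typical VobSub .idx files.

```shell
# Hypothetical helper: print the start time of every subtitle listed in a
# VobSub .idx file, converted to SRT notation (HH:MM:SS,mmm).
# Assumes lines like "timestamp: 00:01:02:345, filepos: 000000000".
idx_start_times() {
    sed -n 's/^timestamp: \([0-9]*:[0-9]*:[0-9]*\):\([0-9]*\),.*/\1,\2/p' "$1"
}
```

For example, `idx_start_times subtitles.idx` prints one start time per line. The end times are not in the .idx file, so they have to come from the subtitle packets themselves.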

The vobsub2srt program does not exist in Debian 12 Bookworm. It should be possible to compile it from source (see the VobSub2SRT GitHub repository), or you can get the binary package from the Deb Multimedia repository.

The required Debian packages are:

  • lsdvd - From the official Debian repository.
  • vobcopy - From the official Debian repository.
  • mkvtoolnix - From the official Debian repository (provides mkvextract).
  • ffmpeg - From the official Debian repository.
  • vobsub2srt - From the Deb Multimedia repository.
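Before starting, you may want to check that all the commands used on this page are on the PATH. The following is a minimal sketch. The function name `check_tools` is hypothetical, and the command list passed to it assumes that mkvextract comes from the mkvtoolnix package and that ffmpeg is installed for the conversion step.

```shell
# Hypothetical helper: report which of the given commands are not on the PATH.
# Returns 0 if all of them are found, 1 otherwise.
check_tools() {
    local tool missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: `check_tools lsdvd vobcopy ffmpeg mkvextract vobsub2srt` prints one line on stderr for each missing command.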

Ripping the .vob from the DVD

vobcopy -n '1' -i /dev/sr0 --large-file -o .
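The command above copies title 1 (`-n '1'`). The main feature is usually the longest title, and lsdvd reports it at the end of its output in a line like `Longest track: 01`. The following is a minimal sketch that assumes this output format. The function name `longest_title` is hypothetical.

```shell
# Hypothetical helper: read lsdvd output on stdin and print the number of the
# longest title without leading zeros (assumes a "Longest track: NN" line).
longest_title() {
    awk '/^Longest track:/ { print $3 + 0 }'
}
```

Usage: `title=$(lsdvd /dev/sr0 | longest_title)` followed by `vobcopy -n "$title" -i /dev/sr0 --large-file -o .`.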

Converting the .vob into .mkv format

As far as I know, there is no tool capable of extracting the VobSub subtitles directly from the .vob file. One might hope that ffmpeg could do it, but it seems it cannot.

Fortunately mkvextract can extract the VobSub streams from an .mkv file, so we first use ffmpeg to convert the .vob into .mkv. In the following example all the streams are copied without re-encoding. At this step you may want to re-encode the video to squeeze the MPEG-2 stream into the more efficient H.264 format, for example with -vcodec 'libx264' instead of -vcodec 'copy'.

ffmpeg -probesize 500M -analyzeduration 500M \
    -i 'DVD_TITLE.vob' \
    -map 0:v:0 -map 0:a:0 -map 0:a:1 -map 0:s:0 -map 0:s:1 -map 0:s:2 \
    -vcodec 'copy' \
    -acodec 'copy' \
    -scodec 'copy' \
    'DVD_TITLE.mkv'

Notice the several -map options required to embed all the source streams into the destination file. Without them, ffmpeg selects only one stream of each type (video, audio, subtitle). In our example there are one video stream, two audio streams and three subtitle streams. The -probesize and -analyzeduration options are required because the subtitle streams do not start at the very beginning of the file, and with the default (much smaller) probing limits ffmpeg may not detect them.
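Writing the -map options by hand is error-prone when the number of streams changes from disc to disc. The following sketch generates them from the number of audio and subtitle streams, which you can read, for instance, from the output of ffprobe (shipped with ffmpeg). The function name `build_maps` is hypothetical, and a single video stream is assumed.

```shell
# Hypothetical helper: print the -map options for one video stream plus
# N audio and M subtitle streams of input 0, in the order used above.
build_maps() {
    local n_audio=$1 n_subs=$2 i
    printf '%s' '-map 0:v:0'
    i=0
    while [ "$i" -lt "$n_audio" ]; do
        printf ' -map 0:a:%d' "$i"
        i=$((i + 1))
    done
    i=0
    while [ "$i" -lt "$n_subs" ]; do
        printf ' -map 0:s:%d' "$i"
        i=$((i + 1))
    done
    printf '\n'
}
```

Usage: `ffmpeg -probesize 500M -analyzeduration 500M -i 'DVD_TITLE.vob' $(build_maps 2 3) -vcodec 'copy' -acodec 'copy' -scodec 'copy' 'DVD_TITLE.mkv'`. The `$(...)` is deliberately left unquoted so that the shell splits it into separate arguments. As an alternative, ffmpeg stream specifiers without an index (`-map 0:v:0 -map 0:a -map 0:s`) select all the audio and subtitle streams at once.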

Extracting .sub and .idx files from the .mkv

mkvextract 'DVD_TITLE.mkv' tracks '3:subtitles-3'
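In '3:subtitles-3', the 3 is the mkvextract track ID and subtitles-3 is the output base name, so the command writes subtitles-3.idx and subtitles-3.sub. mkvextract numbers tracks from 0 in file order. ffmpeg writes the streams in the order of the -map options (one video, two audio, three subtitles), so the subtitle tracks should get IDs 3, 4 and 5. To verify the actual IDs and types, run `mkvmerge -i 'DVD_TITLE.mkv'`. The following sketch computes the IDs and prints one mkvextract argument per subtitle track. The function name `subtitle_track_args` is hypothetical, and it assumes exactly one video stream mapped first.

```shell
# Hypothetical helper: given the number of audio and subtitle streams mapped
# after a single video stream, print one "ID:subtitles-ID" argument per
# subtitle track, ready to be passed to "mkvextract FILE tracks".
subtitle_track_args() {
    local n_audio=$1 n_subs=$2 k id
    k=0
    while [ "$k" -lt "$n_subs" ]; do
        id=$((1 + n_audio + k))   # video is track 0, audio is 1..n_audio
        printf '%d:subtitles-%d\n' "$id" "$id"
        k=$((k + 1))
    done
}
```

Usage: `mkvextract 'DVD_TITLE.mkv' tracks $(subtitle_track_args 2 3)` extracts all three subtitle tracks in one run.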

OCR the images from the .sub file

vobsub2srt --ifo './VTS_01_0.IFO' --dump-images --tesseract-lang ita subtitles-3

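The last argument is the base name of the .idx/.sub pair, without extension. The .IFO file (found in the VIDEO_TS directory of the DVD) provides the correct palette, width and height of the subtitle images. It is recommended but not mandatory. The --dump-images option also saves each subtitle image to disk, which helps to check the OCR result. You can omit it.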
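When several subtitle tracks have been extracted, run vobsub2srt once per .idx/.sub pair. The following is a minimal sketch. The function name `ocr_all` and the `DRY_RUN` switch are hypothetical, and the IFO path is the one used above. It also assumes that all the tracks use the same language. In practice, each track may need its own --tesseract-lang.

```shell
# Hypothetical helper: run vobsub2srt on every subtitles-*.idx file in the
# current directory. With DRY_RUN=1 it only prints the commands.
ocr_all() {
    local lang=$1 idx base
    for idx in subtitles-*.idx; do
        [ -e "$idx" ] || continue          # no match: the glob stays literal
        base=${idx%.idx}                   # vobsub2srt wants the base name
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo vobsub2srt --ifo ./VTS_01_0.IFO --tesseract-lang "$lang" "$base"
        else
            vobsub2srt --ifo ./VTS_01_0.IFO --tesseract-lang "$lang" "$base"
        fi
    done
}
```

Usage: `DRY_RUN=1 ocr_all ita` previews the commands, and `ocr_all ita` runs them.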

Last modified: 2024/02/01 11:13 by niccolo