Ripping the content of a DVD

How to rip the video and audio streams

This recipe uses the lsdvd and vobcopy programs, which are found in the Debian packages of the same names (verified on Debian 10 Buster).

First you need to list the content (chapters) of the video DVD:

lsdvd 
Disc Title: MY_DVD_1_DISC1
Title: 01, Length: 00:00:00.580 Chapters: 01, Cells: 01, Audio streams: 01, Subpictures: 00
Title: 02, Length: 00:00:14.000 Chapters: 01, Cells: 01, Audio streams: 01, Subpictures: 00
Title: 03, Length: 00:43:32.330 Chapters: 05, Cells: 05, Audio streams: 01, Subpictures: 01
Title: 04, Length: 00:42:12.320 Chapters: 05, Cells: 06, Audio streams: 01, Subpictures: 01
Title: 05, Length: 00:44:45.840 Chapters: 05, Cells: 06, Audio streams: 01, Subpictures: 01
Title: 06, Length: 00:41:55.370 Chapters: 05, Cells: 05, Audio streams: 01, Subpictures: 01
Longest track: 03
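When scripting the rip, the track number can be taken from the "Longest track" line instead of being read by eye. The following is a minimal sketch, assuming the lsdvd output format shown above; the function name longest_track is made up for this example.

```shell
#!/bin/sh
# longest_track: read lsdvd output on stdin and print the number of the
# longest track without leading zeros (hypothetical helper, not part of lsdvd).
longest_track() {
    sed -n 's/^Longest track: 0*\([0-9][0-9]*\)[[:space:]]*$/\1/p'
}

# Example on a saved excerpt of the output; with a real disc you would run
# something like:  lsdvd /dev/dvd | longest_track
longest_track <<'EOF'
Disc Title: MY_DVD_1_DISC1
Title: 03, Length: 00:43:32.330 Chapters: 05, Cells: 05, Audio streams: 01, Subpictures: 01
Longest track: 03
EOF
```

The longest track is usually, but not always, the main feature, so it is worth checking the lengths before relying on it.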

You can also list the content of an ISO image mounted elsewhere:

lsdvd /mnt/

Then you can rip the required track (e.g. #3) as a single (large) file. The -i parameter will accept the DVD device name or the directory containing the DVD structure:

vobcopy -n 3 -i /dev/dvd --large-file -o ./dstdir
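If you rip several discs, the command can be wrapped in a small function. This is only a sketch: the name rip_title and the DRY_RUN variable are assumptions of this example, not vobcopy features, and creating the destination directory first is just a precaution.

```shell
#!/bin/sh
# rip_title: copy one DVD title to a single large VOB file with vobcopy.
# Arguments: title number, source (device or directory), destination dir.
# With DRY_RUN=1 the command is only printed (DRY_RUN is an assumption of
# this sketch, not a vobcopy option).
rip_title() {
    title=$1 src=$2 dst=$3
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "vobcopy -n $title -i $src --large-file -o $dst"
        return 0
    fi
    mkdir -p "$dst" || return 1
    vobcopy -n "$title" -i "$src" --large-file -o "$dst"
}

# Show what would be executed for track #3 without touching the drive.
DRY_RUN=1 rip_title 3 /dev/dvd ./dstdir
```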

The resulting VOB file will also contain the subtitles, if any.

How to rip subtitles from the DVD

This recipe uses the mplayer, mencoder, vobsub2srt and tesseract-ocr programs, from the Debian packages of the same names (tested with Debian 10 Buster; vobsub2srt comes from the deb-multimedia repository). To improve OCR accuracy on the subtitle images you may install the tesseract package for the subtitle language, e.g. tesseract-ocr-ita for Italian.

Suppose that we have the DVD structure mounted under /mnt. If you have the physical disc instead, replace /mnt with the DVD device in the commands.

We can use mplayer to identify the subtitles available in track #1 (in this example we find a single subtitle, SID #0):

mplayer -dvd-device /mnt dvd://1 -identify
...
ID_SUBTITLE_ID=0
ID_SID_0_LANG=it
number of subtitles on disk: 1
...
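The interesting lines can be filtered out of the rather verbose output. A minimal sketch, assuming the ID_SID_n_LANG format shown above; list_sids is a hypothetical helper name.

```shell
#!/bin/sh
# list_sids: read "mplayer -identify" output on stdin and print one
# "SID LANGUAGE" pair per line (hypothetical helper).
list_sids() {
    sed -n 's/^ID_SID_\([0-9][0-9]*\)_LANG=\(.*\)$/\1 \2/p'
}

# Sample input copied from the output above; with a real disc:
#   mplayer -dvd-device /mnt dvd://1 -identify 2>/dev/null | list_sids
list_sids <<'EOF'
ID_SUBTITLE_ID=0
ID_SID_0_LANG=it
number of subtitles on disk: 1
EOF
```

The first column is the value to pass to the -sid option of mencoder.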

Suppose that the DVD track is #1 (specified via the dvd:// option) and the subtitle index is #0 (specified via the -sid option). Use mencoder to extract the subtitle index file and the subtitle bitmaps file; in the following example the files will be vobsubs-it.idx and vobsubs-it.sub respectively:

mencoder -dvd-device /mnt dvd://1  \
    -nosound \
    -ovc 'copy' -o /dev/null \
    -ifo /mnt/VIDEO_TS/VTS_01_0.IFO \
    -sid 0 -vobsubout vobsubs-it

The .IFO file is required to know the palette to apply to the bitmaps.

The following command works on the two files vobsubs-it.idx and vobsubs-it.sub and runs OCR on each subtitle image using tesseract (it takes several minutes to run):

vobsub2srt \
    --ifo /mnt/VIDEO_TS/VIDEO_TS.IFO \
    --dump-images \
    --tesseract-lang ita \
    vobsubs-it

The result will be a vobsubs-it.srt text file, containing the subtitle text and timing information. The --dump-images option used above keeps one pgm image file for each subtitle; omit it if you do not need the images.
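A quick sanity check after the OCR step is to count the cues in the resulting file. This is a minimal sketch, assuming the usual SRT layout where each cue has one timing line containing "-->"; count_srt_cues is a hypothetical helper and the sample text is made up.

```shell
#!/bin/sh
# count_srt_cues: print the number of timing lines ("start --> end")
# found in an SRT file (hypothetical helper).
count_srt_cues() {
    grep -c -e '-->' "$1"
}

# Demonstration on a made-up two-cue file; for the real file run:
#   count_srt_cues vobsubs-it.srt
sample=$(mktemp)
cat > "$sample" <<'EOF'
1
00:00:01,000 --> 00:00:03,500
First subtitle line

2
00:00:04,000 --> 00:00:06,000
Second subtitle line
EOF
count_srt_cues "$sample"   # prints 2
rm -f "$sample"
```

If --dump-images was used, the count can be compared with the number of pgm files written.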

Converting a DVD with subtitles to MKV using ffmpeg

I had a rather complicated DVD to rip. The main problems were:

  • Subtitles are in dvdsub format (which is normal for a DVD), which needs palette info to be displayed correctly.
  • Some subtitle streams do not start at the beginning.
  • Languages of subtitles are not automatically detected.
  • The original palette must be applied to the subtitles, otherwise they are rendered with the wrong colors.

Inspect the disk

Running lsdvd directly on the DVD, you can see the available video tracks, audio streams and subtitles:

lsdvd -s /dev/dvd
Disc Title: FREEDOMDOWNTIME
Title: 01, Length: 02:01:38.600 Chapters: 30, Cells: 30, Audio streams: 04, Subpictures: 24
        Subtitle: 01, Language: da - Dansk, Content: Undefined, Stream id: 0x20, 
        Subtitle: 02, Language: de - Deutsch, Content: Undefined, Stream id: 0x21, 
        ...
Title: 02, Length: 01:18:01.000 Chapters: 06, Cells: 06, Audio streams: 01, Subpictures: 00
Title: 03, Length: 00:00:24.066 Chapters: 01, Cells: 01, Audio streams: 01, Subpictures: 00
Title: 04, Length: 00:00:09.800 Chapters: 01, Cells: 01, Audio streams: 01, Subpictures: 00
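Since the subtitle languages are not detected automatically later on, it can help to save them from this listing. A minimal sketch, assuming the "Subtitle:" line format shown above; list_dvd_subs is a hypothetical helper name.

```shell
#!/bin/sh
# list_dvd_subs: read "lsdvd -s" output on stdin and print, for each
# subtitle, its number, language code and stream id (hypothetical helper).
list_dvd_subs() {
    sed -n 's/^[[:space:]]*Subtitle: \([0-9][0-9]*\), Language: \([a-z][a-z]*\) .*Stream id: \(0x[0-9a-fA-F]*\).*$/\1 \2 \3/p'
}

# Sample input copied from the listing above; with a real disc:
#   lsdvd -s /dev/dvd | list_dvd_subs
list_dvd_subs <<'EOF'
Title: 01, Length: 02:01:38.600 Chapters: 30, Cells: 30, Audio streams: 04, Subpictures: 24
        Subtitle: 01, Language: da - Dansk, Content: Undefined, Stream id: 0x20, 
        Subtitle: 02, Language: de - Deutsch, Content: Undefined, Stream id: 0x21, 
EOF
```

Note that lsdvd prints two-letter language codes, while the ffmpeg metadata in this page uses three-letter ones (e.g. eng), so a mapping step would still be needed.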

Rip the track

First of all I ripped the first track (the only one I'm really interested in) from the DVD into a directory:

vobcopy -n 1 -i /dev/dvd --large-file -o ./track1/

Using the mediainfo tool you can inspect the resulting VOB file to verify that the video, audio and text (subtitle) streams are the ones you expect.

Get subtitles palette info

Then I extracted the first (#0) dvdsub stream (there are 22!) from the DVD:

mencoder -dvd-device /dev/dvd dvd://1  \
    -nosound \
    -ovc 'copy' -o /dev/null \
    -ifo /mnt/VIDEO_TS/VTS_01_0.IFO \
    -sid 0 -vobsubout vobsubs-sid0

This command will produce two files: vobsubs-sid0.idx and vobsubs-sid0.sub. We are only interested in the palette, which is written into the idx file and looks something like this:

palette: d7410d, 101010, 0e00d7, d5ccc9, d4b1cb, aac5d0, abd3af, d5ff0c,
         d717cc, d6a80b, 8b02d6, 1dca41, 0d007f, 95679f, 8caa67, 783d3f
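The palette can be pulled out of the idx file into a shell variable rather than copied by hand. This is a minimal sketch: extract_palette is a hypothetical helper, and it assumes that the palette starts on a line beginning with "palette:" and may continue on following lines, as in the listing above. The output is a comma-separated list of 16 hex values, which is the form I understand the -palette option of the ffmpeg dvdsub decoder to accept.

```shell
#!/bin/sh
# extract_palette: print the palette of a VobSub .idx file as one
# comma-separated line without blanks (hypothetical helper).
extract_palette() {
    awk '
        /^palette:/ {
            sub(/^palette:[ \t]*/, "")
            buf = $0; inpal = 1; next
        }
        # Continuation lines hold only hex digits, commas and blanks.
        inpal && /^[ \t\r]*[0-9a-fA-F][0-9a-fA-F, \t\r]*$/ { buf = buf $0; next }
        { inpal = 0 }
        END { gsub(/[ \t\r]/, "", buf); print buf }
    ' "$1"
}

# Demonstration on a temporary file holding the palette shown above;
# for the real file run:  PALETTE=$(extract_palette vobsubs-sid0.idx)
sample=$(mktemp)
cat > "$sample" <<'EOF'
palette: d7410d, 101010, 0e00d7, d5ccc9, d4b1cb, aac5d0, abd3af, d5ff0c,
         d717cc, d6a80b, 8b02d6, 1dca41, 0d007f, 95679f, 8caa67, 783d3f
EOF
PALETTE=$(extract_palette "$sample")
echo "$PALETTE"
rm -f "$sample"
```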

As an alternative you can get the .IFO of the track (for the first track it is VTS_01_0.IFO). That file contains the palette info and can be used instead of the palette numbers.

Transcode with ffmpeg

Finally I launched the ffmpeg incantation:

ffmpeg -probesize 200M -analyzeduration 200M \
    -palette "$PALETTE" \
    -i "$SRC_FILE" \
    -map '0:v:0' -map '0:a:0' -map '0:a:1' \
    -metadata:s:a:0 title="English" -metadata:s:a:0 language=eng \
    -metadata:s:a:1 title="Comment" -metadata:s:a:1 language=eng \
    ...
doc/appunti/linux/video/vobcopy.1613726079.txt.gz · Last modified: 2021/02/19 10:14 by niccolo