This recipe uses the lsdvd and vobcopy programs, which are found in the Debian packages of the same names (verified on Debian 10 Buster).
First you need to list the content (titles and their chapter counts) of the video DVD:
lsdvd
Disc Title: MY_DVD_1_DISC1
Title: 01, Length: 00:00:00.580 Chapters: 01, Cells: 01, Audio streams: 01, Subpictures: 00
Title: 02, Length: 00:00:14.000 Chapters: 01, Cells: 01, Audio streams: 01, Subpictures: 00
Title: 03, Length: 00:43:32.330 Chapters: 05, Cells: 05, Audio streams: 01, Subpictures: 01
Title: 04, Length: 00:42:12.320 Chapters: 05, Cells: 06, Audio streams: 01, Subpictures: 01
Title: 05, Length: 00:44:45.840 Chapters: 05, Cells: 06, Audio streams: 01, Subpictures: 01
Title: 06, Length: 00:41:55.370 Chapters: 05, Cells: 05, Audio streams: 01, Subpictures: 01
Longest track: 03
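When scripting the rip, the title number can be read from the "Longest track" line of the lsdvd listing. A minimal sketch, assuming the output format shown here; longest_track is a hypothetical helper name, not part of lsdvd:

```shell
# Print the number found on the "Longest track:" line of lsdvd output (read from stdin).
# Assumes the output format shown above; longest_track is a hypothetical helper name.
longest_track() {
    # "+ 0" turns the zero-padded field (e.g. 03) into a plain number (3).
    awk -F': *' '/^Longest track:/ { print $2 + 0 }'
}

# Example usage (needs a readable DVD):
#   lsdvd | longest_track
```

The longest track is usually the main feature on a film disc, but on a disc of episodes like the one above each episode is a separate title, so the listing still needs a look.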
You can also list the content of an ISO image mounted elsewhere, by passing its mount point to lsdvd as an argument.
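A minimal sketch of the mounted-image case, assuming the image is mounted at /mnt/dvd (a placeholder path) and relying on lsdvd accepting the path to read as its argument; list_dvd_dir is a hypothetical helper name:

```shell
# List the titles of a DVD structure found under a directory, e.g. a mounted ISO.
# list_dvd_dir is a hypothetical helper; /mnt/dvd in the usage line is a placeholder.
list_dvd_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    lsdvd "$dir"
}

# Example usage:
#   list_dvd_dir /mnt/dvd
```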
Then you can rip the required track (e.g. #3) as a single (large) file. The -i parameter will accept the DVD device name or the directory containing the DVD structure:
vobcopy -n 3 -i /dev/dvd --large-file -o ./dstdir
The resulting VOB file will also contain the subtitles, if any.
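To rip several titles in one go (for example titles 3 to 6 of the disc listed earlier), the vobcopy command can be wrapped in a loop. A minimal sketch using only the vobcopy options from this recipe; rip_titles is a hypothetical helper name:

```shell
# Rip each given title to its own large VOB file with vobcopy.
# Usage: rip_titles SOURCE DESTDIR TITLE...
# rip_titles is a hypothetical helper; the vobcopy options are those of the recipe.
rip_titles() {
    src=$1
    dst=$2
    shift 2
    mkdir -p "$dst" || return 1
    for title in "$@"; do
        # Stop at the first title that fails to rip.
        vobcopy -n "$title" -i "$src" --large-file -o "$dst" || return 1
    done
}

# Example usage:
#   rip_titles /dev/dvd ./dstdir 3 4 5 6
```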
This recipe uses the mplayer, mencoder, vobsub2srt and tesseract-ocr programs, from the Debian packages of the same names (tested with Debian 10 Buster;
vobsub2srt comes from the deb-multimedia repository). To improve OCR accuracy on the subtitle images, you may install the tesseract language package for your language, e.g. tesseract-ocr-ita for Italian.
Suppose that we have the DVD structure mounted under /mnt. If you have the physical disc instead, replace
/mnt with the DVD device in the commands.
We can use mplayer to identify the subtitles available in track #1 (here we find a single subtitle stream, SID #0):
mplayer -dvd-device /mnt dvd://1 -identify
...
ID_SUBTITLE_ID=0
ID_SID_0_LANG=it
number of subtitles on disk: 1
...
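The relevant lines can be filtered out of the verbose -identify output. A minimal sketch, assuming the ID_SID_<n>_LANG line format shown in the output; list_subtitle_ids is a hypothetical helper name, and the -vo null -ao null -frames 0 options in the usage line are an addition not found in the original command (they make mplayer quit without playing anything):

```shell
# Print "SID language" pairs from mplayer -identify output read on stdin.
# Assumes lines of the form ID_SID_<n>_LANG=<code>, as in the output above.
# list_subtitle_ids is a hypothetical helper name.
list_subtitle_ids() {
    sed -n 's/^ID_SID_\([0-9][0-9]*\)_LANG=\(.*\)$/\1 \2/p'
}

# Example usage:
#   mplayer -dvd-device /mnt dvd://1 -identify -vo null -ao null -frames 0 2>/dev/null \
#     | list_subtitle_ids
```

The first column is the value to pass to the -sid option, the second is the language code reported by the disc.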
Suppose that the DVD track is #1 (specified via the dvd:// option) and the subtitle index is #0 (specified via the -sid option). Use mencoder to extract the subtitle index file and the subtitle bitmaps file; in the following example the files will be vobsubs-it.idx and vobsubs-it.sub respectively:
mencoder -dvd-device /mnt dvd://1 \
  -nosound \
  -ovc 'copy' -o /dev/null \
  -ifo /mnt/VIDEO_TS/VTS_01_0.IFO \
  -sid 0 -vobsubout vobsubs-it
The .IFO file is needed to obtain the palette to apply to the bitmaps. How to specify it when reading from the physical device remains an open question.
The following command runs OCR on each subtitle image using tesseract (it takes several minutes to run):
vobsub2srt \
  --ifo /mnt/VIDEO_TS/VIDEO_TS.IFO \
  --dump-images \
  --tesseract-lang ita \
  vobsubs-it
The result will be a vobsubs-it.srt text file containing the subtitle text and timing information. The --dump-images option used in the command keeps one pgm image file for each subtitle; omit it if you do not need the images.
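As a quick sanity check after the OCR run, the number of subtitles in the generated file can be counted. A minimal sketch, assuming the usual SRT layout in which every subtitle has one timing line of the form "HH:MM:SS,mmm --> HH:MM:SS,mmm"; count_srt_cues is a hypothetical helper name:

```shell
# Count the subtitles in an SRT file by counting its timing lines,
# which look like "00:00:01,000 --> 00:00:03,500".
# count_srt_cues is a hypothetical helper name.
count_srt_cues() {
    grep -c -E '^[0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3} --> ' "$1"
}

# Example usage:
#   count_srt_cues vobsubs-it.srt
```

If --dump-images was used, the count can be compared with the number of pgm files written next to the .srt file.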