doc:appunti:linux:video:vobcopy

Differences

This shows you the differences between two versions of the page.

doc:appunti:linux:video:vobcopy [2021/02/16 16:09] – [How to rip subtitles from the DVD] niccolo
doc:appunti:linux:video:vobcopy [2021/02/19 17:13] (current) – [Converting a DVD with subtitles to MKV using ffmpeg] niccolo
Line 60: Line 60:
 </code> </code>
  
-FIXME The .IFO file is required to know the palette to apply to the bitmaps. How to specify it from the physical device?
+The .IFO file is required to know the palette to apply to the bitmaps.
  
-The following command will do the OCR on each subtitle image using **tesseract** (it requires several minutes to run):
+The following command, working on the two files **vobsubs-it.idx** and **vobsubs-it.sub**, will do the OCR on each subtitle image using **tesseract** (it requires several minutes to run):
  
 <code> <code>
Line 73: Line 73:
  
 The result will be a **vobsubs-it.srt** text file, containing the subtitles text and timing information. If you want to keep **one pgm image** file for each subtitle, add the **%%--dump-images%%** option.
 +
 +===== Converting a DVD with subtitles to MKV using ffmpeg =====
 +
 +I had a rather complicated DVD to rip; the main problems were:
 +
 +  * Subtitles are in **dvdsub** format (normal for a DVD), which needs **palette** info to be displayed correctly.
 +  * Different subtitle streams **start at different times**; some start only **after several minutes**. The automatic detection performed by ''ffmpeg'' misses some of them and gets the order wrong.
 +  * **Languages of subtitles** are not automatically detected.
 +
 +=== Inspect the disk ===
 +
 +Running **lsdvd** directly on the DVD disc shows the available **video** tracks, **audio** streams and **subtitles**:
 +
 +<code>
 +lsdvd -s /dev/dvd
 +Disc Title: FREEDOMDOWNTIME
 +Title: 01, Length: 02:01:38.600 Chapters: 30, Cells: 30, Audio streams: 04, Subpictures: 24
 +        Subtitle: 01, Language: da - Dansk, Content: Undefined, Stream id: 0x20, 
 +        Subtitle: 02, Language: de - Deutsch, Content: Undefined, Stream id: 0x21, 
 +        ...
 +Title: 02, Length: 01:18:01.000 Chapters: 06, Cells: 06, Audio streams: 01, Subpictures: 00
 +Title: 03, Length: 00:00:24.066 Chapters: 01, Cells: 01, Audio streams: 01, Subpictures: 00
 +Title: 04, Length: 00:00:09.800 Chapters: 01, Cells: 01, Audio streams: 01, Subpictures: 00
 +</code>
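
The subtitle lines of this listing can be reduced to a compact number/language table, which is handy later when tagging the streams. The following is only a sketch: the function name ''lsdvd_subtitle_langs'' is made up for illustration, and it assumes the line layout shown above. Note that lsdvd prints two-letter codes, while the ffmpeg metadata used later on this page uses three-letter codes, so the result still has to be translated by hand.

```shell
# Print "<subtitle number> <language code>" for every subtitle line
# found in `lsdvd -s` output read from standard input.
# Assumes the "Subtitle: NN, Language: xx - Name, ..." layout shown above.
lsdvd_subtitle_langs() {
    sed -n 's/^[[:space:]]*Subtitle: \([0-9][0-9]*\), Language: \([a-z][a-z]*\) .*/\1 \2/p'
}

# Typical use (requires a disc in the drive):
#   lsdvd -s /dev/dvd | lsdvd_subtitle_langs
```

If the disc has several titles with subtitles, the output lists them all one after the other, so it is easier to read when lsdvd is restricted to a single title.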
 +
 +=== Rip the track ===
 +
 +First of all I **ripped the first track** (the only one I'm really interested in) from the DVD into a directory:
 +
 +<code>
 +vobcopy -n 1 -i /dev/dvd --large-file -o ./track1/
 +</code>
 +
 +Using the **mediainfo** tool you can inspect the resulting vob file to verify that **video**, **audio** and **text** (subtitles) streams are the ones we expect.
 +
 +=== Get subtitles palette info ===
 +
 +Then I extracted the first (#0) **dvdsub stream** (there are 24!) from the DVD:
 +
 +<code>
 +mencoder -dvd-device /dev/dvd dvd:// \
 +    -nosound \
 +    -ovc 'copy' -o /dev/null \
 +    -ifo /mnt/VIDEO_TS/VTS_01_0.IFO \
 +    -sid 0 -vobsubout vobsubs-sid0
 +</code>
 +
 +This command produces two files: **vobsubs-sid0.idx** and **vobsubs-sid0.sub**. We are only interested in the **palette**, which is written into the idx file and looks something like this:
 +
 +<file>
 +palette: d7410d, 101010, 0e00d7, d5ccc9, d4b1cb, aac5d0, abd3af, d5ff0c,
 +         d717cc, d6a80b, 8b02d6, 1dca41, 0d007f, 95679f, 8caa67, 783d3f
 +</file>
 +
 +As an alternative you can take the **.IFO** file of the track (for the first track it is **VIDEO_TS/VTS_01_0.IFO**): it contains the palette info and can be used instead of the palette numbers.
 +
 +**lsdvd** should also be able to print the palette with the **-P** option, but in my tests it produced a palette with different color values, which displayed incorrectly in the final result.
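
The palette in the idx file is written with spaces after the commas and may be wrapped over more than one line, while the ''ffmpeg'' command on this page passes it as a single comma-separated string. A small helper can do the conversion. This is a sketch, not a tested tool: the function name ''idx_palette'' is hypothetical, and it assumes that the continuation lines contain nothing but hexadecimal colors, commas and whitespace.

```shell
# Print the palette stored in a VobSub .idx file as one comma-separated
# list without spaces. Continuation lines are accepted as long as they
# contain only hex digits, commas and whitespace (an assumption).
idx_palette() {
    awk '
        /^palette:/ { sub(/^palette:/, ""); grab = 1 }
        grab && !/^[ \t\r0-9a-fA-F,]+$/ { grab = 0 }
        grab { buf = buf $0 }
        END { gsub(/[ \t\r]/, "", buf); sub(/,$/, "", buf); print buf }
    ' "$1"
}

# Typical use:
#   idx_palette vobsubs-sid0.idx
```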
 +
 +=== Transcode with ffmpeg ===
 +
 +Finally I launched the **ffmpeg** incantation:
 +
 +<code bash>
 +ffmpeg -probesize 500M -analyzeduration 500M \
 +    -palette 'd7410d,101010,0e00d7,d5ccc9,d4b1cb,aac5d0,abd3af,d5ff0c,d717cc,d6a80b,8b02d6,1dca41,0d007f,95679f,8caa67,783d3f' \
 +    -i 'FREEDOMDOWNTIME1.vob' \
 +    -map '0:v:0' -map '0:a:0' -map '0:a:1' \
 +    -map '0:s:20' \
 +    -map '0:s:0'  -map '0:s:1'  -map  '0:s:2' -map  '0:s:3' \
 +    -map '0:s:4'  -map '0:s:5'  -map  '0:s:6' -map  '0:s:7' \
 +    -map '0:s:8'  -map '0:s:9'  -map '0:s:10' -map '0:s:11' \
 +    -map '0:s:12' -map '0:s:13' -map '0:s:14' -map '0:s:15' \
 +    -map '0:s:16' -map '0:s:17' -map '0:s:18' -map '0:s:19' \
 +    -map '0:s:21' -map '0:s:22' -map '0:s:23' \
 +    -metadata:s:a:0 title='English' -metadata:s:a:0 language=eng \
 +    -metadata:s:a:1 title='English Commented' -metadata:s:a:1 language=eng \
 +    -metadata title='Freedom Downtime' -metadata:s:v:0 title='Freedom Downtime' \
 +    -metadata:s:s:0 language=eng -metadata:s:s:0 title='English' \
 +    -metadata:s:s:1 language=eng -metadata:s:s:1 title='English FCC-Approved' \
 +    -metadata:s:s:2 language=dan -metadata:s:s:2 title='Dansk' \
 +    -metadata:s:s:3 language=deu -metadata:s:s:3 title='Deutsch' \
 +    -metadata:s:s:4 language=spa -metadata:s:s:4 title='Espanol' \
 +    -metadata:s:s:5 language=est -metadata:s:s:5 title='Estonian' \
 +    -metadata:s:s:6 language=per -metadata:s:s:6 title='Persian' \
 +    -metadata:s:s:7 language=fin -metadata:s:s:7 title='Suomi' \
 +    -metadata:s:s:8 language=fra -metadata:s:s:8 title='Francais' \
 +    -metadata:s:s:9 language=heb -metadata:s:s:9 title='Hebrew' \
 +    -metadata:s:s:10 language=hrv -metadata:s:s:10 title='Hrvatski' \
 +    -metadata:s:s:11 language=ita -metadata:s:s:11 title='Italiano' \
 +    -metadata:s:s:12 language=jpn -metadata:s:s:12 title='Japanese' \
 +    -metadata:s:s:13 language=nld -metadata:s:s:13 title='Nederlands' \
 +    -metadata:s:s:14 language=nor -metadata:s:s:14 title='Norsk' \
 +    -metadata:s:s:15 language=pol -metadata:s:s:15 title='Polish' \
 +    -metadata:s:s:16 language=por -metadata:s:s:16 title='Portugues' \
 +    -metadata:s:s:17 language=rus -metadata:s:s:17 title='Russian' \
 +    -metadata:s:s:18 language=swe -metadata:s:s:18 title='Svenska' \
 +    -metadata:s:s:19 language=tur -metadata:s:s:19 title='Turkish' \
 +    -metadata:s:s:20 language=zho -metadata:s:s:20 title='Chinese' \
 +    -metadata:s:s:21 language=xxx -metadata:s:s:21 title='Babel nonsense' \
 +    -metadata:s:s:22 language=xxx -metadata:s:s:22 title='Game' \
 +    -metadata:s:s:23 language=xxx -metadata:s:s:23 title='Words' \
 +    -codec:s 'dvdsub' \
 +    -vf yadif \
 +    -codec:v 'libx264' -pix_fmt 'yuvj420p' -preset 'veryslow' -tune 'film' -profile:v 'high' -level:v 5 \
 +    -b:v '2048k' \
 +    -ac 2 -codec:a 'libvorbis' -b:a '192k' \
 +    'FREEDOMDOWNTIME1.mkv'
 +</code>
 +
 +Without the **%%-probesize%%** and **%%-analyzeduration%%** options (both are required), ''ffmpeg'' does not see the subtitle streams that start some time after the beginning of the video. If you explicitly map an unseen stream, it produces an error like this:
 +
 +<code>
 +Stream map '0:s:20' matches no streams.
 +</code>
 +
 +If you don't explicitly map the streams, you will get only a warning message during the transcode: 
 +
 +<code>
 +New subtitle stream 0:27 at pos:8284174 and DTS:20.0200s
 +</code>
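
Before starting a long transcode it can be worth checking that the probing options really expose every subtitle stream. The sketch below counts the stream indexes printed by ''ffprobe'' and compares them with the number reported by lsdvd. The function name ''expect_stream_count'' is hypothetical; the ffprobe call is shown only as a comment and has not been verified against this particular disc.

```shell
# Read one stream index per line (ffprobe CSV output) on standard input
# and succeed only if the number of streams equals the expected count.
expect_stream_count() {
    expected=$1
    # grep -c prints 0 and returns non-zero on empty input, hence "|| true".
    found=$(grep -c '[0-9]' || true)
    echo "subtitle streams found: $found (expected $expected)"
    [ "$found" -eq "$expected" ]
}

# Typical use, with the same probing options as the ffmpeg command:
#   ffprobe -v error -probesize 500M -analyzeduration 500M \
#       -select_streams s -show_entries stream=index -of csv=p=0 \
#       FREEDOMDOWNTIME1.vob | expect_stream_count 24
```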
 +
 +I mapped (i.e. selected for inclusion in the output) the **video track**, then **two audio tracks** (there were four), and finally **24 text subtitle tracks** (they are actually bitmaps in dvdsub format). The order of the **%%-map%%** options rearranges the subtitles, overriding the order autodetected by ''ffmpeg''. The **%%-metadata%%** options then tag each subtitle properly in its new position.
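
Typing 24 map options and 48 metadata options by hand is error-prone, so the two lists can also be generated. The following sketch prints them as text to paste into the command. Both function names are made up for illustration, and the "language|title" input format is an assumption of this example, not something ffmpeg reads.

```shell
# Print -map options that put subtitle stream $1 first, followed by
# the remaining streams 0..($2 - 1) in their original order.
subtitle_maps() {
    first=$1
    total=$2
    printf ' -map 0:s:%s' "$first"
    i=0
    while [ "$i" -lt "$total" ]; do
        [ "$i" -eq "$first" ] || printf ' -map 0:s:%s' "$i"
        i=$((i + 1))
    done
    printf '\n'
}

# Print -metadata options for the output subtitle streams, reading one
# "language|title" pair per line from standard input, in output order.
subtitle_metadata() {
    n=0
    while IFS='|' read -r lang title; do
        printf " -metadata:s:s:%s language=%s -metadata:s:s:%s title='%s'" \
            "$n" "$lang" "$n" "$title"
        n=$((n + 1))
    done
    printf '\n'
}

# Typical use:
#   subtitle_maps 20 24
#   printf 'eng|English\ndan|Dansk\n' | subtitle_metadata
```

The metadata index counts the subtitle streams of the output file, so the pairs must be listed in the order produced by the map options, not in the order of the source streams.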
 +
 +It is mandatory to use **%%-codec:s 'dvdsub'%%** for the subtitles: if you use the **copy** codec (which does not re-encode the stream), the **palette** is not applied and you get subtitles with the wrong colors.
 +
 +Yes, the source video has annoying **interlacing artifacts**, so I used the **yadif** video filter to apply a deinterlace effect.
 +
doc/appunti/linux/video/vobcopy.1613488151.txt.gz · Last modified: 2021/02/16 16:09 by niccolo