doc:appunti:linux:video:subtitleripper [2024/02/01 11:56] (current) by niccolo
  * **lsdvd** - From the official Debian repository.
  * **vobcopy** - From the official Debian repository.
  * **mediainfo** - From the official Debian repository.
  * **mkvtoolnix** - From the official Debian repository.
  * **vobsub2srt** - From the Deb Multimedia repository.
  
===== Ripping the .vob from the DVD =====

A DVD can contain several **titles**, so you should first identify the one you want to rip; generally it is the longest one, or the one with the most chapters. We check the DVD content using the **lsdvd** tool:

<code>
lsdvd /dev/sr0
Disc Title: DVD_TITLE
Title: 01, Length: 01:02:36.480 Chapters: 03, Cells: 03, Audio streams: 02, Subpictures: 04
Title: 02, Length: 00:00:12.800 Chapters: 01, Cells: 01, Audio streams: 02, Subpictures: 04
Title: 03, Length: 00:21:01.760 Chapters: 01, Cells: 01, Audio streams: 02, Subpictures: 04
Title: 04, Length: 00:00:00.480 Chapters: 01, Cells: 01, Audio streams: 02, Subpictures: 04
Title: 05, Length: 00:21:10.000 Chapters: 01, Cells: 01, Audio streams: 02, Subpictures: 04
Title: 06, Length: 00:20:24.720 Chapters: 01, Cells: 01, Audio streams: 02, Subpictures: 04
Longest track: 01
</code>

The longest title is **#1**, so we extract it using **vobcopy**:
  
<code bash>
vobcopy -n '1' -i /dev/sr0 --large-file -o .
</code>
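
Reading the title number from the lsdvd listing is usually enough; as a sketch, the choice can also be scripted. The two helpers below are hypothetical (they are not part of lsdvd or vobcopy) and assume the lsdvd output format shown above: ''longest_title'' reads the ''Longest track'' line, while ''most_chapters_title'' applies the other rule of thumb and picks the title with the most chapters. Both print the number without leading zeros.

<code bash>
# Hypothetical helpers: they read lsdvd output on stdin and assume the format
# shown above ("Title: NN, ... Chapters: NN, ..." and "Longest track: NN").

# Print the title reported as "Longest track", without leading zeros.
longest_title() {
    sed -n 's/^Longest track: *0*\([0-9][0-9]*\).*$/\1/p'
}

# Print the title with the most chapters (the first one wins on a tie).
most_chapters_title() {
    awk '/^Title: / {
        t = ""; c = 0
        for (i = 1; i < NF; i++) {
            if ($i == "Title:")    { t = $(i + 1); sub(/,$/, "", t) }
            if ($i == "Chapters:") { c = $(i + 1); sub(/,$/, "", c) }
        }
        if (c + 0 > best) { best = c + 0; title = t + 0 }
    }
    END { if (title) print title }'
}

# Usage, e.g.:
#   title=$(lsdvd /dev/sr0 | longest_title)
#   vobcopy -n "$title" -i /dev/sr0 --large-file -o .
</code>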

The resulting file is saved into the working directory (as specified by the **%%-o%%** option) and is named after the DVD title, something like **DVD_TITLE.vob**.

You can inspect the content of the file using the **mediainfo** tool; in our case the file contains one video stream, two audio streams and three subtitle streams. The subtitles are in the standard DVD format, VobSub, which is an image (bitmap) format, not text.
  
===== Converting the .vob into .mkv format =====

As far as I know, there is no tool capable of extracting the VobSub subtitles directly from the //vob// file; one might expect **ffmpeg** to do this, but it seems it cannot.
  
Fortunately **mkvextract** (from the mkvtoolnix Debian package) can extract the VobSub stream from a //mkv// file, so we first use ffmpeg to convert the //vob// into //mkv//. In the following example all the streams are copied without re-encoding. At this step you may want to re-encode the video to squeeze the MPEG-2 stream into the more efficient H.264 format.
  
<code bash>
</code>
  
Notice the several **%%-map%%** options required to embed all the source streams into the destination file; in our example we have **one video** stream, **two audio** streams and **three subtitle** streams. The **%%-probesize%%** and **%%-analyzeduration%%** options are required because the subtitle streams do not start at the very beginning of the file, so ffmpeg may miss them while probing the input.
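
As a sketch of how the **%%-map%%** options relate to the stream layout, the hypothetical ''map_options'' function below prints one ''-map'' option per stream, given the number of video, audio and subtitle streams (for example as reported by mediainfo). It uses ffmpeg stream specifiers such as ''0:a:1'' (input file 0, second audio stream). The ''100M'' values in the usage comment are assumptions, not taken from the original command; **%%-probesize%%** is expressed in bytes and **%%-analyzeduration%%** in microseconds.

<code bash>
# Hypothetical helper: print "-map 0:TYPE:INDEX" for every stream of input 0,
# in the order video (v), audio (a), subtitles (s).
# Arguments: number of video, audio and subtitle streams.
map_options() {
    for type in v a s; do
        case $type in
            v) count=$1 ;;
            a) count=$2 ;;
            s) count=$3 ;;
        esac
        i=0
        while [ "$i" -lt "$count" ]; do
            printf '%s 0:%s:%s ' -map "$type" "$i"
            i=$((i + 1))
        done
    done
}

# Usage, e.g. one video, two audio and three subtitle streams
# (assumed probe values, raise them if some streams are still missed):
#   ffmpeg -probesize 100M -analyzeduration 100M -i 'DVD_TITLE.vob' \
#       $(map_options 1 2 3) -c copy 'DVD_TITLE.mkv'
</code>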
  
===== Extracting .sub and .idx files from the .vob =====

From the //mkv// file it is now possible to create **two files** (.sub and .idx) for each subtitle stream. The stream numbering expected by ''mkvextract'' in our example is as follows: **#0** is the video stream, **#1** and **#2** are the two audio streams, so the first subtitle stream is **#3**:
  
<code bash>
mkvextract 'DVD_TITLE.mkv' tracks -c 'S_VOBSUB' '3:subtitles-3'
</code>

The result is two files, **subtitles-3.sub** and **subtitles-3.idx**. Repeat the command to extract the other subtitle streams (**#4** and **#5** in our example).
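
Since ''mkvextract'' accepts several ''TID:filename'' pairs after ''tracks'', all the subtitle streams can also be extracted in a single run. A minimal sketch, assuming that ''mkvmerge -i'' (also part of mkvtoolnix) lists each VobSub track as a line like ''Track ID 3: subtitles (VobSub)''; the two helper functions are hypothetical.

<code bash>
# Hypothetical helpers for extracting every VobSub track at once.

# Read "mkvmerge -i" output on stdin and print the IDs of the VobSub tracks,
# assuming lines of the form "Track ID 3: subtitles (VobSub)".
vobsub_track_ids() {
    sed -n 's/^Track ID \([0-9][0-9]*\): subtitles (VobSub).*$/\1/p'
}

# Read track IDs (one per line) on stdin and print "ID:subtitles-ID" pairs,
# matching the file names used above.
track_specs() {
    while read -r id; do
        printf '%s:subtitles-%s\n' "$id" "$id"
    done
}

# Usage, e.g.:
#   mkvextract 'DVD_TITLE.mkv' tracks \
#       $(mkvmerge -i 'DVD_TITLE.mkv' | vobsub_track_ids | track_specs)
</code>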
  
===== OCR the images from the .sub file =====
  
<code bash>
vobsub2srt --ifo './VTS_01_0.IFO' --dump-images --tesseract-lang ita 'subtitles-3'
</code>
  
The last argument is the base name shared by the .idx and .sub files, without extension. The .IFO file (found in the VIDEO_TS directory of the DVD) is used to get the correct palette, width and height, but it is not mandatory.
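
When more than one subtitle stream has been extracted, the OCR can be run in a loop. A minimal sketch, assuming the ''subtitles-N.idx''/''subtitles-N.sub'' naming used above and the same .IFO file and Tesseract language for every stream; if the streams are in different languages, run it once per language instead. The ''ocr_all_subtitles'' function and the ''VOBSUB2SRT'' variable are hypothetical: the variable only lets you replace the command, for example with ''echo'' for a dry run.

<code bash>
# Hypothetical helper: run vobsub2srt on every subtitles-N.idx/.sub pair
# in the current directory.
# Arguments: path of the .IFO file, Tesseract language code (e.g. ita).
ocr_all_subtitles() {
    ifo=$1
    lang=$2
    for idx in subtitles-*.idx; do
        [ -e "$idx" ] || continue      # no match: the pattern stays unexpanded
        base=${idx%.idx}               # vobsub2srt takes the name without extension
        ${VOBSUB2SRT:-vobsub2srt} --ifo "$ifo" --dump-images \
            --tesseract-lang "$lang" "$base"
    done
}

# Usage, e.g.:
#   ocr_all_subtitles './VTS_01_0.IFO' ita
#   VOBSUB2SRT=echo ocr_all_subtitles './VTS_01_0.IFO' ita   # dry run
</code>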
  
doc/appunti/linux/video/subtitleripper.1706781830.txt.gz · Last modified: 2024/02/01 11:03 by niccolo