Differences

This shows you the differences between two versions of the page.

doc:appunti:linux:video:ripping_dvds_with_mencoder [2017/10/12 09:30] – [Extracting the subtitles] niccolo
doc:appunti:linux:video:ripping_dvds_with_mencoder [2020/04/21 17:05] (current) – [OCRing] niccolo
Line 1: Line 1:
 ====== Ripping DVDs with Mencoder ======
  
 +:!: For a simple recipe to rip (extract) the content of a DVD using Debian 10, see **[[vobcopy]]**.
 ===== Install the necessary programs =====
  
Line 199: Line 200:
 Now, we skip the first pass of the video encode, and remove the ''vpass=2'' option from the mencoder command. You must make the same change to ''-oac'' as for two-pass.
  
-===== Subtitles =====
 +===== Extract Subtitles with transcode =====
 + 
 +FIXME The following programs are **missing in Debian 10 Buster**: **tcextract**, **subtitle2vobsub** and **subtitle2pgm**. We are searching for some alternatives.
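
A quick way to see which of these tools are actually present on a given system is a ''command -v'' check. The following is only a sketch: the helper name ''missing_tools'' is hypothetical, and the tool list is simply the set of commands used on this page.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every argument that is not found in PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools used by the subtitle recipes on this page.
missing_tools tccat tcextract subtitle2vobsub subtitle2pgm srttool tesseract
```

An empty output means that every listed command was found; any name printed still needs to be installed or replaced.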
  
 DVDs have subtitles stored as images. There are some options for dealing with them:
Line 228: Line 231:
 </code> </code>
  
-Now use transcode to extract them:
 +The **tccat** command will concatenate all the files that compose the specified ''$TITLE'' to the standard output. Files are taken from the directory where the DVD-Video was ripped (''$RIPDIR'').
 + 
 +The **tcextract** command extracts the requested stream: //ps1// stands for MPEG private stream (subtitles). The source type (''-t vob'') must be specified when reading from standard input.
 + 
 +**NOTICE**: The number **0x21** is **0x20** + the subtitle ID (here, subtitle ID 1).
  
 <code>
-tccat -i $RIPDIR -T $TITLE -L | tcextract -x ps1 -t vob -a 0x22 > subs-en
 +tccat -i $RIPDIR -T $TITLE -L | tcextract -x ps1 -t vob -a 0x21 > subtitles_stream.ps1
 </code>
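
The value passed to ''-a'' can be computed instead of worked out by hand. This is a minimal sketch of the "0x20 + subtitle ID" rule stated on this page; the function name ''subtitle_stream_id'' is made up for the illustration.

```shell
#!/bin/sh
# Hypothetical helper: compute the tcextract -a argument for a DVD subtitle ID,
# following the rule "0x20 + the subtitle ID".
subtitle_stream_id() {
    printf '0x%x\n' $((0x20 + $1))
}

subtitle_stream_id 1   # prints 0x21
subtitle_stream_id 2   # prints 0x22
```

The result can then be substituted into the command, e.g. ''-a $(subtitle_stream_id 1)''.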
  
-where 0x22 is 0x20 + the subtitle ID.
 +If you have just the .VOB files, you can use this recipe:
 + 
 +<code> 
 +cat VTS_02_?.VOB | tcextract -x ps1 -t vob -a 0x21 > subtitles_stream.ps1 
 +</code>
  
-If you want vobsub files:
 +Use the **[[subtitleripper]]** scripts to obtain the VobSub files:
  
 <code>
-subtitle2vobsub -o vobsubs-en -i $RIPDIR/VIDEO_TS/VTS_01_0.IFO < subs-en
 +subtitle2vobsub -p subtitles_stream.ps1 -i $RIPDIR/VIDEO_TS/VTS_02_0.IFO -o subtitles
 </code>
  
 +We used the .IFO file of the selected DVD track (#2 in the example). The subtitles will be saved into the [[glossary#vobsub|VobSub]] format; two files will be generated: **subtitles.idx** and **subtitles.sub**.
 +
 +If you need to extract only part of the subtitle stream (e.g. if you have cut the original track into several pieces), use the **-e** option to give the **start**, the **end** and a **new_start** (new time offset) of the extraction, in **seconds**, like this:
 +
 +<code>
 +subtitle2vobsub -p subtitles_stream.ps1 \
 +    -i $RIPDIR/VIDEO_TS/VTS_02_0.IFO \
 +    -e 9673.914,12673,0 -o subtitles
 +</code>
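
Since **-e** expects plain seconds, cut points noted as hours:minutes:seconds have to be converted first. The sketch below shows one way to do it with ''awk''; the helper name ''hms_to_seconds'' is hypothetical and it assumes the input is always written as ''HH:MM:SS'' with optional decimals.

```shell
#!/bin/sh
# Hypothetical helper: convert HH:MM:SS[.mmm] into seconds with three decimals.
# LC_ALL=C keeps the decimal separator a dot regardless of the locale.
hms_to_seconds() {
    printf '%s\n' "$1" | LC_ALL=C awk -F: '{ printf "%.3f\n", $1 * 3600 + $2 * 60 + $3 }'
}

hms_to_seconds 02:41:13.914   # prints 9673.914
hms_to_seconds 03:31:13       # prints 12673.000
```

These two values correspond to the start and end used in the example above.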
 ==== OCRing ====
  
Line 247: Line 267:
  
 <code>
-subtitle2pgm -o english -c 255,0,0,255 < subs-en
 +cat subtitles_stream.ps1 | subtitle2pgm
 </code>
  
-Each subtitle should now be one pgm file, and a srtx file will be created to index them and their times on-screen.
 +If you want to control how grey levels are converted, try the **%%-c%%** option of subtitle2pgm, for example **%%-c 255,0,0,255%%**.
  
-Now to ocr all that with gocr (using a nice wrapper for the job):
 +Each subtitle should now be one file named like **movie_subtitle0003.pgm**, and a **movie_subtitle.srtx** file will be created to index them and their times on-screen.
 + 
 +=== With Tesseract OCR === 
 + 
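 +<code bash>
 +#!/bin/sh
 +# OCR every .pgm image; tesseract appends ".txt" to the output base name,
 +# so movie_subtitle0003.pgm produces movie_subtitle0003.pgm.txt.
 +find . -type f -name '*.pgm' | sort | while IFS= read -r file; do
 +    # printf is used because "echo -n" is not portable across /bin/sh implementations.
 +    printf '%s ' "$(basename "$file")"
 +    tesseract -l eng --psm 4 "$file" "$file"
 +done
 +</code>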
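
After the OCR run it may be worth checking that every image actually produced some text. This is only a sketch, assuming the loop above was used so that each ''X.pgm'' has its text in ''X.pgm.txt''; the helper name ''list_missing_ocr'' is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list the .pgm images under a directory (default: the
# current one) whose OCR output file is missing or empty.
list_missing_ocr() {
    find "${1:-.}" -type f -name '*.pgm' | sort | while IFS= read -r img; do
        [ -s "$img.txt" ] || printf '%s\n' "$img"
    done
}

# Usage: list_missing_ocr .
```

Any image listed can be inspected by hand and its text file filled in manually before building the final .srt file.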
 + 
 +=== With Gocr === 
 + 
 +**NOTICE**: Don't use the following, because Gocr is not the best tool for OCR. Use **Tesseract OCR** instead.
 + 
 +To OCR all the .pgm images with **gocr** (using a nice wrapper for the job):
  
 <code>
-pgm2txt english
 +pgm2txt -d -f en -v -s 10 movie_subtitle
 </code>
  
 It will prompt you for tons of characters that it doesn't understand, and often totally bugger them up even when you give it the correct ones (it reads part of what it showed you again as another character...)
  
-We will re-merge all these text files produced into a big subtitle file:
 +==== Make a single .srt file ====
 + 
 +Now we will re-merge all these text files produced into a big subtitle file:
  
 <code>
-srttool -s -w < english.srtx > english.srt
 +srttool -s -w < movie_subtitle.srtx > movie_subtitle.srt
 </code>
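
A simple sanity check on the merged file is to count its cues: in the SubRip format every cue has one timing line containing ''%%-->%%''. The sketch below is an illustration only; the helper name ''count_srt_cues'' is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count the cues in an .srt file by counting the timing
# lines, which contain "-->" (e.g. 00:00:01,000 --> 00:00:03,500).
count_srt_cues() {
    grep -c -e '-->' "$1"
}

# Usage: count_srt_cues movie_subtitle.srt
```

Since each subtitle was extracted as one .pgm image, the count should be close to the number of .pgm files; a large difference suggests that something went wrong in the OCR or merge step.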
  
Line 285: Line 323:
 You can now add english.srt onto the end of your ''ogmmerge'' command. Oh, and stick a ''-c LANGUAGE=English'' before it ;-)
  
 +==== Fixing time, etc. ====
 +
 +Finally, you can proofread the final .srt file using the graphical interface of **Gaupol**, a full-featured subtitle editor. It can handle some of the more common operations required:
 +
 +  * **Shift times**, from //Tools//, //Shift Positions...//
 +  * **Renumber subtitles**: this is done automatically when you save the project.
 ===== Links =====
  
doc/appunti/linux/video/ripping_dvds_with_mencoder.1507793446.txt.gz · Last modified: 2017/10/12 09:30 by niccolo