Differences
This shows you the differences between two versions of the page.
doc:appunti:linux:video:ripping_dvds_with_mencoder [2017/10/12 09:30] – [Extracting the subtitles] niccolo | doc:appunti:linux:video:ripping_dvds_with_mencoder [2020/04/21 17:05] (current) – [OCRing] niccolo | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== Ripping DVDs with Mencoder ====== | ====== Ripping DVDs with Mencoder ====== | ||
+ | :!: For a simple recipe to rip (extract) the content of a DVD using Debian 10, see **[[vobcopy]]**. | ||
===== Install the necessary programs ===== | ===== Install the necessary programs ===== | ||
Line 199: | Line 200: | ||
Now, we skip the first pass of the video encode, and remove the '' | Now, we skip the first pass of the video encode, and remove the '' | ||
- | ===== Subtitles ===== | + | ===== Extract |
+ | |||
+ | FIXME The following programs are **missing in Debian 10 Buster**: **tcextract**, | ||
DVDs have subtitles stored as images. There are some options for dealing with them: | DVDs have subtitles stored as images. There are some options for dealing with them: | ||
Line 228: | Line 231: | ||
</ | </ | ||
- | Now use transcode | + | The **tccat** command will concatenate all the files that compose the specified '' |
+ | |||
+ | The **tcextract** command | ||
+ | |||
+ | **NOTICE**: The number **0x21** is **0x20** + the subtitle ID. | ||
< | < | ||
- | tccat -i $RIPDIR -T $TITLE -L | tcextract -x ps1 -t vob -a 0x22 > subs-en | + | tccat -i $RIPDIR -T $TITLE -L | tcextract -x ps1 -t vob -a 0x21 > subtitles_stream.ps1 |
</ | </ | ||
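The `0x20` + subtitle ID rule used for the `-a` argument can be sketched as a small shell helper. This is only an illustration: the function name `subtitle_stream_id` is hypothetical and not part of transcode or subtitleripper.

```shell
#!/bin/sh
# Hypothetical helper: print the value to pass to "tcextract -a"
# for a given subtitle ID, following the rule 0x20 + subtitle ID.
subtitle_stream_id() {
    # $1 is the subtitle ID as a decimal number (e.g. 1 or 2)
    printf '0x%02x\n' $((0x20 + $1))
}

subtitle_stream_id 1
subtitle_stream_id 2
```

The two calls correspond to the values `0x21` and `0x22` that appear in the two revisions of the command above.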
- | where 0x22 is 0x20 + the subtitle ID. | + | If you have just the .VOB files, you can use this recipe: |
+ | |||
+ | < | ||
+ | cat VTS_02_? | ||
+ | </ | ||
- | If you want vobsub | + | Use the **[[subtitleripper]]** scripts to obtain the VobSub |
< | < | ||
- | subtitle2vobsub -o vobsubs-en | + | subtitle2vobsub -p subtitles_stream.ps1 |
</ | </ | ||
+ | We used the .IFO file of the selected DVD track (#2 in the example). The subtitles will be saved into the [[glossary# | ||
+ | |||
+ | If you need to extract only part of the subtitle stream (e.g. if you have cut the original track into several pieces), use the **-e** option to indicate the **start**, the **end** and a **new_start** (new time offset) of the extraction, in **seconds**, | ||
+ | |||
+ | < | ||
+ | subtitle2vobsub -p subtitles_stream.ps1 \ | ||
+ | -i $RIPDIR/ | ||
+ | -e 9673.914, | ||
+ | </ | ||
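Since the **-e** values are given in seconds, a cut point known as hours, minutes and seconds has to be converted first. A minimal sketch of that arithmetic follows; the helper name `hms_to_seconds` is hypothetical, and the `HH:MM:SS.mmm` input format is an assumption about how you noted the cut point.

```shell
#!/bin/sh
# Hypothetical helper: convert HH:MM:SS[.mmm] into seconds with
# three decimal places, suitable for the -e option described above.
hms_to_seconds() {
    # LC_ALL=C forces "." as the decimal separator in awk
    printf '%s\n' "$1" | LC_ALL=C awk -F: '{ printf "%.3f\n", $1 * 3600 + $2 * 60 + $3 }'
}

hms_to_seconds 02:41:13.914
```

Here 2 hours, 41 minutes and 13.914 seconds add up to 7200 + 2460 + 13.914 seconds.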
==== OCRing ==== | ==== OCRing ==== | ||
Line 247: | Line 267: | ||
< | < | ||
- | subtitle2pgm | + | cat subtitles_stream.ps1 | subtitle2pgm |
</ | </ | ||
- | Each subtitle should now be one pgm file, and a srtx file will be created | + | If you want to control how grey levels are converted, try to use the **%%-c%%** option of subtitle2pgm, |
- | Now to ocr all that with gocr (using a nice wrapper for the job): | + | Each subtitle should now be one file named like **movie_subtitle0003.pgm**, |
+ | |||
+ | === With Tesseract OCR === | ||
+ | |||
+ | <code bash> | ||
+ | #!/bin/sh | ||
+ | find . -type f -name ' | ||
+ | echo -n " | ||
+ | tesseract -l eng --psm 4 " | ||
+ | done | ||
+ | </ | ||
+ | |||
+ | === With Gocr === | ||
+ | |||
+ | **NOTICE**: Don't use the following, because Gocr is not the best tool for OCR. Use **Tesseract OCR** instead. | ||
+ | |||
+ | To OCR all the .pgm images with **gocr** (using a nice wrapper for the job): | ||
< | < | ||
- | pgm2txt | + | pgm2txt |
</ | </ | ||
It will prompt you for tons of characters that it doesn' | It will prompt you for tons of characters that it doesn' | ||
- | We will re-merge all these text files produced into a big subtitle file: | + | ==== Make a single .srt file ==== |
+ | |||
+ | Now we will merge all the text files produced into one big subtitle file: | ||
< | < | ||
- | srttool -s -w < english.srtx > english.srt | + | srttool -s -w < movie_subtitle.srtx > movie_subtitle.srt |
</ | </ | ||
Line 285: | Line 323: | ||
You can now add english.srt onto the end of your '' | You can now add english.srt onto the end of your '' | ||
+ | ==== Fixing time, etc ==== | ||
+ | |||
+ | Finally, you can proofread the final .srt file using the graphical interface of **Gaupol**, a full-featured subtitle editor. It can handle some of the more common operations required: | ||
+ | |||
+ | * **Shift times**, from //Tools//, //Shift Positions...// | ||
+ | * **Renumber subtitles**, | ||
===== Links ===== | ===== Links ===== | ||
doc/appunti/linux/video/ripping_dvds_with_mencoder.1507793446.txt.gz · Last modified: 2017/10/12 09:30 by niccolo