doc:appunti:linux:video:ripping_dvds_with_mencoder

Differences

This shows you the differences between two versions of the page.

doc:appunti:linux:video:ripping_dvds_with_mencoder [2017/10/12 23:11] – [OCRing] niccolo
doc:appunti:linux:video:ripping_dvds_with_mencoder [2020/04/21 17:05] (current) – [OCRing] niccolo
Line 1: Line 1:
 ====== Ripping DVDs with Mencoder ======
  
 +:!: For a simple recipe to rip (extract) the content of a DVD using Debian 10, see **[[vobcopy]]**.
 ===== Install the necessary programs =====
  
Line 200: Line 201:
  
 ===== Extract Subtitles with transcode =====
 +
 +FIXME The following programs are **missing in Debian 10 Buster**: **tcextract**, **subtitle2vobsub** and **subtitle2pgm**. We are searching for some alternatives.
  
 DVDs have subtitles stored as images. There are some options for dealing with them:
Line 264: Line 267:
  
 <code>
-cat subtitles_stream.ps1 | subtitle2pgm -c 255,0,0,255
+cat subtitles_stream.ps1 | subtitle2pgm
 </code>
 +
 +To control how grey levels are converted, use the **%%-c%%** option of subtitle2pgm, for example: **%%-c 255,0,0,255%%**.
  
 Each subtitle should now be one file named like **movie_subtitle0003.pgm**, and a **movie_subtitle.srtx** file will be created to index them and their times on-screen.
  
-Now to ocr all that with gocr (using a nice wrapper for the job):
+=== With Tesseract OCR ===
 + 
 +<code bash> 
 +#!/bin/sh
 +# OCR every .pgm image; Tesseract writes the text to "$file.txt".
 +find . -type f -name '*.pgm' | sort | while IFS= read -r file; do
 +    printf '%s ' "$(basename "$file")"
 +    tesseract -l eng --psm 4 "$file" "$file"
 +done
 +</code> 
 + 
 +=== With Gocr === 
 + 
 +**NOTICE**: Don't use the following: Gocr is not the best tool for OCR. Use **Tesseract OCR** instead.
 + 
 +To OCR all the .pgm images with **gocr** (using a nice wrapper for the job):
  
 <code> <code>
Line 277: Line 296:
 It will prompt you for tons of characters that it doesn't understand, and often totally bugger them up even when you give it the correct ones (it reads part of what it showed you again as another character...)
  
-We will re-merge all these text files produced into a big subtitle file:
+==== Make a single .srt file ====
 + 
 +Now we will merge all the text files produced by the OCR step into one big subtitle file:
  
 <code>
-srttool -s -w < english.srtx > english.srt
+srttool -s -w < movie_subtitle.srtx > movie_subtitle.srt
 </code>
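
Before merging, it may help to check that OCR actually produced text for every image. The following is only a minimal sketch: the function name ''missing_ocr'' is hypothetical, and it assumes each OCR result sits next to its image as ''<image>.pgm.txt'', which is where the Tesseract loop writes when the output base is the image path itself.

```shell
#!/bin/sh
# Hypothetical helper: list every .pgm image in a directory that has no
# non-empty OCR text file next to it (assumed name: <image>.pgm.txt).
missing_ocr() {
    dir=${1:-.}
    for img in "$dir"/*.pgm; do
        [ -e "$img" ] || continue               # the glob matched nothing
        [ -s "$img.txt" ] || printf '%s\n' "$img"
    done
}

# Example call, left commented out so that sourcing this file has no effect:
# missing_ocr .
```

An empty result means every image has some text. Anything listed can be OCRed again or typed by hand before the merge.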
  
Line 302: Line 323:
 You can now add english.srt onto the end of your ''ogmmerge'' command. Oh, and stick a ''-c LANGUAGE=English'' before it ;-)
  
 +==== Fixing times, etc. ====
 +
 +Finally, you can proofread the final .srt file in **Gaupol**, a full-featured graphical subtitle editor. It handles some of the most common operations required:
 +
 +  * **Shift times**, from //Tools//, //Shift Positions...//
 +  * **Renumber subtitles**: this is done automatically when you save the project.
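
Gaupol is the tool recommended here. For a quick non-interactive shift, something like the following sketch might also do. The function name ''shift_srt'' is hypothetical. It assumes standard SubRip timing lines of the form ''HH:MM:SS,mmm %%-->%% HH:MM:SS,mmm'' and clamps negative results to zero.

```shell
#!/bin/sh
# Hypothetical helper: shift every SRT timing line by a fixed offset.
# $1 = offset in milliseconds (may be negative); SRT is read from stdin.
shift_srt() {
    awk -v off="$1" '
    # Convert "HH:MM:SS,mmm" to milliseconds.
    function to_ms(t,    p) {
        split(t, p, /[:,]/)
        return ((p[1] * 60 + p[2]) * 60 + p[3]) * 1000 + p[4]
    }
    # Convert milliseconds back to "HH:MM:SS,mmm", clamping at zero.
    function to_ts(ms,    h, m, s) {
        if (ms < 0) ms = 0
        h = int(ms / 3600000); ms -= h * 3600000
        m = int(ms / 60000);   ms -= m * 60000
        s = int(ms / 1000);    ms -= s * 1000
        return sprintf("%02d:%02d:%02d,%03d", h, m, s, ms)
    }
    # Timing lines: rewrite both the start and the end time.
    /^[0-9]+:[0-9][0-9]:[0-9][0-9],[0-9]+ --> / {
        print to_ts(to_ms($1) + off) " --> " to_ts(to_ms($3) + off)
        next
    }
    # Everything else (counters, text, blank lines) passes through unchanged.
    { print }
    '
}

# Example call, left commented out so that sourcing this file has no effect:
# shift_srt 2500 < movie_subtitle.srt > movie_subtitle.shifted.srt
```

Subtitle numbers are left untouched, so no renumbering is needed after a pure time shift.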
 ===== Links =====
  
doc/appunti/linux/video/ripping_dvds_with_mencoder.1507842698.txt.gz · Last modified: 2017/10/12 23:11 by niccolo