Ripping DVDs with Mencoder

NOTICE: For a simple recipe to rip (extract) the content of a DVD using Debian 10, see Ripping the content of a DVD.

Install the necessary programs

On Debian: (substitute k7 with 586 for Intel users)

apt-get install mplayer-k7 mencoder-k7 ogmtools libdvdcss dvdbackup

On Gentoo:

emerge mplayer ogmtools libdvdcss dvdbackup

On other distributions, use the appropriate package management tools to install mplayer, mencoder (which may be part of the mplayer package), ogmtools, libdvdcss and dvdbackup.

Rip & Unencrypt the DVD

Change to a directory on a disk with 10GiB+ free.

Back up the DVD with: (where MyDVD is the name of your project)

dvdbackup -i /dev/dvd -M -o MyDVD
cd MyDVD
ls

You should see one directory. I will call this directory $RIPDIR.
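
If you would rather set $RIPDIR from a script than by eye, something like the following sketch could do it. The function name find_ripdir is my own invention (it is not part of dvdbackup), and it assumes the project directory holds exactly one subdirectory, as described above.

```shell
# Print the single subdirectory that dvdbackup created inside a project directory.
# find_ripdir is a hypothetical helper name, not part of dvdbackup.
find_ripdir() {
    project=$1
    count=0
    found=
    for entry in "$project"/*/; do
        [ -d "$entry" ] || continue      # skips the unexpanded glob when nothing matches
        count=$((count + 1))
        found=${entry%/}                 # drop the trailing slash
    done
    if [ "$count" -ne 1 ]; then
        echo "expected exactly one directory in $project, found $count" >&2
        return 1
    fi
    printf '%s\n' "$found"
}
```

From inside MyDVD you could then run RIPDIR=$(find_ripdir .) and check the result with echo.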

If you get key errors

If your DVD was from the wrong region and dvdbackup says it was unable to get the CSS keys, don't panic. Libdvdcss doesn't give a damn about regions (quite rightly), but it needs some help - you need to use ide-scsi. Unlike that Redmond OS, it also doesn't matter what region your drive is in (or RPC-I / RPC-II firmware), it just works :-)

Exception
Matsushita / Matshita / Panasonic (all synonyms) drives will mostly not work. You need to get a patched firmware from somewhere like http://www.rpc1.org/

Follow this procedure:

  1. Enable ide-scsi, in one of two ways:
    • Compile your kernel with ide-scsi and, when you boot, pass the kernel the argument hdc=ide-scsi (where hdc is the name of your dvdrom device).
    • OR, if you have ide-scsi as a module, modprobe it (with the right options, which I don't know).
  2. Try again.
  3. If you now find your dvdrom is dead slow and your machine unusable when you rip, follow this procedure to enable DMA:
$ mkdir tmp-dev
$ cd tmp-dev
$ sudo MAKEDEV hdc                        # Where hdc is the name of your dvdrom
$ sudo hdparm -d1 hdc
$ cd ..
$ sudo rm -rf tmp-dev

Of course when you want to write DVDs, ide-scsi must be off. Life is tough :-)

Determine Encoding Parameters

Title

DVDs are made up of a number of titles. Generally, each video on the DVD is a title (e.g. the main feature is title 1, the behind-the-scenes documentary is title 2, etc.).

First we need to determine which title we want to rip. You can use xine, totem, ogle, etc. for this:

totem dvd://$RIPDIR

Navigate to the main feature and see which title number your player reports. I will call it $TITLE.

Cropping

The movie probably has lots of black space around it. We might as well get rid of it to save some file space (and a little screen space).

mplayer -dvd-device $RIPDIR dvd://$TITLE -vf cropdetect -ss 50:00

Let it play for a little while (until you reach a part where you can see the edges of the picture), then quit. You will see output like:

crop area: X: 3..653  Y: 74..502  (-vf crop=640:416:10:80)1.2% 0 0 43%
crop area: X: 3..653  Y: 74..502  (-vf crop=640:416:10:80)1.2% 0 0 43%

Replace cropdetect with the crop command above and run mplayer again. It should have the picture perfectly cropped:

mplayer -dvd-device $RIPDIR dvd://$TITLE -vf crop=640:416:10:80 -ss 50:00

I will call that 640:416:10:80 bit $CROP.
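
To avoid copying the numbers by hand, the crop value can be picked out of the cropdetect output with sed. This is only a sketch: last_crop is a made-up helper name, and it assumes the output lines look like the sample above, with the suggestion inside (-vf crop=...).

```shell
# Read mplayer's cropdetect output on stdin and print the last suggested crop value.
# last_crop is a hypothetical helper name.
last_crop() {
    # Keep only the digits and colons that follow "(-vf crop=", then take the final match.
    sed -n 's/.*(-vf crop=\([0-9:]*\)).*/\1/p' | tail -n 1
}
```

You would pipe the mplayer cropdetect command into it, for example CROP=$(mplayer ... -vf cropdetect -ss 50:00 -frames 500 2>/dev/null | last_crop). The -frames limit is my addition so that mplayer quits on its own. The last suggestion is used because the early ones are often taken from dark frames.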

Scaling

Scaling options:

  1. Don't rescale at all (if the picture isn't 4:3, the result will only play nicely in mplayer and other decent players).
    • You will need to add :autoaspect to the -lavcopts string for mencoder.
    • For a high-quality rip, this is the option to choose.
  2. Rescale to square pixels without resizing.
    • To do so, look for a line like VO: [xv] 720x576 => 1024x576 Planar YV12 in your mplayer output.
    • The second set of dimensions is the one you want to scale to.
    • I will say $SCALE=scale=1024:576. (This must go in the -vf.)
  3. Rescale and resize.
    • I often resize to a height of 368 (a multiple of 16). You can choose whatever multiple you want.
    • Then find the width, either by calculation or by trial and error using -vf scale -xy 650 and tweaking the 650.
    • I will say $SCALE=scale=654:368. (This must go in the -vf.)
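
For the "by calculation" route in option 3, here is a rough sketch of the arithmetic. It assumes the cropped picture keeps the pixel shape implied by the VO: line (display width divided by stored width), and scaled_width is a name I made up for the illustration.

```shell
# Estimate the width that keeps the picture's shape at a chosen height.
# scaled_width is a hypothetical helper name.
#   $1 = crop width, $2 = crop height (from $CROP)
#   $3 = stored frame width, $4 = display width (both from the VO: line)
#   $5 = target height
scaled_width() {
    num=$(( $5 * $1 * $4 ))
    den=$(( $2 * $3 ))
    # Integer division, rounded to the nearest whole pixel.
    echo $(( (num + den / 2) / den ))
}
```

With a 640:416 crop of a 720x576 frame shown at 1024x576 and a target height of 368, scaled_width 640 416 720 1024 368 prints 805. You may prefer to nudge the result to a nearby multiple of 16.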

Three-pass encode

There are several different ways to encode the video. The best quality is obtained by having three (main) separate passes:

  1. Extract Audio
    1. Encode Audio (this is a separate step if we are using OGG/Vorbis)
  2. Examine Video to determine the compressibility of each frame.
  3. Compress Video
  4. Merge audio and video (if it is an OGG or Matroska file.)

The advantages of a three pass encode are that we can get exactly the right file size (for, say, 2 CDs), and we can use containers besides AVI (which sucks big time compared to OGM and Matroska).
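
To hit a target size, the video bitrate is whatever is left after the audio, spread over the running time. A minimal sketch of that sum follows. The name video_kbps is hypothetical, container overhead is ignored, and it needs a shell with 64-bit arithmetic.

```shell
# Estimate the video bitrate (kbit/s) that fills a target file size.
# video_kbps is a hypothetical helper; container overhead is ignored.
#   $1 = target size in MiB, $2 = audio size in bytes, $3 = duration in seconds
video_kbps() {
    target_bytes=$(( $1 * 1024 * 1024 ))
    video_bits=$(( (target_bytes - $2) * 8 ))
    echo $(( video_bits / ($3 * 1000) ))
}
```

For two 700 MiB CDs, 100000000 bytes of audio and a two-hour film, video_kbps 1400 100000000 7200 prints 1520. Treat the result as a starting point for vbitrate and leave a little headroom.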

Extract frameno and audio

AVI

If you want an AVI, encode your audio like this:

mencoder -dvd-device $RIPDIR dvd://$TITLE -ovc frameno -oac mp3lame -o frameno.avi

It will tell you some bitrates to use for various common rip-sizes based on the audio size.

OGG

For ogg, rip the audio: (you can tweak the ogg quality as necessary)

mplayer -dvd-device $RIPDIR dvd://$TITLE -vc dummy -vo null -hardframedrop -ao pcm:file=audio.wav
normalize-audio audio.wav
oggenc -q 2.5 audio.wav

Additional audio tracks can be ripped using mplayer's -aid option. Find the right id with -identify and some trial and error.

Extract chapter points (ogg only)

dvdxchap -t $TITLE $RIPDIR > chapters.txt

Encode video

Feel free to tweak bitrate (and other lavc options):

mencoder -dvd-device $RIPDIR dvd://$TITLE -vf crop=$CROP,$SCALE \
 -ovc lavc -lavcopts vcodec=mpeg4:vbitrate=1000:vhq:vqmin=2:autoaspect:vpass=1 \
 -oac copy -o /dev/null
mencoder -dvd-device $RIPDIR dvd://$TITLE -vf crop=$CROP,$SCALE \
 -ovc lavc -lavcopts vcodec=mpeg4:vbitrate=1000:vhq:vqmin=2:autoaspect:vpass=2 \
 -oac copy -o video.avi

Remember that $SCALE might or might not be part of -vf. The -vf options are comma separated, so when you do rescale, the filter string reads crop=$CROP,$SCALE.
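
Since the scale filter is optional, the -vf string can be put together so that the comma only appears when it is needed. A small sketch, with build_vf as a made-up helper name:

```shell
# Build the -vf filter string from a crop value and an optional scale filter.
# build_vf is a hypothetical helper name.
#   $1 = crop value, e.g. 640:416:10:80
#   $2 = scale filter, e.g. scale=654:368, or empty for no rescaling
build_vf() {
    crop=$1
    scale=$2
    if [ -n "$scale" ]; then
        printf 'crop=%s,%s\n' "$crop" "$scale"
    else
        printf 'crop=%s\n' "$crop"
    fi
}
```

You could then pass -vf "$(build_vf "$CROP" "$SCALE")" to both mencoder passes, so they always get the same filters.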

For a high-quality rip, I generally use a bitrate of 1500. If I'm rescaling down to a height of 384, I use 1000.

For really high-quality at the expense of encoding time, add :v4mv:mbd=2:trell to your -lavcopts.

If you don't want to preview your avi at this stage, you can replace -oac copy with -nosound. We will totally ignore the sound track in this avi file at the ogmmerge stage.

Merge OGM file

ogmmerge -o "Title.ogm" -c "LANGUAGE=English" audio.ogg chapters.txt -c "TITLE=Title" -A video.avi

For extra audio tracks, add in -c "LANGUAGE=English: Director Commentary" audio-c.ogg, for example.

Two-pass encode

For a two pass encode, we are forced to end up with an AVI (or an MPG). Video quality remains the same as for three passes, though. It isn't much shorter, time-wise:

  1. Examine Video
  2. Encode Video & Audio and mux into AVI.

For this, skip the Frameno and Merge OGM steps. Change the -oac option on your second video pass mencoder command from copy to mp3lame.

One-pass encode

For a one pass encode, we have the same restrictions as for two passes, but it takes about half the time (at the expense of video quality):

  1. Encode and mux into AVI.

Now, we skip the first pass of the video encode, and remove the vpass=2 option from the mencoder command. You must make the same change to -oac as for two-pass.

Extract Subtitles with transcode

FIXME The following programs are missing in Debian 10 Buster: tcextract, subtitle2vobsub and subtitle2pgm. We are searching for some alternatives.

DVDs have subtitles stored as images. There are some options for dealing with them:

NOTICE: The extract operation can be accomplished with mencoder, but mencoder seems to write different image data into the .sub file and slightly different timestamps into the index (.idx) file depending on the video codec used (-ovc option). Strangely enough, I got different outputs with the copy and raw options. Transcode seems to be more deterministic.

VobSub is a well-known subtitle format that saves subtitles in nearly the same format as they appear in DVD subtitle streams. From a technical point of view, VobSub saves subtitles as little images.

Extracting the subtitles

Use mplayer to identify the subtitle streams contained in the DVD; each is identified by an ID and a language:

mplayer -dvd-device $RIPDIR dvd://$TITLE -identify
...
ID_SUBTITLE_ID=0
ID_SID_0_LANG=it
ID_SUBTITLE_ID=1
ID_SID_1_LANG=en
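
The -identify output is long, so it can help to filter out just the subtitle lines. The sketch below assumes the ID_SID_n_LANG=xx form shown above; list_subtitles is a name I made up.

```shell
# Read mplayer -identify output on stdin and print "ID language" pairs.
# list_subtitles is a hypothetical helper name.
list_subtitles() {
    sed -n 's/^ID_SID_\([0-9][0-9]*\)_LANG=\(.*\)$/\1 \2/p'
}
```

Pipe the mplayer -identify command into list_subtitles. For the sample above it prints one line per stream, "0 it" and "1 en".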

The tccat command will concatenate all the files that compose the specified $TITLE to the standard output. Files are taken from the directory where the DVD-Video was ripped ($RIPDIR).

The tcextract command extracts the requested stream. Here ps1 stands for MPEG private stream (subtitles); the source type (-t vob) must be specified when reading from standard input.

NOTICE: The number 0x21 is 0x20 + the subtitle ID.

tccat -i $RIPDIR -T $TITLE -L | tcextract -x ps1 -t vob -a 0x21 > subtitles_stream.ps1
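
The value given to -a follows the 0x20 + subtitle ID rule, so it can be computed instead of worked out by hand. A minimal sketch, with sub_stream_id as a hypothetical helper name:

```shell
# Print the tcextract -a value for a given subtitle ID (0x20 + ID, in hex).
# sub_stream_id is a hypothetical helper name.
sub_stream_id() {
    printf '0x%x\n' $(( 0x20 + $1 ))
}
```

For the English stream in the example (ID 1), sub_stream_id 1 prints 0x21, which is the value used above.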

If you have just the .VOB files, you can use this recipe:

cat VTS_02_?.VOB | tcextract -x ps1 -t vob -a 0x21 > subtitles_stream.ps1

Use the scripts from "How to rip DVD subtitles with vobsub2srt" to obtain the VobSub files:

subtitle2vobsub -p subtitles_stream.ps1 -i $RIPDIR/VIDEO_TS/VTS_02_0.IFO -o subtitles

We used the .IFO file of the selected DVD track (#2 in the example). The subtitles will be saved in the VobSub format; two files will be generated: subtitles.idx and subtitles.sub.

If you need to extract only a part of subtitle stream (e.g. if you have cut the original track into several pieces), just use the -e option, to indicate the start, the end and a new_start (new time offset) of the extraction, in seconds, like this:

subtitle2vobsub -p subtitles_stream.ps1 \
    -i $RIPDIR/VIDEO_TS/VTS_02_0.IFO \
    -e 9673.914,12673,0 -o subtitles
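
The -e values are in seconds, while most players show positions as hours:minutes:seconds. A small conversion sketch follows. The name to_seconds is my own, and it assumes the input is always in HH:MM:SS or HH:MM:SS.mmm form.

```shell
# Convert HH:MM:SS or HH:MM:SS.mmm into seconds with three decimals.
# to_seconds is a hypothetical helper name.
to_seconds() {
    printf '%s\n' "$1" | awk -F: '{ printf "%.3f\n", $1 * 3600 + $2 * 60 + $3 }'
}
```

For example, to_seconds 02:41:13.914 prints 9673.914, the start value used above.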

OCRing

Right, let's make our lives really nasty and create hundreds of PGM files:

cat subtitles_stream.ps1 | subtitle2pgm

If you want to control how grey levels are converted, try to use the -c option of subtitle2pgm, something like: -c 255,0,0,255.

Each subtitle should now be one file named like movie_subtitle0003.pgm, and a movie_subtitle.srtx file will be created to index them and their times on-screen.

With Tesseract OCR

#!/bin/sh
# OCR every PGM image; tesseract writes its text to "$file.txt".
find . -type f -name '*.pgm' | sort | while IFS= read -r file; do
    printf '%s ' "$(basename "$file")"
    tesseract -l eng --psm 4 "$file" "$file"
done

With Gocr

NOTICE: Don't use the following, because Gocr is not the best tool for OCR. Use Tesseract OCR instead.

To OCR all the .pgm images with gocr (using a nice wrapper for the job):

pgm2txt -d -f en -v -s 10 movie_subtitle

It will prompt you for tons of characters that it doesn't understand, and often totally bugger them up even when you give it the correct ones (it reads part of what it showed you again as another character…)

Make a single .srt file

Now we will merge all the text files produced into one big subtitle file:

srttool -s -w < movie_subtitle.srtx > english.srt

Now it's time to proofread. I prefer to go through each one manually:

display *.pgm &
vim english.srt

You can use spacebar to advance your images in display.

Gocr is very predictable, so if it makes a mistake once, it will do it again, a lot! Use your editor's regular expression features whenever you spot a mistake to correct all the instances. It saves time.
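
If you keep meeting the same mistakes, the fixes can be collected in a small sed filter and reused on every rip. The two substitutions below are only illustrative guesses at typical OCR slips (a vertical bar read for a capital I, two apostrophes read for a double quote), and fix_ocr is a made-up name. Replace them with the errors you actually see.

```shell
# Apply a list of recurring OCR corrections to subtitle text on stdin.
# fix_ocr is a hypothetical helper; both substitutions are illustrative only.
fix_ocr() {
    sed -e 's/|/I/g' \
        -e "s/''/\"/g"
}
```

Run it as fix_ocr < english.srt > english-fixed.srt and compare the two files before you throw the original away.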

Then spell check:

aspell -l british -c english.srt

You can now add english.srt onto the end of your ogmmerge command. Oh, and stick a -c LANGUAGE=English before it ;-)

Fixing time, etc

Finally, you can proofread the final .srt file using the graphical interface of Gaupol, a full-featured subtitle editor. It can handle some of the more common operations required: