Encoding with MEncoder

Making a high quality MPEG-4 ("DivX") rip of a DVD movie

One frequently asked question is "How do I make the highest quality rip for a given size?" Another question is "How do I make the highest quality DVD rip possible? I do not care about file size, I just want the best quality."

The latter question is at least somewhat wrongly posed. After all, if you do not care about file size, why not simply copy the entire MPEG-2 video stream from the DVD? Sure, your AVI will end up being 5GB, give or take, but if you want the best quality and do not care about size, this is certainly your best option.

In fact, the reason you want to transcode a DVD into MPEG-4 is precisely that you do care about file size.

It is difficult to offer a cookbook recipe for creating a very high quality DVD rip. There are several factors to consider, and you should understand these details or you will probably end up disappointed with the result. Below we investigate some of these issues and then look at an example. We assume you are using libavcodec to encode the video, although the theory applies equally to other codecs.

If this seems like too much for you, you should probably use one of the fine frontends listed in the MEncoder section of the related projects page. That way you should be able to obtain high quality rips without too much thinking, because most of these tools are designed to make sensible decisions for you.

Preparing to encode: identifying source material and framerate

Before you even think about encoding a movie, you need to take several preliminary steps.

The first and most important step before you encode should be determining what type of content you are dealing with.
If your source material comes from DVD or broadcast/cable/satellite TV, it will be stored in one of two formats: NTSC for North America and Japan, PAL for Europe, etc. It is important to realize, however, that this is just the format used for presentation on television, and it often does not correspond to the original format of the movie. Experience shows that NTSC material is much more difficult to encode, because there are more elements to identify in the source. In order to produce a suitable encode, you need to know the original format. Failure to take this into account will result in various flaws in your encode, including ugly combing (interlacing) artifacts and duplicated or even lost frames. Besides being ugly, the artifacts also harm coding efficiency: you will get worse quality at the same bitrate.

Identifying source framerate

Here is a list of common types of source material, where you are likely to find them, and their properties:

Standard film: produced for theatrical display at 24fps.
PAL video: recorded with a PAL video camera at 50 fields per second. A field consists of just the odd-numbered or even-numbered lines of a frame. Television was designed to refresh these in alternation as a cheap form of analog compression. The human eye supposedly compensates for this, but once you understand interlacing you will learn to see it on TV too and never enjoy TV again. Two fields do not make a complete frame, because they are captured 1/50 of a second apart in time and thus do not line up unless there is no motion.
NTSC video: recorded with an NTSC video camera at 60000/1001 fields per second, or 60 fields per second in the pre-color era. Otherwise similar to PAL.
Animation: usually drawn at 24fps, but also comes in mixed-framerate varieties.
Computer Graphics (CG): can be any framerate, but some are more common than others; 24 and 30 frames per second are typical for NTSC, and 25fps is typical for PAL.
Old film: various lower framerates.

Identifying source material

Movies consisting of frames are referred to as "progressive", while those consisting of independent fields are called either "interlaced" or video, though this latter term is ambiguous.

To further complicate matters, some movies are a mix of several of the above.

The most important distinction to make between all of these formats is that some are frame-based while others are field-based. Whenever a movie is prepared for display on television (including DVD), it is converted to a field-based format. The various methods by which this can be done are collectively referred to as "telecine", of which the infamous NTSC "3:2 pulldown" is one variety. Unless the original material was also field-based (and at the same fieldrate), you are getting the movie in a format other than the original.

There are several common types of pulldown:

PAL 2:2 pulldown: the nicest of them all. Each frame is shown for the duration of two fields, by extracting the even and odd lines and showing them in alternation. If the original material is 24fps, this process speeds up the movie by 4%.
PAL 2:2:2:2:2:2:2:2:2:2:2:3 pulldown: every 12th frame is shown for the duration of three fields instead of just two. This avoids the 4% speedup issue, but makes the process much more difficult to reverse. It is usually seen in musical productions, where changing the speed by 4% would seriously damage the musical score.
NTSC 3:2 telecine: frames are shown alternately for the duration of 3 fields or 2 fields.
This gives a fieldrate 2.5 times the original framerate. The result is also slowed down very slightly, from 60 fields per second to 60000/1001 fields per second, to maintain the NTSC fieldrate.
NTSC 2:2 pulldown: used for showing 30fps material on NTSC. Nice, just like PAL 2:2 pulldown.

There are also methods for converting between NTSC and PAL video, but such topics are beyond the scope of this guide. If you encounter such a movie and want to encode it, your best bet is to find a copy in the original format. Conversion between these two formats is highly destructive and cannot be reversed cleanly, so your encode will suffer greatly if it is made from a converted source.

When video is stored on DVD, consecutive pairs of fields are grouped as a frame, even though they are not intended to be shown at the same moment in time. The MPEG-2 standard used on DVD and digital TV provides a way both to encode the original progressive frames and to store, in the header of each frame, the number of fields for which that frame should be shown. If this method has been used, the movie is often described as "soft telecined", since the process only directs the DVD player to apply pulldown to the movie rather than altering the movie itself. This case is highly preferable, since it can easily be reversed (actually, ignored) by the encoder, and since it preserves maximal quality. However, many DVD and broadcast production studios do not use proper encoding techniques, but instead produce movies with "hard telecine", where fields are actually duplicated in the encoded MPEG-2.

The procedures for dealing with these cases will be covered later in this guide.
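To make the 3:2 pattern concrete, the following Python sketch simulates the NTSC telecine described above on four film frames and then groups the resulting fields in consecutive pairs, as happens when video is stored on DVD. The function names and the frame labels A to D are illustrative assumptions and are not part of MPlayer or MEncoder; field parity (top/bottom) is deliberately ignored to keep the sketch short.

```python
from itertools import cycle

def telecine_32(frames):
    """Repeat film frames as fields: 3 fields, then 2 fields, alternately."""
    fields = []
    for frame, count in zip(frames, cycle((3, 2))):
        fields.extend([frame] * count)
    return fields

def pair_fields(fields):
    """Group consecutive pairs of fields into stored frames."""
    return [(fields[i], fields[i + 1]) for i in range(0, len(fields) - 1, 2)]

film = ["A", "B", "C", "D"]          # four progressive film frames
fields = telecine_32(film)           # fields produced by 3:2 pulldown
stored = pair_fields(fields)         # frames as stored after hard telecine
mixed = [first != second for first, second in stored]

print(fields)
print(stored)
print(sum(mixed), "of", len(stored), "stored frames mix two different film frames")
```

Under these assumptions, four film frames become ten fields (the factor of 2.5), and two of the five stored frames combine fields taken from different film frames. Those mixed frames are the ones that show combing when there is motion.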
For now, we leave you with some guidelines for identifying which type of material you are dealing with:

NTSC regions:

If MPlayer reports that the framerate has changed to 24000/1001 while you watch the movie, and it never changes back, it is almost certainly progressive content that has been "soft telecined".
If MPlayer shows the framerate switching back and forth between 24000/1001 and 30000/1001, and you see "combing" at times, then there are several possibilities. The 24000/1001 fps segments are almost certainly progressive, "soft telecined" content, but the 30000/1001 fps parts could be either "hard telecined" 24000/1001 fps content or NTSC video at 60000/1001 fields per second. Use the same guidelines as in the following two cases to determine which.
If MPlayer never shows the framerate changing, and every single frame with motion appears combed, your movie is NTSC video at 60000/1001 fields per second.
If MPlayer never shows the framerate changing, and two frames out of every five appear combed, your movie is "hard telecined" 24000/1001fps content.

PAL regions:

If you never see any combing, your movie is 2:2 pulldown.
If you see combing that comes and goes every half second, then your movie is 2:2:2:2:2:2:2:2:2:2:2:3 pulldown.
If you always see combing during motion, then your movie is PAL video at 50 fields per second.

Hint: MPlayer can slow down movie playback with the -speed option or play it frame by frame. Try using -speed 0.2 to watch the movie very slowly, or press the "." key repeatedly to play one frame at a time and identify the pattern, if you cannot see it at full speed.

Constant quantizer vs. multipass

It is possible to encode your movie at a wide range of qualities.
With modern video encoders and a bit of pre-codec compression (downscaling and denoising), it is possible to achieve very good quality at 700 MB for a 90-110 minute widescreen movie. Furthermore, all but the longest movies can be encoded with near-perfect quality at 1400 MB.

There are three approaches to encoding the video: constant bitrate (CBR), constant quantizer, and multipass (ABR, or average bitrate).

The complexity of the frames of a movie, and thus the number of bits required to compress them, can vary greatly from one scene to another. Modern video encoders can adjust to these needs as they go and vary the bitrate. In simple modes such as CBR, however, the encoder does not know the bitrate needs of future scenes and so cannot exceed the requested bitrate for long stretches of time. More advanced modes, such as multipass encoding, can take into account the statistics from previous passes; this fixes the problem mentioned above.

Note: Most codecs that support ABR encoding only support two pass encoding, while some others such as x264, Xvid and libavcodec support multipass, which slightly improves quality at each pass, although this improvement is no longer measurable or noticeable after the 4th pass or so. Therefore, in this section, two pass and multipass are used interchangeably.

In each of these modes, the video codec (such as libavcodec) breaks the video frame into 16x16 pixel macroblocks and then applies a quantizer to each macroblock. The lower the quantizer, the better the quality and the higher the bitrate. The method the encoder uses to determine which quantizer to use for a given macroblock varies and is highly tunable. (This is an extreme oversimplification of the actual process, but the basic concept is useful to understand.)
When you specify a constant bitrate, the video codec encodes the video, discarding detail as much as necessary and as little as possible in order to remain below the given bitrate. If you truly do not care about file size, you could just as well use CBR and specify a bitrate of infinity. (In practice, this means a value high enough that it poses no limit, such as 10000Kbit.) With no real restriction on bitrate, the codec will use the lowest possible quantizer for each macroblock (as specified by vqmin for libavcodec, which is 2 by default). As soon as you specify a bitrate low enough that the codec is forced to use a higher quantizer, you are reducing the quality of your video. In order to avoid that, you should probably downscale your video, following the method described later in this guide. In general, you should avoid CBR altogether if you care about quality.

With constant quantizer, the codec uses the same quantizer, as specified by the vqscale option (for libavcodec), on every macroblock. If you want the highest quality rip possible, again ignoring bitrate, you can use vqscale=2. This will yield the same bitrate and PSNR (peak signal-to-noise ratio) as CBR with vbitrate=infinity and the default vqmin of 2.

The problem with constant quantizer is that it uses the given quantizer whether the macroblock needs it or not. That is, it might be possible to use a higher quantizer on a macroblock without sacrificing visual quality. Why waste bits on an unnecessarily low quantizer? Your CPU has as many cycles as there is time, but there are only so many bits on your hard disk.

With a two pass encode, the first pass encodes the movie as though it were CBR, but keeps a log of the properties of each frame. This data is then used during the second pass to make intelligent decisions about which quantizers to use.
During fast action or high detail scenes, higher quantizers will likely be used, and during slow moving or low detail scenes, lower quantizers will be used. Normally, the amount of motion is much more important than the amount of detail.

If you use vqscale=2, then you are wasting bits. If you use vqscale=3, then you are not getting the highest quality rip. Suppose you rip a DVD at vqscale=3, and the result is 1800Kbit. If you do a two pass encode with vbitrate=1800, the resulting video will have higher quality at the same bitrate.

Since you are now convinced that two pass is the way to go, the real question is which bitrate to use. The answer is that there is no single answer. Ideally you want to choose a bitrate that yields the best balance between quality and file size, and this varies depending on the source video.

If size does not matter, a good starting point for a very high quality rip is about 2000Kbit, plus or minus 200Kbit. For fast action or high detail source video, or if you simply have a very critical eye, you might decide on 2400 or 2600. For some DVDs, you might not notice a difference at 1400Kbit. It is a good idea to experiment with a few scenes at different bitrates to get a feel for it.

If you aim at a certain size, you will have to calculate the bitrate somehow. Before doing so, however, you need to know how much space to reserve for the audio track(s), so you should rip those first.
You can compute the bitrate with the following equation:

bitrate = (target_size_in_Mbytes - sound_size_in_Mbytes) * 1024 * 1024 / length_in_secs * 8 / 1000

For instance, to squeeze a two-hour movie onto a 702MB CD, with 60MB of audio track, the video bitrate will have to be:

(702 - 60) * 1024 * 1024 / (120*60) * 8 / 1000 ≈ 748kbps

Constraints for efficient encoding

Due to the nature of MPEG-type compression, there are various constraints you should follow for maximal quality. MPEG splits the video into 16x16 squares called macroblocks, each composed of four 8x8 blocks of luma (intensity) information and two half-resolution 8x8 chroma (color) blocks (one for the red-cyan axis and the other for the blue-yellow axis). Even if your movie width and height are not multiples of 16, the encoder will use enough 16x16 macroblocks to cover the whole picture area, and the extra space will go to waste. So, in the interest of maximizing quality at a fixed file size, it is a bad idea to use dimensions that are not multiples of 16.

Most DVDs also have some degree of black borders at the edges. Leaving these in place will hurt quality a lot in several ways.

MPEG-type compression is highly dependent on frequency domain transformations, in particular the Discrete Cosine Transform (DCT), which is similar to the Fourier transform. This sort of encoding is efficient for representing patterns and smooth transitions, but it has a hard time with sharp edges. In order to encode them it must use many more bits, or else an artifact known as "ringing" will appear.

The frequency transform (DCT) takes place separately on each macroblock (actually, on each block), so this problem only applies when the sharp edge is inside a block.
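The bitrate equation and the macroblock padding described in this section can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, and the 716x572 picture size is an arbitrary example of dimensions that are not multiples of 16.

```python
import math

def video_bitrate_kbps(target_mb, audio_mb, length_secs):
    """Video bitrate in kbit/s (1 kbit = 1000 bits) for sizes given in MB of 1024*1024 bytes."""
    return (target_mb - audio_mb) * 1024 * 1024 / length_secs * 8 / 1000

def padded_size(width, height, mb=16):
    """Picture size after rounding each dimension up to whole 16x16 macroblocks."""
    return math.ceil(width / mb) * mb, math.ceil(height / mb) * mb

def wasted_fraction(width, height, mb=16):
    """Fraction of the coded area that is padding rather than picture."""
    padded_w, padded_h = padded_size(width, height, mb)
    return 1 - (width * height) / (padded_w * padded_h)

# Two-hour movie on a 702 MB CD with a 60 MB audio track.
print(round(video_bitrate_kbps(702, 60, 120 * 60)), "kbit/s")

# A picture that is not a multiple of 16 in either dimension.
print(padded_size(716, 572))
print(f"{wasted_fraction(716, 572):.1%} of the coded area is padding")
```

For the two-hour example the equation evaluates to roughly 748 kbit/s. The padding figure only counts the extra macroblock area; it does not model the additional cost of the sharp edge between picture and padding.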
Sharp edges are not a problem if your black border begins exactly at a multiple-of-16 pixel boundary. However, the black borders on DVDs rarely come nicely aligned, so in practice you will always need to crop to avoid this penalty.

In addition to frequency domain transforms, MPEG-type compression uses motion vectors to represent the change from one frame to the next. Motion vectors naturally work much less efficiently for new content coming in from the edges of the picture, because it was not present in the previous frame. As long as the picture runs all the way to the edge of the encoded region, motion vectors have no problem with content moving out of the picture. However, in the presence of black borders, there may be trouble:

For each macroblock, MPEG-type compression stores a vector identifying which part of the previous frame should be copied into this macroblock as a base for predicting the next frame. Only the remaining differences need to be encoded. If a macroblock spans the edge of the picture and contains part of the black border, then motion vectors from other parts of the picture will overwrite the black border. This means that many bits must be spent either re-blackening the border that was overwritten, or (more likely) a motion vector will not be used at all and all the changes in this macroblock will have to be coded explicitly. Either way, encoding efficiency is greatly reduced.

Again, this problem only applies if the black borders do not line up on multiple-of-16 boundaries.

Finally, suppose we have a macroblock in the interior of the picture, and an object is moving into this block from near the edge of the image. MPEG-type coding cannot say "copy the part that is inside the picture but not the black border".
So the black border will get copied inside too, and many bits will have to be spent encoding the part of the picture that is supposed to be there.

If the picture runs all the way to the edge of the encoded area, MPEG has a special optimization that repeatedly copies the pixels at the edge of the picture when a motion vector comes from outside the encoded area. This feature becomes useless when the movie has black borders. Unlike the two previous problems (sharp edges inside a block and overwritten borders), aligning the borders at multiples of 16 does not help here.

Despite the borders being entirely black and never changing, there is at least a minimal amount of overhead involved in having more macroblocks.

For all of these reasons, it is recommended to crop black borders completely. Further, if there is an area of noise or distortion at the edge of the picture, cropping it will improve encoding efficiency as well. Videophile purists who want to preserve the original as closely as possible may object to this cropping, but unless you plan to encode at constant quantizer, the quality you gain from cropping will considerably exceed the amount of information lost at the edges.

Cropping and Scaling

Recall from the previous section that the final picture size you encode should be a multiple of 16 (in both width and height). This can be achieved by cropping, scaling, or a combination of both.

When cropping, there are a few guidelines that must be followed to avoid damaging your movie. The normal YUV format, 4:2:0, stores chroma (color) information subsampled, i.e. chroma is sampled only half as often in each direction as luma (intensity). Observe this diagram, where L indicates luma sampling points and C chroma sampling points.

    L   L   L   L   L   L   L   L
      C       C       C       C
    L   L   L   L   L   L   L   L

    L   L   L   L   L   L   L   L
      C       C       C       C
    L   L   L   L   L   L   L   L

As you can see, rows and columns of the image naturally come in pairs.
Thus your crop offsets and dimensions must be even numbers. If they are not, the chroma will no longer line up correctly with the luma. In theory, it is possible to crop with odd offsets, but this requires resampling the chroma, which is potentially a lossy operation and is not supported by the crop filter.

Further, interlaced video is sampled as follows:

    Top field                          Bottom field

    L   L   L   L   L   L   L   L
      C       C       C       C
                                       L   L   L   L   L   L   L   L
    L   L   L   L   L   L   L   L
                                         C       C       C       C
                                       L   L   L   L   L   L   L   L

    L   L   L   L   L   L   L   L
      C       C       C       C
                                       L   L   L   L   L   L   L   L
    L   L   L   L   L   L   L   L
                                         C       C       C       C
                                       L   L   L   L   L   L   L   L

As you can see, the pattern does not repeat until after 4 lines. So for interlaced video, your y-offset and height for cropping must be multiples of 4.

Native DVD resolution is 720x480 for NTSC and 720x576 for PAL, but there is an aspect flag that specifies whether it is full-screen (4:3) or wide-screen (16:9). Many (if not most) widescreen DVDs are not strictly 16:9, and will be either 1.85:1 or 2.35:1 (cinescope). This means that there will be black bands in the video that need to be cropped out.

MPlayer provides a crop detection filter that determines the crop rectangle (-vf cropdetect). Run MPlayer with -vf cropdetect and it will print out the crop settings needed to remove the borders. You should let the movie run long enough to get accurate crop values.

Then, test the values you get with MPlayer, using the command line that was printed by cropdetect, and adjust the rectangle as needed. The rectangle filter can help by allowing you to position the crop rectangle interactively over your movie. Remember to follow the divisibility guidelines above so that you do not misalign the chroma planes.

In certain cases, scaling may be undesirable.
Scaling in the vertical direction is difficult with interlaced video, and if you wish to preserve the interlacing, you should refrain from scaling. If you will not be scaling but still want to use multiple-of-16 dimensions, you will have to overcrop. Do not undercrop, since black borders are very bad for encoding!

Because MPEG-4 uses 16x16 macroblocks, you will want to make sure that each dimension of the video you are encoding is a multiple of 16, or else you will be degrading quality, especially at lower bitrates. You can do this by rounding the width and height of the crop rectangle down to the nearest multiple of 16. As stated earlier, when cropping, you will want to increase the Y offset by half the difference between the old and the new height, so that the resulting video is taken from the center of the frame. And because of the way DVD video is sampled, make sure the offset is an even number. (In fact, as a rule, never use odd values for any parameter when you are cropping and scaling video.) If you are not comfortable throwing away a few extra pixels, you might prefer to scale the video instead. We will look at this later on. You can actually let the cropdetect filter do all of the above for you, as it has an optional round parameter that is equal to 16 by default.

Also, be careful about "half black" pixels at the edges. Make sure you crop these out too, or else you will be wasting bits there that are better spent elsewhere.

After all is said and done, you will probably end up with video whose pixels are not quite 1.85:1 or 2.35:1, but rather something close to that. You could calculate the new aspect ratio manually, but MEncoder offers an option for libavcodec called autoaspect that will do this for you. Absolutely do not scale this video up in order to square the pixels, unless you like to waste your hard disk space.
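The manual rounding procedure described above (round the size down to a multiple of 16, re-center, keep offsets even, and use multiples of 4 vertically for interlaced video) can be sketched in Python as follows. This is an illustration under stated assumptions rather than what cropdetect does internally: the function names are hypothetical, and the starting rectangle 720:362:0:58 is an invented example written in the w:h:x:y order used by the crop filter.

```python
def round_up(value, step):
    """Smallest multiple of step that is >= value."""
    return -(-value // step) * step

def align_crop(w, h, x, y, multiple=16, interlaced=False):
    """Shrink a crop rectangle to multiple-of-16 dimensions, centered in the
    original rectangle, with chroma-safe offsets (even; y a multiple of 4
    when the video is interlaced)."""
    y_step = 4 if interlaced else 2
    # Move the origin inward to the first chroma-safe position.
    x0 = round_up(x, 2)
    y0 = round_up(y, y_step)
    avail_w = w - (x0 - x)
    avail_h = h - (y0 - y)
    # Round the dimensions down so that no border pixels are included.
    new_w = avail_w // multiple * multiple
    new_h = avail_h // multiple * multiple
    # Center the result, rounding the shift down to keep the alignment.
    new_x = x0 + (avail_w - new_w) // 2 // 2 * 2
    new_y = y0 + (avail_h - new_h) // 2 // y_step * y_step
    return new_w, new_h, new_x, new_y

detected = (720, 362, 0, 58)   # hypothetical detected rectangle, w:h:x:y
for interlaced in (False, True):
    w, h, x, y = align_crop(*detected, interlaced=interlaced)
    print(f"interlaced={interlaced}: crop={w}:{h}:{x}:{y}")
```

The sketch always crops inward, so it never reintroduces border pixels that the original rectangle excluded; the price is that the result can be off-center by a pixel or two.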
Scaling should be done on playback, and the player will use the aspect stored in the AVI to determine the right resolution. Unfortunately, not all players honor this aspect information, so you may still want to rescale.

Choosing resolution and bitrate

Unless you are going to encode in constant quantizer mode, you need to select a bitrate. The concept of bitrate is quite simple. Bitrate is normally measured in kilobits (1000 bits) per second. The size of your movie on disk is the bitrate times the duration of the movie, plus a small amount of "overhead" (see the section on the AVI container, for instance). Other parameters such as scaling, cropping, etc. will not alter the file size unless you change the bitrate as well!

Bitrate does not scale proportionally to resolution. That is to say, a 320x240 file at 200 kbit/sec will not be the same quality as the same movie at 640x480 and 800 kbit/sec! There are two reasons for this:

Perceptual: you notice MPEG artifacts more when they are scaled up bigger! Artifacts appear on the scale of blocks (8x8). Your eye will not see errors in 4800 small blocks as easily as it sees errors in 1200 large blocks (assuming you scale both to fullscreen).
Theoretical: when you scale down an image but still use the same block size (8x8) for the frequency space transform, you have more data in the high frequency bands. Roughly speaking, each pixel contains more of the detail than it did before.
So even though your scaled-down picture contains 1/4 of the information in the spatial directions, it could still contain a large portion of the information in the frequency domain (assuming that the high frequencies were underutilized in the original 640x480 image).

Past guides have recommended choosing a bitrate and resolution based on a "bits per pixel" approach, but this is usually not valid, for the reasons above. A better estimate seems to be that bitrate scales proportionally to the square root of resolution, so that 320x240 at 400 kbit/sec would be comparable to 640x480 at 800 kbit/sec. However, this has not been verified with theoretical or empirical rigor. Further, given that movies vary greatly with regard to noise, detail, degree of motion, etc., it is futile to make general recommendations for bits per length-of-diagonal (the analog of bits per pixel, using the square root).

So far we have discussed the difficulty of choosing a bitrate and resolution.

Computing the resolution

The following steps will guide you in computing the resolution of your encode without distorting the video too much, by taking into account several types of information about the source video. First, you should compute the encoded aspect ratio:

ARc = (Wc x (ARa / PRdvd)) / Hc

where:

Wc and Hc are the width and height of the cropped video,
ARa is the displayed aspect ratio, which is usually 4/3 or 16/9,
PRdvd is the pixel ratio of the DVD, which is equal to 1.25=(720/576) for PAL DVDs and 1.5=(720/480) for NTSC DVDs.

Then, you can compute the X and Y resolution according to a certain Compression Quality (CQ) factor:

ResY = INT(SQRT( 1000*Bitrate/25/ARc/CQ )/16) * 16

and

ResX = INT( ResY * ARc / 16) * 16

Okay, but what is the CQ?
The CQ represents the number of bits per pixel and per frame of the encode. Roughly speaking, the greater the CQ, the less likely you are to see encoding artifacts. However, if you have a target size for your movie (1 or 2 CDs for instance), there is a limited total number of bits that you can spend; therefore it is necessary to find a good tradeoff between compressibility and quality.

The CQ depends on the bitrate, the video codec efficiency and the movie resolution. In order to raise the CQ, you would typically downscale the movie, given that the bitrate is computed as a function of the target size and the length of the movie, which are constant. With MPEG-4 ASP codecs such as Xvid and libavcodec, a CQ below 0.18 usually results in a rather blocky picture, because there are not enough bits to code the information of each macroblock. (MPEG4, like many other codecs, groups pixels into blocks of several pixels to compress the image; if there are not enough bits, the edges of those blocks become visible.) It is therefore wise to take a CQ ranging from 0.20 to 0.22 for a 1 CD rip, and 0.26 to 0.28 for a 2 CD rip, with standard encoding options. More advanced encoding options for libavcodec and Xvid should make it possible to get the same quality with a CQ ranging from 0.18 to 0.20 for a 1 CD rip, and 0.24 to 0.26 for a 2 CD rip. With MPEG-4 AVC codecs such as x264, you can use a CQ ranging from 0.14 to 0.16 with standard encoding options, and should be able to go as low as 0.10 to 0.12 with x264's advanced encoding settings.

Please note that the CQ is just an indicative figure, as it depends on the encoded content: a CQ of 0.18 may look just fine for a Bergman film, but not for a movie such as The Matrix, which contains many high-motion scenes. On the other hand, it is pointless to raise the CQ above 0.30, as you would be wasting bits without any noticeable gain in quality.
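The resolution formulas above translate directly into Python. The sketch below is a minimal illustration with hypothetical function names. It treats the constant 25 in the ResY formula as the frame rate and exposes it as an fps parameter, which is an assumption on our part. The 720x432 crop of a 16:9 PAL DVD and the 748 kbit/s bitrate are example inputs only.

```python
import math

def encoded_aspect(wc, hc, ar_display, pr_dvd):
    """ARc = (Wc x (ARa / PRdvd)) / Hc"""
    return (wc * (ar_display / pr_dvd)) / hc

def resolution(bitrate_kbps, arc, cq, fps=25, mb=16):
    """ResY and ResX from the guide's formulas, rounded down to multiples of 16."""
    res_y = int(math.sqrt(1000 * bitrate_kbps / fps / arc / cq) / mb) * mb
    res_x = int(res_y * arc / mb) * mb
    return res_x, res_y

def actual_cq(bitrate_kbps, res_x, res_y, fps=25):
    """Bits per pixel and per frame obtained at the chosen resolution."""
    return 1000 * bitrate_kbps / (fps * res_x * res_y)

# Hypothetical example: 16:9 PAL DVD cropped to 720x432, 748 kbit/s, target CQ 0.20.
arc = encoded_aspect(720, 432, 16 / 9, 1.25)
res_x, res_y = resolution(748, arc, 0.20)
print(f"ARc = {arc:.3f}")
print(f"resolution = {res_x}x{res_y}")
print(f"resulting CQ = {actual_cq(748, res_x, res_y):.3f}")
```

Because both dimensions are rounded down to multiples of 16, the CQ actually obtained is usually somewhat higher than the target value passed in.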
Also note that, as mentioned earlier in this guide, low resolution videos need a higher CQ (compared to, for instance, DVD resolution) to look good.

Filtering

Learning how to use MEncoder's video filters is essential to producing good encodes. All video processing is performed through the filters: cropping, scaling, color adjustment, noise removal, sharpening, deinterlacing, telecine, inverse telecine, and deblocking, just to name a few. Along with the vast number of supported input formats, the variety of filters available in MEncoder is one of its main advantages over other similar programs.

Filters are loaded in a chain using the -vf option:

-vf filter1=options,filter2=options,...

Most filters take several numeric options separated by colons, but the syntax for options varies from filter to filter, so read the man page for details on the filters you wish to use.

Filters operate on the video in the order in which they are loaded. For example, the following chain:

-vf crop=688:464:12:4,scale=640:464

first crops the 688x464 region of the picture with its upper-left corner at (12,4), and then scales the result down to 640x464.

Certain filters need to be loaded at or near the beginning of the filter chain, in order to take advantage of information from the video decoder that would be lost or invalidated by other filters. The principal examples are pp (postprocessing, only when it is performing deblock or dering operations), spp (another postprocessor to remove MPEG artifacts), pullup (inverse telecine), and softpulldown (for converting soft telecine to hard telecine).

In general, you want to do as little filtering as possible in order to remain close to the original DVD source. Cropping is often necessary (as described above), but avoid scaling the video.
Although scaling down is sometimes preferred in order to use higher quantizers, we want to avoid that: remember that we decided from the start to spend bits on quality. Also, do not adjust gamma, contrast, brightness, etc. What looks good on your display may not look good on others. These adjustments should be made at playback time only. One thing you might want to do, however, is pass the video through a very light denoise filter. Again, it is a matter of putting those bits to better use: why waste them encoding noise when you can simply add that noise back in during playback? Raising the denoise filter's parameters will further improve compressibility, but if you raise them too far, you risk degrading the image heavily. The suggested values are quite conservative; feel free to experiment with higher values and judge the results for yourself.

Interlacing and Telecine

Almost all movies are shot at 24 fps. Because NTSC is 30000/1001 fps, some processing must be done to this 24 fps video to make it run at the correct NTSC framerate. The process is called "3:2 pulldown", better known as "telecine" (because pulldown is often applied during the telecine process), and, roughly described, it works by slowing the film down to 24000/1001 fps and repeating every fourth frame. No special processing, however, is done to the video for PAL DVDs, which run at 25 fps. (Technically, PAL can be telecined, which is called "2:2 pulldown", but this is not common practice.) The 24 fps film is simply played back at 25 fps. The result is that the movie runs slightly faster, but unless you are an alien, you probably will not notice the difference.
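The effect of these framerate changes on running time can be worked out exactly. The sketch below is only an illustration: the helper name and the 120-minute film are assumptions, and exact fractions are used so that 24000/1001 is not rounded.

```python
from fractions import Fraction

FILM = Fraction(24)               # frames per second as shot
PAL = Fraction(25)                # PAL playback rate
NTSC_FILM = Fraction(24000, 1001)  # film rate after the NTSC slow-down


def playback_minutes(film_minutes, source_fps, playback_fps):
    """Running time when frames shot at source_fps are shown at playback_fps."""
    return Fraction(film_minutes) * source_fps / playback_fps


# Hypothetical 120-minute film.
pal_minutes = playback_minutes(120, FILM, PAL)
ntsc_minutes = playback_minutes(120, FILM, NTSC_FILM)
print(f"PAL: {float(pal_minutes)} min, NTSC: {float(ntsc_minutes)} min")
print(f"PAL running time is {float(1 - FILM / PAL):.1%} shorter than the film")
```

The same helper can be reused for any pair of rates, since it only scales the duration by the ratio of the two framerates.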
Most PAL DVDs have pitch-corrected audio, so that when they are played back at 25 fps things sound right, even though the audio track (and hence the whole movie) has a running time that is 4% shorter than on NTSC DVDs. Because the video in a PAL DVD has not been altered, you do not need to worry much about the framerate. The source is 25 fps, and your rip will be 25 fps. However, if you are ripping an NTSC DVD movie, you may need to apply inverse telecine. For movies shot at 24 fps, the video on the NTSC DVD is either telecined to 30000/1001 fps, or else it is progressive 24000/1001 fps and intended to be telecined on-the-fly by a DVD player. On the other hand, TV series are usually only interlaced, not telecined. This is not a hard rule: some TV series are interlaced (such as Buffy the Vampire Slayer) whereas some are a mixture of progressive and interlaced (such as Angel, or 24). It is highly recommended that you read the section on How to deal with telecine and interlacing within NTSC DVDs to learn how to handle the different possibilities. However, if you are mostly just ripping movies, you are likely dealing with either 24 fps progressive or telecined video, in which case you can use the pullup filter (-vf pullup,softskip).

Encoding interlaced video

If the movie you want to encode is interlaced (NTSC video or PAL video), you will need to choose whether you want to deinterlace or not. While deinterlacing will make your movie usable on progressive scan displays such as computer monitors and projectors, it comes at a cost: the field rate of 50 or 60000/1001 fields per second is halved to 25 or 30000/1001 frames per second, and roughly half of the information in your movie will be lost during scenes with significant motion. Therefore, if you are encoding for high quality archival purposes, it is recommended not to deinterlace.
You can always deinterlace the movie at playback time when displaying it on progressive scan devices. The power of currently available computers forces players to use a deinterlacing filter, which results in a slight degradation of image quality. But future players will be able to mimic the display of a TV, deinterlacing to full field rate and interpolating 50 or 60000/1001 entire frames per second from the interlaced video. Special care must be taken when working with interlaced video: Crop height and y-offset must be multiples of 4. Any vertical scaling must be performed in interlaced mode. Postprocessing and denoising filters may not work as expected unless you take special care to operate them one field at a time, and they may damage the video if used incorrectly. With these things in mind, here is our first example:

mencoder capture.avi -mc 0 -oac lavc -ovc lavc -lavcopts \
vcodec=mpeg2video:vbitrate=6000:ilme:ildct:acodec=mp2:abitrate=224

Note the ilme and ildct options.

Notes on Audio/Video synchronization

MEncoder's audio/video synchronization algorithms were designed with the intention of recovering files with broken sync. However, in some cases they can cause unnecessary skipping and duplication of frames, and possibly slight A/V desync, when used with proper input (of course, A/V sync issues apply only if you process or copy the audio track while transcoding the video, which is strongly encouraged). Therefore, you may have to switch to basic A/V sync with the -mc 0 option, or put this in your ~/.mplayer/mencoder config file, as long as you are only working with good sources (DVD, TV capture, high quality MPEG-4 rips, etc) and not broken ASF/RM/MOV files. If you want to further guard against strange frame skips and duplication, you can use both and .
This will prevent all A/V sync, and copy frames one-to-one, so you cannot use it if you will be using any filters that unpredictably add or drop frames, or if your input file has variable framerate! Therefore, doing so is not in general recommended. The so-called "three-pass" audio encoding which MEncoder supports has been reported to cause A/V desync. This will definitely happen if it is used in conjunction with certain filters, so it is now recommended not to use three-pass audio mode. This feature is only left for compatibility purposes and for expert users who understand when it is safe to use and when it is not. If you have never heard of three-pass mode before, forget that we even mentioned it! There have also been reports of A/V desync when encoding from stdin with MEncoder. Do not do this! Always use a file or CD/DVD/etc device as input.

Choosing the video codec

Which video codec is best to choose depends on several factors, like size, quality, streamability, usability and popularity, some of which largely depend on personal taste and technical constraints.

Compression efficiency: It is quite easy to understand that most newer-generation codecs are made to increase quality and compression. Therefore, the authors of this guide and many other people suggest that you cannot go wrong when choosing MPEG-4 AVC codecs like x264 instead of MPEG-4 ASP codecs such as libavcodec MPEG-4 or Xvid. (Be careful, however: decoding DVD-resolution MPEG-4 AVC videos requires a fast machine, i.e. a Pentium 4 over 1.5GHz or a Pentium M over 1GHz.) (Advanced codec developers may be interested in reading Michael Niedermayer's opinion on "why MPEG4-ASP sucks".) Likewise, you should get better quality using MPEG-4 ASP than you would with MPEG-2 codecs. However, newer codecs which are in heavy development can suffer from bugs which have not yet been noticed and which can ruin an encode. This is simply the tradeoff for using bleeding-edge technology.
What is more, beginning to use a new codec requires that you spend some time becoming familiar with its options, so that you know what to adjust to achieve a desired picture quality. Hardware compatibility: It usually takes a long time for standalone video players to begin to include support for the latest video codecs. As a result, most only support MPEG-1 (like VCD, XVCD and KVCD), MPEG-2 (like DVD, SVCD and KVCD) and MPEG-4 ASP (like DivX, libavcodec's LMP4 and Xvid) (Beware: Usually, not all MPEG-4 ASP features are supported). Please refer to the technical specs of your player (if they are available), or google around for more information. Best quality per encoding time: Codecs that have been around for some time (such as libavcodec MPEG-4 and Xvid) are usually heavily optimized with all kinds of smart algorithms and SIMD assembly code. That is why they tend to yield the best quality per encoding time ratio. However, they may have some very advanced options that, if enabled, will make the encode really slow for marginal gains. If you are after blazing speed you should stick around the default settings of the video codec (although you should still try the other options which are mentioned in other sections of this guide). You may also consider choosing a codec which can do multi-threaded processing, though this is only useful for users of machines with several CPUs. libavcodec MPEG-4 does allow that, but speed gains are limited, and there is a slight negative effect on picture quality. Xvid's multi-threaded encoding, activated by the option, can be used to boost encoding speed — by about 40-60% in typical cases — with little if any picture degradation. x264 also allows multi-threaded encoding, which currently speeds up encoding by 94% per CPU core while lowering PSNR between 0.005dB and 0.01dB on a typical setup. 
Personal taste: This is where it gets almost irrational: For the same reason that some hung on to DivX 3 for years when newer codecs were already doing wonders, some folks will prefer Xvid or libavcodec MPEG-4 over x264. You should make your own judgement; do not take advice from people who swear by one codec. Take a few sample clips from raw sources and compare different encoding options and codecs to find one that suits you best. The best codec is the one you master, and the one that looks best to your eyes on your display! (The same encode may not look the same on someone else's monitor or when played back by a different decoder, so future-proof your encodes by playing them back on different setups.) Please refer to the section selecting codecs and container formats to get a list of supported codecs.

Audio

Audio is a much simpler problem to solve: if you care about quality, just leave it as is. Even AC-3 5.1 streams are at most 448Kbit/s, and they are worth every bit. You might be tempted to transcode the audio to high quality Vorbis, but just because you do not have an A/V receiver for AC-3 pass-through today does not mean you will not have one tomorrow. Future-proof your DVD rips by preserving the AC-3 stream. You can keep the AC-3 stream by copying it directly into the video stream during the encoding. You can also extract the AC-3 stream in order to mux it into containers such as NUT or Matroska.

mplayer source_file.vob -aid 129 -dumpaudio -dumpfile sound.ac3

will dump into the file sound.ac3 the audio track number 129 from the file source_file.vob (NB: DVD VOB files usually use a different audio numbering, which means that the VOB audio track 129 is the 2nd audio track of the file). But sometimes you truly have no choice but to further compress the sound so that more bits can be spent on the video. Most people choose to compress audio with either MP3 or Vorbis audio codecs.
While the latter is a very space-efficient codec, MP3 is better supported by hardware players, although this trend is changing. Do not disable audio when encoding a file with audio, even if you will be encoding and muxing audio separately later. Though it may work in ideal cases, disabling audio is likely to hide some problems in your encoding command line setting. In other words, having a soundtrack during your encode assures you that, provided you do not see messages such as "Too many audio packets in the buffer", you will be able to get proper sync. You need to have MEncoder process the sound. You can for example copy the original soundtrack during the encode with -oac copy, or convert it to a "light" 4 kHz mono WAV PCM. Otherwise, in some cases, it will generate a video file that will not sync with the audio. Such cases are when the number of video frames in the source file does not match up to the total length of audio frames or whenever there are discontinuities/splices where there are missing or extra audio frames. The correct way to handle this kind of problem is to insert silence or cut audio at these points. However MPlayer cannot do that, so if you demux the AC-3 audio and encode it with a separate app (or dump it to PCM with MPlayer), the splices will be left incorrect and the only way to correct them is to drop/dup video frames at the splice. As long as MEncoder sees the audio when it is encoding the video, it can do this dropping/duping (which is usually OK since it takes place at full black/scenechange), but if MEncoder cannot see the audio, it will just process all frames as-is and they will not fit the final audio stream when you, for example, merge your audio and video track into a Matroska file. First of all, you will have to convert the DVD sound into a WAV file that the audio codec can use as input.
For example: mplayer source_file.vob -ao pcm:file=destination_sound.wav \ -vc dummy -aid 1 -vo null will dump the second audio track from the file source_file.vob into the file destination_sound.wav. You may want to normalize the sound before encoding, as DVD audio tracks are commonly recorded at low volumes. You can use the tool normalize for instance, which is available in most distributions. If you are using Windows, a tool such as BeSweet can do the same job. You will compress in either Vorbis or MP3. For example: oggenc -q1 destination_sound.wav will encode destination_sound.wav with the encoding quality 1, which is roughly equivalent to 80Kb/s, and is the minimum quality at which you should encode if you care about quality. Please note that MEncoder currently cannot mux Vorbis audio tracks into the output file because it only supports AVI and MPEG containers as an output, each of which may lead to audio/video playback synchronization problems with some players when the AVI file contain VBR audio streams such as Vorbis. Do not worry, this document will show you how you can do that with third party programs. Muxing Now that you have encoded your video, you will most likely want to mux it with one or more audio tracks into a movie container, such as AVI, MPEG, Matroska or NUT. MEncoder is currently only able to natively output audio and video into MPEG and AVI container formats. for example: mencoder -oac copy -ovc copy -o output_movie.avi \ -audiofile input_audio.mp2 input_video.avi This would merge the video file input_video.avi and the audio file input_audio.mp2 into the AVI file output_movie.avi. This command works with MPEG-1 layer I, II and III (more commonly known as MP3) audio, WAV and a few other audio formats too. MEncoder features experimental support for libavformat, which is a library from the FFmpeg project that supports muxing and demuxing a variety of containers. 
For example: mencoder -oac copy -ovc copy -o output_movie.asf -audiofile input_audio.mp2 \ input_video.avi -of lavf -lavfopts format=asf This will do the same thing as the previous example, except that the output container will be ASF. Please note that this support is highly experimental (but getting better every day), and will only work if you compiled MPlayer with the support for libavformat enabled (which means that a pre-packaged binary version will not work in most cases). Improving muxing and A/V sync reliability You may experience some serious A/V sync problems while trying to mux your video and some audio tracks, where no matter how you adjust the audio delay, you will never get proper sync. That may happen when you use some video filters that will drop or duplicate some frames, like the inverse telecine filters. It is strongly encouraged to append the video filter at the end of the filter chain to avoid this kind of problem. Without , if MEncoder wants to duplicate a frame, it relies on the muxer to put a mark on the container so that the last frame will be displayed again to maintain sync while writing no actual frame. With , MEncoder will instead just push the last frame displayed again into the filter chain. This means that the encoder receives the exact same frame twice, and compresses it. This will result in a slightly bigger file, but will not cause problems when demuxing or remuxing into other container formats. You may also have no choice but to use with container formats that are not too tightly linked with MEncoder such as the ones supported through libavformat, which may not support frame duplication at the container level. Limitations of the AVI container Although it is the most widely-supported container format after MPEG-1, AVI also has some major drawbacks. Perhaps the most obvious is the overhead. For each chunk of the AVI file, 24 bytes are wasted on headers and index. 
This translates into a little over 5 MB per hour, or 1-2.5% overhead for a 700 MB movie. This may not seem like much, but it could mean the difference between being able to use 700 kbit/sec video or 714 kbit/sec, and every bit of quality counts. In addition to this gross inefficiency, AVI also has the following major limitations: Only fixed-fps content can be stored. This is particularly limiting if the original material you want to encode is mixed content, for example a mix of NTSC video and film material. Actually there are hacks that can be used to store mixed-framerate content in AVI, but they increase the (already huge) overhead fivefold or more and so are not practical. Audio in AVI files must be either constant-bitrate (CBR) or constant-framesize (i.e. all frames decode to the same number of samples). Unfortunately, the most efficient codec, Vorbis, does not meet either of these requirements. Therefore, if you plan to store your movie in AVI, you will have to use a less efficient codec such as MP3 or AC-3. Having said all that, MEncoder does not currently support variable-fps output or Vorbis encoding. Therefore, you may not see these as limitations if MEncoder is the only tool you will be using to produce your encodes. However, it is possible to use MEncoder only for video encoding, and then use external tools to encode audio and mux it into another container format.

Muxing into the Matroska container

Matroska is a free, open standard container format that aims to offer many advanced features which older containers like AVI cannot handle. For example, Matroska supports variable bitrate audio content (VBR), variable framerates (VFR), chapters, file attachments, error detection code (EDC) and modern A/V codecs like "Advanced Audio Coding" (AAC), "Vorbis" or "MPEG-4 AVC" (H.264), almost none of which AVI can handle. The tools required to create Matroska files are collectively called mkvtoolnix, and are available for most Unix platforms as well as Windows.
Because Matroska is an open standard you may find other tools that suit you better, but since mkvtoolnix is the most common, and is supported by the Matroska team itself, we will only cover its usage. Probably the easiest way to get started with Matroska is to use MMG, the graphical frontend shipped with mkvtoolnix, and follow the guide to mkvmerge GUI (mmg). You may also mux audio and video files using the command line:

mkvmerge -o output.mkv input_video.avi input_audio1.mp3 input_audio2.ac3

This would merge the video file input_video.avi and the two audio files input_audio1.mp3 and input_audio2.ac3 into the Matroska file output.mkv. Matroska, as mentioned earlier, is able to do much more than that, like multiple audio tracks (including fine-tuning of audio/video synchronization), chapters, subtitles, splitting, etc. Please refer to the documentation of those applications for more details.

How to deal with telecine and interlacing within NTSC DVDs

Introduction

What is telecine? If you do not understand much of what is written in this document, read the Wikipedia entry on telecine. It is an understandable and reasonably comprehensive description of what telecine is.

A note about the numbers. Many documents, including the guide linked above, refer to the fields per second value of NTSC video as 59.94 and the corresponding frames per second values as 29.97 (for telecined and interlaced) and 23.976 (for progressive). For simplicity, some documents even round these numbers to 60, 30, and 24. Strictly speaking, all those numbers are approximations. Black and white NTSC video was exactly 60 fields per second, but 60000/1001 was later chosen to accommodate color data while remaining compatible with contemporary black and white televisions. Digital NTSC video (such as on a DVD) is also 60000/1001 fields per second. From this, interlaced and telecined video are derived to be 30000/1001 frames per second; progressive video is 24000/1001 frames per second.
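The relationships between these rates can be checked with exact rational arithmetic. The following Python sketch is only an illustration; the constant names and the two-hour duration are assumptions made for the example.

```python
from fractions import Fraction

# Exact NTSC rates as described in the text.
NTSC_FIELD_RATE = Fraction(60000, 1001)        # fields per second
NTSC_FRAME_RATE = NTSC_FIELD_RATE / 2          # two fields form one frame
FILM_RATE = NTSC_FRAME_RATE * Fraction(4, 5)   # 4 film frames per 5 video frames

for name, rate in [
    ("fields/s", NTSC_FIELD_RATE),
    ("interlaced or telecined frames/s", NTSC_FRAME_RATE),
    ("progressive frames/s", FILM_RATE),
]:
    print(f"{name}: {rate} (about {float(rate):.6f})")

# Frame count over a hypothetical two-hour movie, exact versus rounded rate.
seconds = 2 * 3600
frames_exact = FILM_RATE * seconds
frames_rounded = Fraction("23.976") * seconds
print(f"difference after two hours: {float(frames_exact - frames_rounded):.4f} frames")
```

The rounded decimals are close, but they are not equal to the fractions, so a tool that is handed 23.976 is being told a slightly different rate from the one the video actually has.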
Older versions of the MEncoder documentation and many archived mailing list posts refer to 59.94, 29.97, and 23.976. All MEncoder documentation has been updated to use the fractional values, and you should use them too. -ofps 23.976 is incorrect. -ofps 24000/1001 should be used instead.

How telecine is used. All video intended to be displayed on an NTSC television set must be 60000/1001 fields per second. Made-for-TV movies and shows are often filmed directly at 60000/1001 fields per second, but the majority of cinema is filmed at 24 or 24000/1001 frames per second. When cinematic movie DVDs are mastered, the video is then converted for television using a process called telecine. On a DVD, the video is never actually stored as 60000/1001 fields per second. For video that was originally 60000/1001, each pair of fields is combined to form a frame, resulting in 30000/1001 frames per second. Hardware DVD players then read a flag embedded in the video stream to determine whether the odd- or even-numbered lines should form the first field. Usually, 24000/1001 frames per second content stays as it is when encoded for a DVD, and the DVD player must perform telecining on-the-fly. Sometimes, however, the video is telecined before being stored on the DVD; even though it was originally 24000/1001 frames per second, it becomes 60000/1001 fields per second. When it is stored on the DVD, pairs of fields are combined to form 30000/1001 frames per second. When looking at individual frames formed from 60000/1001 fields per second video, telecined or otherwise, interlacing is clearly visible wherever there is any motion, because one field (say, the even-numbered lines) represents a moment in time 1/(60000/1001) seconds later than the other. Playing interlaced video on a computer looks ugly both because the monitor is higher resolution and because the video is shown frame-after-frame instead of field-after-field.

Notes: This section only applies to NTSC DVDs, and not PAL.
The example MEncoder lines throughout the document are not intended for actual use. They are simply the bare minimum required to encode the pertaining video category. How to make good DVD rips or fine-tune libavcodec for maximal quality is not within the scope of this document. There are a couple footnotes specific to this guide, linked like this: [1] How to tell what type of video you have Progressive Progressive video was originally filmed at 24000/1001 fps, and stored on the DVD without alteration. When you play a progressive DVD in MPlayer, MPlayer will print the following line as soon as the movie begins to play: demux_mpg: 24000/1001 fps progressive NTSC content detected, switching framerate. From this point forward, demux_mpg should never say it finds "30000/1001 fps NTSC content." When you watch progressive video, you should never see any interlacing. Beware, however, because sometimes there is a tiny bit of telecine mixed in where you would not expect. I have encountered TV show DVDs that have one second of telecine at every scene change, or at seemingly random places. I once watched a DVD that had a progressive first half, and the second half was telecined. If you want to be really thorough, you can scan the entire movie: mplayer dvd://1 -nosound -vo null -benchmark Using makes MPlayer play the movie as quickly as it possibly can; still, depending on your hardware, it can take a while. Every time demux_mpg reports a framerate change, the line immediately above will show you the time at which the change occurred. Sometimes progressive video on DVDs is referred to as "soft-telecine" because it is intended to be telecined by the DVD player. Telecined Telecined video was originally filmed at 24000/1001, but was telecined before it was written to the DVD. MPlayer does not (ever) report any framerate changes when it plays telecined video. Watching a telecined video, you will see interlacing artifacts that seem to "blink": they repeatedly appear and disappear. 
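The blinking can be reproduced on paper. The sketch below simulates 3:2 pulldown by spreading film frames over alternately two and three fields, then pairing consecutive fields into stored video frames. The function names and the starting phase of the cadence are assumptions made for the illustration; a real disc may start at a different point in the cycle.

```python
from itertools import cycle


def pulldown_32(film_frames, cadence=(2, 3)):
    """Spread each film frame over 2 or 3 fields, then pair fields into frames."""
    fields = []
    for frame, repeat in zip(film_frames, cycle(cadence)):
        fields.extend([frame] * repeat)
    # Consecutive fields (one of each parity) are stored together as one frame.
    return [(fields[i], fields[i + 1]) for i in range(0, len(fields) - 1, 2)]


def pattern(video_frames):
    """P when both fields come from the same film frame, I when they differ."""
    return "".join("P" if first == second else "I" for first, second in video_frames)


video = pulldown_32("ABCDEFGH")
print(video)
print(pattern(video))
```

Every four film frames become five video frames, two of which mix fields from different film frames. Those mixed frames are the ones that show combing, which is why the artifacts come and go in a regular cycle.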
You can look closely at these artifacts by running mplayer dvd://1, seeking to a part with motion, and using the . key to step forward one frame at a time. Look at the pattern of interlaced-looking and progressive-looking frames. If the pattern you see is PPPII,PPPII,PPPII,... then the video is telecined. If you see some other pattern, then the video may have been telecined using some non-standard method; MEncoder cannot losslessly convert non-standard telecine to progressive. If you do not see any pattern at all, then it is most likely interlaced. Sometimes telecined video on DVDs is referred to as "hard-telecine". Since hard-telecine is already 60000/1001 fields per second, the DVD player plays the video without any manipulation. Another way to tell if your source is telecined or not is to play the source with the and command line options to see how matches frames. If the source is telecined, you should see on the console a 3:2 pattern with 0+.1.+2 and 0++1 alternating. This technique has the advantage that you do not need to watch the source to identify it, which could be useful if you wish to automate the encoding procedure, or to carry out said procedure remotely via a slow connection.

Interlaced

Interlaced video was originally filmed at 60000/1001 fields per second, and stored on the DVD as 30000/1001 frames per second. The interlacing effect (often called "combing") is a result of combining pairs of fields into frames. Each field is supposed to be 1/(60000/1001) seconds apart, and when they are displayed simultaneously the difference is apparent. As with telecined video, MPlayer should not ever report any framerate changes when playing interlaced content. When you view an interlaced video closely by frame-stepping with the . key, you will see that every single frame is interlaced.

Mixed progressive and telecine

All of a "mixed progressive and telecine" video was originally 24000/1001 frames per second, but some parts of it ended up being telecined.
When MPlayer plays this category, it will (often repeatedly) switch back and forth between "30000/1001 fps NTSC" and "24000/1001 fps progressive NTSC". Watch the bottom of MPlayer's output to see these messages. You should check the "30000/1001 fps NTSC" sections to make sure they are actually telecine, and not just interlaced. Mixed progressive and interlaced In "mixed progressive and interlaced" content, progressive and interlaced video have been spliced together. This category looks just like "mixed progressive and telecine", until you examine the 30000/1001 fps sections and see that they do not have the telecine pattern. How to encode each category As I mentioned in the beginning, example MEncoder lines below are not meant to actually be used; they only demonstrate the minimum parameters to properly encode each category. Progressive Progressive video requires no special filtering to encode. The only parameter you need to be sure to use is . Otherwise, MEncoder will try to encode at 30000/1001 fps and will duplicate frames. mencoder dvd://1 -oac copy -ovc lavc -ofps 24000/1001 It is often the case, however, that a video that looks progressive actually has very short parts of telecine mixed in. Unless you are sure, it is safest to treat the video as mixed progressive and telecine. The performance loss is small [3]. Telecined Telecine can be reversed to retrieve the original 24000/1001 content, using a process called inverse-telecine. MPlayer contains several filters to accomplish this; the best filter, , is described in the mixed progressive and telecine section. Interlaced For most practical cases it is not possible to retrieve a complete progressive video from interlaced content. The only way to do so without losing half of the vertical resolution is to double the framerate and try to "guess" what ought to make up the corresponding lines for each field (this has drawbacks - see method 3). Encode the video in interlaced form. 
Normally, interlacing wreaks havoc with the encoder's ability to compress well, but libavcodec has two parameters specifically for dealing with storing interlaced video a bit better: ildct and ilme. Also, using mbd=2 is strongly recommended [2] because it will encode macroblocks as non-interlaced in places where there is no motion. Note that -ofps is NOT needed here.

mencoder dvd://1 -oac copy -ovc lavc -lavcopts ildct:ilme:mbd=2

Use a deinterlacing filter before encoding. There are several of these filters available to choose from, each with its own advantages and disadvantages. Consult and to see what is available (grep for "deint"), read Michael Niedermayer's Deinterlacing filters comparison, and search the MPlayer mailing lists to find many discussions about the various filters. Again, the framerate is not changing, so no -ofps. Also, deinterlacing should be done after cropping [1] and before scaling.

mencoder dvd://1 -oac copy -vf yadif -ovc lavc

Unfortunately, this option is buggy with MEncoder; it ought to work well with MEncoder G2, but that is not here yet. You might experience crashes. Anyway, the purpose of tfields is to create a full frame out of each field, which makes the framerate 60000/1001. The advantage of this approach is that no data is ever lost; however, since each frame comes from only one field, the missing lines have to be interpolated somehow. There are no very good methods of generating the missing data, and so the result will look a bit similar to when using some deinterlacing filters. Generating the missing lines creates other issues as well, simply because the amount of data doubles. So, higher encoding bitrates are required to maintain quality, and more CPU power is used for both encoding and decoding. tfields has several different options for how to create the missing lines of each frame. If you use this method, then consult the manual, and choose whichever option looks best for your material.
Note that when using tfields you have to specify both -fps and -ofps to be twice the framerate of your original source.

mencoder dvd://1 -oac copy -vf tfields=2 -ovc lavc \
-fps 60000/1001 -ofps 60000/1001

If you plan on downscaling dramatically, you can extract and encode only one of the two fields. Of course, you will lose half the vertical resolution, but if you plan on downscaling to at most 1/2 of the original, the loss will not matter much. The result will be a progressive 30000/1001 frames per second file. The procedure is to use field=0, then crop [1] and scale appropriately. Remember that you will have to adjust the scale to compensate for the vertical resolution being halved.

mencoder dvd://1 -oac copy -vf field=0 -ovc lavc

Mixed progressive and telecine

In order to turn mixed progressive and telecine video into entirely progressive video, the telecined parts have to be inverse-telecined. There are three ways to accomplish this, described below. Note that you should always inverse-telecine before any rescaling; unless you really know what you are doing, inverse-telecine before cropping, too [1]. -ofps 24000/1001 is needed here because the output video will be 24000/1001 frames per second.

pullup is designed to inverse-telecine telecined material while leaving progressive data alone. In order to work properly, pullup must be followed by the softskip filter or else MEncoder will crash. pullup is, however, the cleanest and most accurate method available for encoding both telecine and "mixed progressive and telecine".

mencoder dvd://1 -oac copy -vf pullup,softskip -ovc lavc -ofps 24000/1001

An older method is to, rather than inverse-telecine the telecined parts, telecine the non-telecined parts and then inverse-telecine the whole video. Sound confusing? softpulldown is a filter that goes through a video and makes the entire file telecined. If we follow softpulldown with an inverse telecine filter such as ivtc, the final result will be entirely progressive. -ofps 24000/1001 is needed.
mencoder dvd://1 -oac copy -vf softpulldown,ivtc=1 -ovc lavc -ofps 24000/1001
I have not used -vf filmdint myself, but here is what D Richard Felker III has to say:
It is OK, but IMO it tries to deinterlace rather than doing inverse telecine too often (much like settop DVD players & progressive TVs) which gives ugly flickering and other artifacts. If you are going to use it, you at least need to spend some time tuning the options and watching the output first to make sure it is not messing up.
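The framerates quoted in this section follow from simple ratios, and a small sketch may make them easier to check. It assumes an NTSC source at 30000/1001 frames per second and the usual 3:2 pulldown cadence, in which 4 film frames are spread over 5 video frames; the function names are illustrative only and are not part of MEncoder.

```python
from fractions import Fraction

# NTSC video frame rate as stored on the DVD (exact rational value).
NTSC_FRAME_RATE = Fraction(30000, 1001)


def field_rate(frame_rate):
    """Rate obtained when every field becomes a full frame (as with tfields)."""
    return frame_rate * 2


def inverse_telecine_rate(frame_rate):
    """Rate after inverse telecine, assuming a 3:2 pulldown cadence
    (4 film frames spread over 5 video frames)."""
    return frame_rate * Fraction(4, 5)


print(field_rate(NTSC_FRAME_RATE))             # 60000/1001, the -fps/-ofps value for tfields
print(inverse_telecine_rate(NTSC_FRAME_RATE))  # 24000/1001, the -ofps value after pullup
```

Using exact fractions rather than floats avoids rounding 30000/1001 to 29.97, which is the kind of approximation that can cause audio drift over a long movie.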
Mixed progressive and interlaced
There are two options for dealing with this category, each of which is a compromise. You should decide based on the duration/location of each type.
Treat it as progressive. The interlaced parts will look interlaced, and some of the interlaced fields will have to be dropped, resulting in a bit of uneven jumpiness. You can use a postprocessing filter if you want to, but it may slightly degrade the progressive parts. This option should definitely not be used if you want to eventually display the video on an interlaced device (with a TV card, for example). If you have interlaced frames in a 24000/1001 frames per second video, they will be telecined along with the progressive frames. Half of the interlaced "frames" will be displayed for three fields' duration (3/(60000/1001) seconds), resulting in a flickering "jump back in time" effect that looks quite bad. If you even attempt this, you must use a deinterlacing filter like lb or l5. It may be a bad idea for progressive display, too. It will drop pairs of consecutive interlaced fields, resulting in a discontinuity that can be more visible than with the second method, which shows some progressive frames twice. 30000/1001 frames per second interlaced video is already a bit choppy because it really should be shown at 60000/1001 fields per second, so the duplicate frames do not stand out as much. Either way, it is best to consider your content and how you intend to display it. If your video is 90% progressive and you never intend to show it on a TV, you should favor a progressive approach. If it is only half progressive, you probably want to encode it as if it is all interlaced.
Treat it as interlaced. Some frames of the progressive parts will need to be duplicated, resulting in uneven jumpiness. Again, deinterlacing filters may slightly degrade the progressive parts.
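The "three fields' duration" figure above can be worked out exactly. The following is only a sketch of that arithmetic, assuming an NTSC field rate of 60000/1001 fields per second; the helper name is made up for illustration.

```python
from fractions import Fraction

# NTSC field rate: fields shown per second on an interlaced display.
FIELD_RATE = Fraction(60000, 1001)


def display_time(fields):
    """Seconds that a picture stays on screen when shown for `fields` fields."""
    return Fraction(fields) / FIELD_RATE


three_fields = display_time(3)  # the frames held for three fields
two_fields = display_time(2)    # the frames held for two fields

print(three_fields, float(three_fields))  # 1001/20000, about 0.050 s
print(two_fields, float(two_fields))      # 1001/30000, about 0.033 s
```

The difference between the two durations is one field period, which is why telecined interlaced content appears to jump.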
Footnotes About cropping: Video data on DVDs are stored in a format called YUV 4:2:0. In YUV video, luma ("brightness") and chroma ("color") are stored separately. Because the human eye is somewhat less sensitive to color than it is to brightness, in a YUV 4:2:0 picture there is only one chroma pixel for every four luma pixels. In a progressive picture, each square of four luma pixels (two on each side) has one common chroma pixel. You must crop progressive YUV 4:2:0 to even resolutions, and use even offsets. For example, is OK but is not. When you are dealing with interlaced YUV 4:2:0, the situation is a bit more complicated. Instead of every four luma pixels in the frame sharing a chroma pixel, every four luma pixels in each field share a chroma pixel. When fields are interlaced to form a frame, each scanline is one pixel high. Now, instead of all four luma pixels being in a square, there are two pixels side-by-side, and the other two pixels are side-by-side two scanlines down. The two luma pixels in the intermediate scanline are from the other field, and so share a different chroma pixel with two luma pixels two scanlines away. All this confusion makes it necessary to have vertical crop dimensions and offsets be multiples of four. Horizontal can stay even. For telecined video, I recommend that cropping take place after inverse telecining. Once the video is progressive you only need to crop by even numbers. If you really want to gain the slight speedup that cropping first may offer, you must crop vertically by multiples of four or else the inverse-telecine filter will not have proper data. For interlaced (not telecined) video, you must always crop vertically by multiples of four unless you use before cropping. About encoding parameters and quality: Just because I recommend here does not mean it should not be used elsewhere. 
Along with trell, mbd=2 is one of the two libavcodec options that increase quality the most, and you should always use at least those two unless the drop in encoding speed is prohibitive (e.g. realtime encoding). There are many other options to libavcodec that increase encoding quality (and decrease encoding speed) but that is beyond the scope of this document.
About the performance of pullup: It is safe to use pullup (along with softskip) on progressive video, and is usually a good idea unless the source has been definitively verified to be entirely progressive. The performance loss is small for most cases. On a bare-minimum encode, pullup causes MEncoder to be 50% slower. Adding sound processing and advanced lavcopts overshadows that difference, bringing the performance decrease of using pullup down to 2%.
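The cropping rules from the first footnote can be expressed as a small check. This is a minimal sketch, not a tool shipped with MPlayer: the function name and the sample crop values are hypothetical, and it only encodes the rules stated above (even sizes and offsets for progressive YUV 4:2:0, vertical multiples of four for interlaced).

```python
def crop_is_safe(width, height, x, y, interlaced=False):
    """Check a crop=width:height:x:y rectangle against the YUV 4:2:0 rules.

    Progressive: all four values must be even.
    Interlaced: height and y must be multiples of four; width and x stay even.
    """
    vertical_step = 4 if interlaced else 2
    horizontal_ok = width % 2 == 0 and x % 2 == 0
    vertical_ok = height % vertical_step == 0 and y % vertical_step == 0
    return horizontal_ok and vertical_ok


# Hypothetical crop rectangles, chosen only to exercise each rule.
print(crop_is_safe(716, 380, 2, 26))                   # True: everything is even
print(crop_is_safe(716, 380, 3, 26))                   # False: odd x offset
print(crop_is_safe(720, 352, 0, 62, interlaced=True))  # False: y is not a multiple of 4
print(crop_is_safe(720, 352, 0, 60, interlaced=True))  # True
```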
Encoding with the <systemitem class="library">libavcodec</systemitem> codec family libavcodec provides simple encoding to a lot of interesting video and audio formats. You can encode to the following codecs (more or less up to date): <systemitem class="library">libavcodec</systemitem>'s video codecs Video codec nameDescription mjpeg Motion JPEG ljpeg lossless JPEG jpegls JPEG LS targa Targa image gif GIF image bmp BMP image png PNG image h261 H.261 h263 H.263 h263p H.263+ mpeg4 ISO standard MPEG-4 (DivX, Xvid compatible) msmpeg4 pre-standard MPEG-4 variant by MS, v3 (AKA DivX3) msmpeg4v2 pre-standard MPEG-4 by MS, v2 (used in old ASF files) wmv1 Windows Media Video, version 1 (AKA WMV7) wmv2 Windows Media Video, version 2 (AKA WMV8) rv10 RealVideo 1.0 rv20 RealVideo 2.0 mpeg1video MPEG-1 video mpeg2video MPEG-2 video huffyuv lossless compression ffvhuff FFmpeg modified huffyuv lossless asv1 ASUS Video v1 asv2 ASUS Video v2 ffv1 FFmpeg's lossless video codec svq1 Sorenson video 1 flv Sorenson H.263 used in Flash Video flashsv Flash Screen Video dvvideo Sony Digital Video snow FFmpeg's experimental wavelet-based codec zbmv Zip Blocks Motion Video The first column contains the codec names that should be passed after the vcodec config, like: An example with MJPEG compression: mencoder dvd://2 -o title2.avi -ovc lavc -lavcopts vcodec=mjpeg -oac copy <systemitem class="library">libavcodec</systemitem>'s audio codecs Audio codec nameDescription mp2 MPEG Layer 2 ac3 AC-3, AKA Dolby Digital adpcm_ima_wav IMA adaptive PCM (4 bits per sample, 4:1 compression) sonic experimental FFmpeg lossy codec sonicls experimental FFmpeg lossless codec vorbis Xiph Ogg Vorbis codec wmav1 Windows Media Audio v1 codec wmav2 Windows Media Audio v2 codec The first column contains the codec names that should be passed after the acodec option, like: An example with AC-3 compression: mencoder dvd://2 -o title2.avi -oac lavc -lavcopts acodec=ac3 -ovc copy Contrary to libavcodec's video codecs, its 
audio codecs do not make a wise usage of the bits they are given as they lack some minimal psychoacoustic model (if at all) which most other codec implementations feature. However, note that all these audio codecs are very fast and work out-of-the-box everywhere MEncoder has been compiled with libavcodec (which is the case most of time), and do not depend on external libraries. Encoding options of libavcodec Ideally, you would probably want to be able to just tell the encoder to switch into "high quality" mode and move on. That would probably be nice, but unfortunately hard to implement as different encoding options yield different quality results depending on the source material. That is because compression depends on the visual properties of the video in question. For example, anime and live action have very different properties and thus require different options to obtain optimum encoding. The good news is that some options should never be left out, like , , and . See below for a detailed description of common encoding options. Options to adjust: vmax_b_frames: 1 or 2 is good, depending on the movie. Note that if you need to have your encode be decodable by DivX5, you need to activate closed GOP support, using libavcodec's option, but you need to deactivate scene detection, which is not a good idea as it will hurt encode efficiency a bit. vb_strategy=1: helps in high-motion scenes. On some videos, vmax_b_frames may hurt quality, but vmax_b_frames=2 along with vb_strategy=1 helps. dia: motion search range. Bigger is better and slower. Negative values are a completely different scale. Good values are -1 for a fast encode, or 2-4 for slower. predia: motion search pre-pass. Not as important as dia. Good values are 1 (default) to 4. Requires preme=2 to really be useful. cmp, subcmp, precmp: Comparison function for motion estimation. Experiment with values of 0 (default), 2 (hadamard), 3 (dct), and 6 (rate distortion). 0 is fastest, and sufficient for precmp. 
For cmp and subcmp, 2 is good for anime, and 3 is good for live action. 6 may or may not be slightly better, but is slow. last_pred: Number of motion predictors to take from the previous frame. 1-3 or so help at little speed cost. Higher values are slow for no extra gain. cbp, mv0: Controls the selection of macroblocks. Small speed cost for small quality gain. qprd: adaptive quantization based on the macroblock's complexity. May help or hurt depending on the video and other options. This can cause artifacts unless you set vqmax to some reasonably small value (6 is good, maybe as low as 4); vqmin=1 should also help. qns: very slow, especially when combined with qprd. This option will make the encoder minimize noise due to compression artifacts instead of making the encoded video strictly match the source. Do not use this unless you have already tweaked everything else as far as it will go and the results still are not good enough. vqcomp: Tweak ratecontrol. What values are good depends on the movie. You can safely leave this alone if you want. Reducing vqcomp puts more bits on low-complexity scenes, increasing it puts them on high-complexity scenes (default: 0.5, range: 0-1. recommended range: 0.5-0.7). vlelim, vcelim: Sets the single coefficient elimination threshold for luminance and chroma planes. These are encoded separately in all MPEG-like algorithms. The idea behind these options is to use some good heuristics to determine when the change in a block is less than the threshold you specify, and in such a case, to just encode the block as "no change". This saves bits and perhaps speeds up encoding. vlelim=-4 and vcelim=9 seem to be good for live movies, but seem not to help with anime; when encoding animation, you should probably leave them unchanged. qpel: Quarter pixel motion estimation. MPEG-4 uses half pixel precision for its motion search by default, therefore this option comes with an overhead as more information will be stored in the encoded file. 
The compression gain/loss depends on the movie, but it is usually not very effective on anime. qpel always incurs a significant cost in CPU decode time (+25% in practice). psnr: does not affect the actual encoding, but writes a log file giving the type/size/quality of each frame, and prints a summary of PSNR (Peak Signal to Noise Ratio) at the end.
Options not recommended to play with: vme: The default is best. lumi_mask, dark_mask: Psychovisual adaptive quantization. You do not want to play with those options if you care about quality. Reasonable values may be effective in your case, but be warned this is very subjective. scplx_mask: Tries to prevent blocky artifacts, but postprocessing is better.
Encoding setting examples
The following settings are examples of different encoding option combinations that affect the speed vs quality tradeoff at the same target bitrate. All the encoding settings were tested on a 720x448 @30000/1001 fps video sample, the target bitrate was 900kbps, and the machine was an AMD-64 3400+ at 2400 MHz in 64 bits mode. Each encoding setting features the measured encoding speed (in frames per second) and the PSNR loss (in dB) compared to the "very high quality" setting. Please understand that depending on your source, your machine type and development advancements, you may get very different results.
Description Encoding options speed (in fps) Relative PSNR loss (in dB)
Very high quality 6fps 0dB
High quality 15fps -0.5dB
Fast 42fps -0.74dB
Realtime 54fps -1.21dB
Custom inter/intra matrices
With this feature of libavcodec you are able to set custom inter (P-frames/predicted frames) and intra (I-frames/keyframes) matrices. It is supported by many of the codecs: mpeg1video and mpeg2video are reported as working. A typical usage of this feature is to set the matrices preferred by the KVCD specifications.
The KVCD "Notch" Quantization Matrix: Intra: 8 9 12 22 26 27 29 34 9 10 14 26 27 29 34 37 12 14 18 27 29 34 37 38 22 26 27 31 36 37 38 40 26 27 29 36 39 38 40 48 27 29 34 37 38 40 48 58 29 34 37 38 40 48 58 69 34 37 38 40 48 58 69 79 Inter: 16 18 20 22 24 26 28 30 18 20 22 24 26 28 30 32 20 22 24 26 28 30 32 34 22 24 26 30 32 32 34 36 24 26 28 32 34 34 36 38 26 28 30 32 34 36 38 40 28 30 32 34 36 38 42 42 30 32 34 36 38 40 42 44 Usage: mencoder input.avi -o output.avi -oac copy -ovc lavc \ -lavcopts inter_matrix=...:intra_matrix=... mencoder input.avi -ovc lavc -lavcopts \ vcodec=mpeg2video:intra_matrix=8,9,12,22,26,27,29,34,9,10,14,26,27,29,34,37,\ 12,14,18,27,29,34,37,38,22,26,27,31,36,37,38,40,26,27,29,36,39,38,40,48,27,\ 29,34,37,38,40,48,58,29,34,37,38,40,48,58,69,34,37,38,40,48,58,69,79\ :inter_matrix=16,18,20,22,24,26,28,30,18,20,22,24,26,28,30,32,20,22,24,26,\ 28,30,32,34,22,24,26,30,32,32,34,36,24,26,28,32,34,34,36,38,26,28,30,32,34,\ 36,38,40,28,30,32,34,36,38,42,42,30,32,34,36,38,40,42,44 -oac copy -o svcd.mpg Example So, you have just bought your shiny new copy of Harry Potter and the Chamber of Secrets (widescreen edition, of course), and you want to rip this DVD so that you can add it to your Home Theatre PC. This is a region 1 DVD, so it is NTSC. The example below will still apply to PAL, except you will omit (because the output framerate is the same as the input framerate), and of course the crop dimensions will be different. After running , we follow the process detailed in the section How to deal with telecine and interlacing in NTSC DVDs and discover that it is 24000/1001 fps progressive video, which means that we need not use an inverse telecine filter, such as or . 
Next, we want to determine the appropriate crop rectangle, so we use the cropdetect filter:
mplayer dvd://1 -vf cropdetect
Make sure you seek to a fully filled frame (such as a bright scene, past the opening credits and logos), and you will see in MPlayer's console output:
crop area: X: 0..719 Y: 57..419 (-vf crop=720:362:0:58)
We then play the movie back with this filter to test its correctness:
mplayer dvd://1 -vf crop=720:362:0:58
And we see that it looks perfectly fine. Next, we ensure the width and height are a multiple of 16. The width is fine, however the height is not. Since we did not fail 7th grade math, we know that the nearest multiple of 16 lower than 362 is 352. We could just use crop=720:352:0:58, but it would be nice to take a little off the top and a little off the bottom so that we retain the center. We have shrunk the height by 10 pixels, but we do not want to increase the y-offset by 5 pixels since that is an odd number and will adversely affect quality. Instead, we will increase the y-offset by 4 pixels:
mplayer dvd://1 -vf crop=720:352:0:62
Another reason to shave pixels from both the top and the bottom is that we ensure we have eliminated any half-black pixels if they exist. Note that if your video is telecined, make sure the pullup filter (or whichever inverse telecine filter you decide to use) appears in the filter chain before you crop. If it is interlaced, deinterlace before cropping. (If you choose to preserve the interlaced video, then make sure your vertical crop offset is a multiple of 4.) If you are really concerned about losing those 10 pixels, you might prefer instead to scale the dimensions down to the nearest multiple of 16. The filter chain would look like:
-vf crop=720:362:0:58,scale=720:352
Scaling the video down like this will mean that some small amount of detail is lost, though it probably will not be perceptible. Scaling up will result in lower quality (unless you increase the bitrate). Cropping discards those pixels altogether.
It is a tradeoff that you will want to consider for each circumstance. For example, if the DVD video was made for television, you might want to avoid vertical scaling, since the line sampling corresponds to the way the content was originally recorded. On inspection, we see that our movie has a fair bit of action and high amounts of detail, so we pick 2400Kbit for our bitrate. We are now ready to do the two pass encode. Pass one: mencoder dvd://1 -ofps 24000/1001 -oac copy -o Harry_Potter_2.avi -ovc lavc \ -lavcopts vcodec=mpeg4:vbitrate=2400:v4mv:mbd=2:trell:cmp=3:subcmp=3:autoaspect:vpass=1 \ -vf pullup,softskip,crop=720:352:0:62,hqdn3d=2:1:2 And pass two is the same, except that we specify : mencoder dvd://1 -ofps 24000/1001 -oac copy -o Harry_Potter_2.avi -ovc lavc \ -lavcopts vcodec=mpeg4:vbitrate=2400:v4mv:mbd=2:trell:cmp=3:subcmp=3:autoaspect:vpass=2 \ -vf pullup,softskip,crop=720:352:0:62,hqdn3d=2:1:2 The options will greatly increase the quality at the expense of encoding time. There is little reason to leave these options out when the primary goal is quality. The options select a comparison function that yields higher quality than the defaults. You might try experimenting with this parameter (refer to the man page for the possible values) as different functions can have a large impact on quality depending on the source material. For example, if you find libavcodec produces too much blocky artifacting, you could try selecting the experimental NSSE as comparison function via . For this movie, the resulting AVI will be 138 minutes long and nearly 3GB. And because you said that file size does not matter, this is a perfectly acceptable size. However, if you had wanted it smaller, you could try a lower bitrate. Increasing bitrates have diminishing returns, so while we might clearly see an improvement from 1800Kbit to 2000Kbit, it might not be so noticeable above 2000Kbit. Feel free to experiment until you are happy. 
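The arithmetic used in this example can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are invented, the width is assumed to be a multiple of 16 already, the audio bitrate of 448 kbit/s is a guess at a typical AC-3 track (the text does not state it), "kbit" is taken as 1000 bits, and container overhead is ignored.

```python
def shrink_crop_height(width, height, x, y, block=16):
    """Reduce the crop height to a multiple of `block`, keeping the picture
    roughly centered while leaving the y offset even."""
    new_height = height - height % block  # e.g. 362 -> 352
    removed = height - new_height         # rows discarded, e.g. 10
    shift = removed // 2                  # ideal shift, e.g. 5
    shift -= shift % 2                    # round down to an even shift, e.g. 4
    return width, new_height, x, y + shift


def stream_bytes(kbit_per_s, minutes):
    """Approximate size in bytes of a constant-average-bitrate stream."""
    return kbit_per_s * 1000 * minutes * 60 // 8


print(shrink_crop_height(720, 362, 0, 58))  # (720, 352, 0, 62)

video = stream_bytes(2400, 138)  # video at the bitrate chosen in the text
audio = stream_bytes(448, 138)   # ASSUMPTION: 448 kbit/s copied AC-3 audio
print(video, audio, video + audio)  # 2484000000 463680000 2947680000
```

Under these assumptions the total comes to about 2.95 GB, which is consistent with the "nearly 3GB" figure quoted above.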
Because we passed the source video through a denoise filter, you may want to add some of it back during playback. This, along with the post-processing filter, drastically improves the perception of quality and helps eliminate blocky artifacts in the video. With MPlayer's option, you can vary the amount of post-processing done by the spp filter depending on available CPU. Also, at this point, you may want to apply gamma and/or color correction to best suit your display. For example: mplayer Harry_Potter_2.avi -vf spp,noise=9ah:5ah,eq2=1.2 -autoq 3 Encoding with the <systemitem class="library">Xvid</systemitem> codec Xvid is a free library for encoding MPEG-4 ASP video streams. Before starting to encode, you need to set up MEncoder to support it. This guide mainly aims at featuring the same kind of information as x264's encoding guide. Therefore, please begin by reading the first part of that guide. What options should I use to get the best results? Please begin by reviewing the Xvid section of MPlayer's man page. This section is intended to be a supplement to the man page. The Xvid default settings are already a good tradeoff between speed and quality, therefore you can safely stick to them if the following section puzzles you. Encoding options of <systemitem class="library">Xvid</systemitem> vhq This setting affects the macroblock decision algorithm, where the higher the setting, the wiser the decision. The default setting may be safely used for every encode, while higher settings always help PSNR but are significantly slower. Please note that a better PSNR does not necessarily mean that the picture will look better, but tells you that it is closer to the original. Turning it off will noticeably speed up encoding; if speed is critical for you, the tradeoff may be worth it. bvhq This does the same job as vhq, but does it on B-frames. It has a negligible impact on speed, and slightly improves quality (around +0.1dB PSNR). 
max_bframes A higher number of consecutive allowed B-frames usually improves compressibility, although it may also lead to more blocking artifacts. The default setting is a good tradeoff between compressibility and quality, but you may increase it up to 3 if you are bitrate-starved. You may also decrease it to 1 or 0 if you are aiming at perfect quality, though in that case you should make sure your target bitrate is high enough to ensure that the encoder does not have to increase quantizers to reach it. bf_threshold This controls the B-frame sensitivity of the encoder, where a higher value leads to more B-frames being used (and vice versa). This setting is to be used together with ; if you are bitrate-starved, you should increase both and , while you may increase and reduce so that the encoder may use more B-frames in places that only really need them. A low number of and a high value of is probably not a wise choice as it will force the encoder to put B-frames in places that would not benefit from them, therefore reducing visual quality. However, if you need to be compatible with standalone players that only support old DivX profiles (which only supports up to 1 consecutive B-frame), this would be your only way to increase compressibility through using B-frames. trellis Optimizes the quantization process to get an optimal tradeoff between PSNR and bitrate, which allows significant bit saving. These bits will in return be spent elsewhere on the video, raising overall visual quality. You should always leave it on as its impact on quality is huge. Even if you are looking for speed, do not disable it until you have turned down and all other more CPU-hungry options to the minimum. hq_ac Activates a better coefficient cost estimation method, which slightly reduces filesize by around 0.15 to 0.19% (which corresponds to less than 0.01dB PSNR increase), while having a negligible impact on speed. It is therefore recommended to always leave it on. 
cartoon Designed to better encode cartoon content, and has no impact on speed as it just tunes the mode decision heuristics for this type of content. me_quality This setting is to control the precision of the motion estimation. The higher , the more precise the estimation of the original motion will be, and the better the resulting clip will capture the original motion. The default setting is best in all cases; thus it is not recommended to turn it down unless you are really looking for speed, as all the bits saved by a good motion estimation would be spent elsewhere, raising overall quality. Therefore, do not go any lower than 5, and even that only as a last resort. chroma_me Improves motion estimation by also taking the chroma (color) information into account, whereas alone only uses luma (grayscale). This slows down encoding by 5-10% but improves visual quality quite a bit by reducing blocking effects and reduces filesize by around 1.3%. If you are looking for speed, you should disable this option before starting to consider reducing . chroma_opt Is intended to increase chroma image quality around pure white/black edges, rather than improving compression. This can help to reduce the "red stairs" effect. lumi_mask Tries to give less bitrate to part of the picture that the human eye cannot see very well, which should allow the encoder to spend the saved bits on more important parts of the picture. The quality of the encode yielded by this option highly depends on personal preferences and on the type and monitor settings used to watch it (typically, it will not look as good if it is bright or if it is a TFT monitor). qpel Raise the number of candidate motion vectors by increasing the precision of the motion estimation from halfpel to quarterpel. The idea is to find better motion vectors which will in return reduce bitrate (hence increasing quality). 
However, motion vectors with quarterpel precision require a few extra bits to code, but the candidate vectors do not always give (much) better results. Quite often, the codec still spends bits on the extra precision, but little or no extra quality is gained in return. Unfortunately, there is no way to foresee the possible gains of , so you need to actually encode with and without it to know for sure. can be almost double encoding time, and requires as much as 25% more processing power to decode. It is not supported by all standalone players. gmc Tries to save bits on panning scenes by using a single motion vector for the whole frame. This almost always raises PSNR, but significantly slows down encoding (as well as decoding). Therefore, you should only use it when you have turned to the maximum. Xvid's GMC is more sophisticated than DivX's, but is only supported by few standalone players. Encoding profiles Xvid supports encoding profiles through the option, which are used to impose restrictions on the properties of the Xvid video stream such that it will be playable on anything which supports the chosen profile. The restrictions relate to resolutions, bitrates and certain MPEG-4 features. The following table shows what each profile supports. Simple Advanced Simple DivX Profile name 0 1 2 3 0 1 2 3 4 5 Handheld Portable NTSC Portable PAL Home Theater NTSC Home Theater PAL HDTV Width [pixels] 176 176 352 352 176 176 352 352 352 720 176 352 352 720 720 1280 Height [pixels] 144 144 288 288 144 144 288 288 576 576 144 240 288 480 576 720 Frame rate [fps] 15 15 15 15 30 30 15 30 30 30 15 30 25 30 25 30 Max average bitrate [kbps] 64 64 128 384 128 128 384 768 3000 8000 537.6 4854 4854 4854 4854 9708.4 Peak average bitrate over 3 secs [kbps] 800 8000 8000 8000 8000 16000 Max. 
B-frames 0 0 0 0 0 1 1 1 1 2 MPEG quantization X X X X X X Adaptive quantization X X X X X X X X X X X X Interlaced encoding X X X X X X X X X Quaterpixel X X X X X X Global motion compensation X X X X X X Encoding setting examples The following settings are examples of different encoding option combinations that affect the speed vs quality tradeoff at the same target bitrate. All the encoding settings were tested on a 720x448 @30000/1001 fps video sample, the target bitrate was 900kbps, and the machine was an AMD-64 3400+ at 2400 MHz in 64 bits mode. Each encoding setting features the measured encoding speed (in frames per second) and the PSNR loss (in dB) compared to the "very high quality" setting. Please understand that depending on your source, your machine type and development advancements, you may get very different results. DescriptionEncoding optionsspeed (in fps)Relative PSNR loss (in dB) Very high quality 16fps 0dB High quality 18fps -0.1dB Fast 28fps -0.69dB Realtime 38fps -1.48dB Encoding with the <systemitem class="library">x264</systemitem> codec x264 is a free library for encoding H.264/AVC video streams. Before starting to encode, you need to set up MEncoder to support it. Encoding options of x264 Please begin by reviewing the x264 section of MPlayer's man page. This section is intended to be a supplement to the man page. Here you will find quick hints about which options are most likely to interest most people. The man page is more terse, but also more exhaustive, and it sometimes offers much better technical detail. Introduction This guide considers two major categories of encoding options: Options which mainly trade off encoding time vs. quality Options which may be useful for fulfilling various personal preferences and special requirements Ultimately, only you can decide which options are best for your purposes. 
The decision for the first class of options is the simplest: you only have to decide whether you think the quality differences justify the speed differences. For the second class of options, preferences may be far more subjective, and more factors may be involved. Note that some of the "personal preferences and special requirements" options can still have large impacts on speed or quality, but that is not what they are primarily useful for. A couple of the "personal preference" options may even cause changes that look better to some people, but look worse to others. Before continuing, you need to understand that this guide uses only one quality metric: global PSNR. For a brief explanation of what PSNR is, see the Wikipedia article on PSNR. Global PSNR is the last PSNR number reported when you include the option in . Any time you read a claim about PSNR, one of the assumptions behind the claim is that equal bitrates are used. Nearly all of this guide's comments assume you are using two pass. When comparing options, there are two major reasons for using two pass encoding. First, using two pass often gains around 1dB PSNR, which is a very big difference. Secondly, testing options by doing direct quality comparisons with one pass encodes introduces a major confounding factor: bitrate often varies significantly with each encode. It is not always easy to tell whether quality changes are due mainly to changed options, or if they mostly reflect essentially random differences in the achieved bitrate. Options which primarily affect speed and quality subq: Of the options which allow you to trade off speed for quality, and (see below) are usually by far the most important. If you are interested in tweaking either speed or quality, these are the first options you should consider. On the speed dimension, the and options interact with each other fairly strongly. Experience shows that, with one reference frame, (the default setting) takes about 35% more time than . 
With 6 reference frames, the penalty grows to over 60%. 's effect on PSNR seems fairly constant regardless of the number of reference frames. Typically, achieves 0.2-0.5 dB higher global PSNR in comparison . This is usually enough to be visible. is slower and yields better quality at a reasonable cost. In comparison to , it usually gains 0.1-0.4 dB global PSNR with speed costs varying from 25%-100%. Unlike other levels of , the behavior of does not depend much on and . Instead, the effectiveness of depends mostly upon the number of B-frames used. In normal usage, this means has a large impact on both speed and quality in complex, high motion scenes, but it may not have much effect in low-motion scenes. Note that it is still recommended to always set to something other than zero (see below). is the slowest, highest quality mode. In comparison to , it usually gains 0.01-0.05 dB global PSNR with speed costs varying from 15%-33%. Since the tradeoff encoding time vs. quality is quite low, you should only use it if you are after every bit saving and if encoding time is not an issue. frameref: is set to 1 by default, but this should not be taken to imply that it is reasonable to set it to 1. Merely raising to 2 gains around 0.15dB PSNR with a 5-10% speed penalty; this seems like a good tradeoff. gains around 0.25dB PSNR over , which should be a visible difference. is around 15% slower than . Unfortunately, diminishing returns set in rapidly. can be expected to gain only 0.05-0.1 dB over at an additional 15% speed penalty. Above , the quality gains are usually very small (although you should keep in mind throughout this whole discussion that it can vary quite a lot depending on your source). In a fairly typical case, will improve global PSNR by a tiny 0.02dB over , at a speed cost of 15%-20%. 
At such high values, the only really good thing that can be said is that increasing it even further will almost certainly never harm PSNR, but the additional quality benefits are barely even measurable, let alone perceptible. Note: Raising to unnecessarily high values can and usually does hurt coding efficiency if you turn CABAC off. With CABAC on (the default behavior), the possibility of setting "too high" currently seems too remote to even worry about, and in the future, optimizations may remove the possibility altogether. If you care about speed, a reasonable compromise is to use low and values on the first pass, and then raise them on the second pass. Typically, this has a negligible negative effect on the final quality: You will probably lose well under 0.1dB PSNR, which should be much too small of a difference to see. However, different values of can occasionally affect frametype decision. Most likely, these are rare outlying cases, but if you want to be pretty sure, consider whether your video has either fullscreen repetitive flashing patterns or very large temporary occlusions which might force an I-frame. Adjust the first-pass so it is large enough to contain the duration of the flashing cycle (or occlusion). For example, if the scene flashes back and forth between two images over a duration of three frames, set the first pass to 3 or higher. This issue is probably extremely rare in live action video material, but it does sometimes come up in video game captures. me: This option is for choosing the motion estimation search method. Altering this option provides a straightforward quality-vs-speed tradeoff. is only a few percent faster than the default search, at a cost of under 0.1dB global PSNR. The default setting () is a reasonable tradeoff between speed and quality. gains a little under 0.1dB global PSNR, with a speed penalty that varies depending on . At high values of (e.g. 12 or so), is about 40% slower than the default . 
With , the speed penalty incurred drops to 25%-30%. uses an exhaustive search that is too slow for practical use. partitions=all: This option enables the use of 8x4, 4x8 and 4x4 subpartitions in predicted macroblocks (in addition to the default partitions). Enabling it results in a fairly consistent 10%-15% loss of speed. This option is rather useless in source containing only low motion, however in some high-motion source, particularly source with lots of small moving objects, gains of about 0.1dB can be expected. bframes: If you are used to encoding with other codecs, you may have found that B-frames are not always useful. In H.264, this has changed: there are new techniques and block types that are possible in B-frames. Usually, even a naive B-frame choice algorithm can have a significant PSNR benefit. It is interesting to note that using B-frames usually speeds up the second pass somewhat, and may also speed up a single pass encode if adaptive B-frame decision is turned off. With adaptive B-frame decision turned off ('s ), the optimal value for this setting is usually no more than , or else high-motion scenes can suffer. With adaptive B-frame decision on (the default behavior), it is safe to use higher values; the encoder will reduce the use of B-frames in scenes where they would hurt compression. The encoder rarely chooses to use more than 3 or 4 B-frames; setting this option any higher will have little effect. b_adapt: Note: This is on by default. With this option enabled, the encoder will use a reasonably fast decision process to reduce the number of B-frames used in scenes that might not benefit from them as much. You can use to tweak how B-frame-happy the encoder is. The speed penalty of adaptive B-frames is currently rather modest, but so is the potential quality gain. It usually does not hurt, however. Note that this only affects speed and frametype decision on the first pass. and have no effect on subsequent passes. 
b_pyramid: You might as well enable this option if you are using >=2 B-frames; as the man page says, you get a little quality improvement at no speed cost. Note that these videos cannot be read by libavcodec-based decoders older than about March 5, 2005.

weight_b: In typical cases, there is not much gain with this option. However, in crossfades or fade-to-black scenes, weighted prediction gives rather large bitrate savings. In MPEG-4 ASP, a fade-to-black is usually best coded as a series of expensive I-frames; using weighted prediction in B-frames makes it possible to turn at least some of these into much smaller B-frames. Encoding time cost is minimal, as no extra decisions need to be made. Also, contrary to what some people seem to guess, the decoder CPU requirements are not much affected by weighted prediction, all else being equal. Unfortunately, the current adaptive B-frame decision algorithm has a strong tendency to avoid B-frames during fades. Until this changes, it may be a good idea to add to your x264encopts, if you expect fades to have a large effect in your particular video clip.

threads: This option spawns threads to encode in parallel on multiple CPUs. You can manually select the number of threads to be created or, better, set and let x264 detect how many CPUs are available and pick an appropriate number of threads. If you have a multi-processor machine, you should really consider using it, as it can increase encoding speed almost linearly with the number of CPU cores (about 94% per CPU core), with very little quality reduction (about 0.005dB for dual processor, about 0.01dB for a quad processor machine).

Options pertaining to miscellaneous preferences

Two pass encoding: Above, it was suggested to always use two pass encoding, but there are still reasons for not using it. For instance, if you are capturing live TV and encoding in realtime, you are forced to use single-pass.
Also, one pass is obviously faster than two passes; if you use the exact same set of options on both passes, two pass encoding is almost twice as slow.

Still, there are very good reasons for using two pass encoding. For one thing, single pass ratecontrol is not psychic, and it often makes unreasonable choices because it cannot see the big picture. For example, suppose you have a two minute long video consisting of two distinct halves. The first half is a very high-motion scene lasting 60 seconds which, in isolation, requires about 2500kbps in order to look decent. Immediately following it is a much less demanding 60-second scene that looks good at 300kbps. Suppose you ask for 1400kbps on the theory that this is enough to accommodate both scenes. Single pass ratecontrol will make a couple of "mistakes" in such a case. First of all, it will target 1400kbps in both segments. The first segment may end up heavily overquantized, causing it to look unacceptably and unreasonably blocky. The second segment will be heavily underquantized; it may look perfect, but the bitrate cost of that perfection will be completely unreasonable. What is even harder to avoid is the problem at the transition between the two scenes. The first seconds of the low motion half will be hugely over-quantized, because the ratecontrol is still expecting the kind of bitrate requirements it met in the first half of the video. This "error period" of heavily over-quantized low motion will look jarringly bad, and will actually use less than the 300kbps it would have taken to make it look decent. There are ways to mitigate the pitfalls of single-pass encoding, but they may tend to increase bitrate misprediction.

Multipass ratecontrol can offer huge advantages over a single pass. Using the statistics gathered from the first pass encode, the encoder can estimate, with reasonable accuracy, the "cost" (in bits) of encoding any given frame, at any given quantizer.
This allows for a much more rational, better planned allocation of bits between the expensive (high-motion) and cheap (low-motion) scenes. See below for some ideas on how to tweak this allocation to your liking. Moreover, two passes need not take twice as long as one pass. You can tweak the options in the first pass for higher speed and lower quality. If you choose your options well, you can get a very fast first pass. The resulting quality in the second pass will be slightly lower because size prediction is less accurate, but the quality difference is normally much too small to be visible. Try, for example, adding to the first pass . Then, on the second pass, use slower, higher-quality options: Three pass encoding? x264 offers the ability to make an arbitrary number of consecutive passes. If you specify on the first pass, then use on a subsequent pass, the subsequent pass will both read the statistics from the previous pass, and write its own statistics. An additional pass following this one will have a very good base from which to make highly accurate predictions of framesizes at a chosen quantizer. In practice, the overall quality gain from this is usually close to zero, and quite possibly a third pass will result in slightly worse global PSNR than the pass before it. In typical usage, three passes help if you get either bad bitrate prediction or bad looking scene transitions when using only two passes. This is somewhat likely to happen on extremely short clips. There are also a few special cases in which three (or more) passes are handy for advanced users, but for brevity, this guide omits discussing those special cases. qcomp: trades off the number of bits allocated to "expensive" high-motion versus "cheap" low-motion frames. At one extreme, aims for true constant bitrate. 
Typically this would make high-motion scenes look completely awful, while low-motion scenes would probably look absolutely perfect, but would also use many times more bitrate than they would need in order to look merely excellent. At the other extreme, achieves nearly constant quantization parameter (QP). Constant QP does not look bad, but most people think it is more reasonable to shave some bitrate off of the extremely expensive scenes (where the loss of quality is not as noticeable) and reallocate it to the scenes that are easier to encode at excellent quality. is set to 0.6 by default, which may be slightly low for many peoples' taste (0.7-0.8 are also commonly used). keyint: is solely for trading off file seekability against coding efficiency. By default, is set to 250. In 25fps material, this guarantees the ability to seek to within 10 seconds precision. If you think it would be important and useful to be able to seek within 5 seconds of precision, set ; this will hurt quality/bitrate slightly. If you care only about quality and not about seekability, you can set it to much higher values (understanding that there are diminishing returns which may become vanishingly low, or even zero). The video stream will still have seekable points as long as there are some scene changes. deblock: This topic is going to be a bit controversial. H.264 defines a simple deblocking procedure on I-blocks that uses pre-set strengths and thresholds depending on the QP of the block in question. By default, high QP blocks are filtered heavily, and low QP blocks are not deblocked at all. The pre-set strengths defined by the standard are well-chosen and the odds are very good that they are PSNR-optimal for whatever video you are trying to encode. The allow you to specify offsets to the preset deblocking thresholds. Many people seem to think it is a good idea to lower the deblocking filter strength by large amounts (say, -3). 
This is however almost never a good idea, and in most cases, people who are doing this do not understand very well how deblocking works by default.

The first and most important thing to know about the in-loop deblocking filter is that the default thresholds are almost always PSNR-optimal. In the rare cases that they are not optimal, the ideal offset is plus or minus 1. Adjusting deblocking parameters by a larger amount is almost guaranteed to hurt PSNR. Strengthening the filter will smear more details; weakening the filter will increase the appearance of blockiness.

It is definitely a bad idea to lower the deblocking thresholds if your source is mainly low in spatial complexity (i.e., not a lot of detail or noise). The in-loop filter does a rather excellent job of concealing the artifacts that occur. If the source is high in spatial complexity, however, artifacts are less noticeable. This is because the ringing tends to look like detail or noise. Human visual perception easily notices when detail is removed, but it does not so easily notice when the noise is wrongly represented. When it comes to subjective quality, noise and detail are somewhat interchangeable. By lowering the deblocking filter strength, you are most likely increasing error by adding ringing artifacts, but the eye does not notice because it confuses the artifacts with detail.

This still does not justify lowering the deblocking filter strength, however. You can generally get better quality noise from postprocessing. If your H.264 encodes look too blurry or smeared, try playing with when you play your encoded movie. should conceal most mild artifacting. It will almost certainly look better than the results you would have gotten just by fiddling with the deblocking filter.

Encoding setting examples

The following settings are examples of different encoding option combinations that affect the speed vs quality tradeoff at the same target bitrate.
All the encoding settings were tested on a 720x448 @30000/1001 fps video sample, the target bitrate was 900kbps, and the machine was an AMD-64 3400+ at 2400 MHz in 64 bits mode. Each encoding setting features the measured encoding speed (in frames per second) and the PSNR loss (in dB) compared to the "very high quality" setting. Please understand that depending on your source, your machine type and development advancements, you may get very different results.

Description | Encoding options | Speed (in fps) | Relative PSNR loss (in dB)
Very high quality | | 6fps | 0dB
High quality | | 13fps | -0.89dB
Fast | | 17fps | -1.48dB

Encoding with the <systemitem class="library">Video For Windows</systemitem> codec family

Video for Windows provides simple encoding by means of binary video codecs. You can encode with the following codecs (if you have more, please tell us!).

Note that support for this is very experimental and some codecs may not work correctly. Some codecs will only work in certain colorspaces, try and if a codec fails or gives wrong output.
Video for Windows supported codecs

Video codec file name | Description (FourCC) | md5sum | Comment
aslcodec_vfw.dll | Alparysoft lossless codec vfw (ASLC) | 608af234a6ea4d90cdc7246af5f3f29a |
avimszh.dll | AVImszh (MSZH) | 253118fe1eedea04a95ed6e5f4c28878 | needs
avizlib.dll | AVIzlib (ZLIB) | 2f1cc76bbcf6d77d40d0e23392fa8eda |
divx.dll | DivX4Windows-VFW | acf35b2fc004a89c829531555d73f1e6 |
huffyuv.dll | HuffYUV (lossless) (HFYU) | b74695b50230be4a6ef2c4293a58ac3b |
iccvid.dll | Cinepak Video (cvid) | cb3b7ee47ba7dbb3d23d34e274895133 |
icmw_32.dll | Motion Wavelets (MWV1) | c9618a8fc73ce219ba918e3e09e227f2 |
jp2avi.dll | ImagePower MJPEG2000 (IPJ2) | d860a11766da0d0ea064672c6833768b |
m3jp2k32.dll | Morgan MJPEG2000 (MJ2C) | f3c174edcbaef7cb947d6357cdfde7ff |
m3jpeg32.dll | Morgan Motion JPEG Codec (MJPG) | 1cd13fff5960aa2aae43790242c323b1 |
mpg4c32.dll | Microsoft MPEG-4 v1/v2 | b5791ea23f33010d37ab8314681f1256 |
tsccvid.dll | TechSmith Camtasia Screen Codec (TSCC) | 8230d8560c41d444f249802a2700d1d5 | shareware error on windows
vp31vfw.dll | On2 Open Source VP3 Codec (VP31) | 845f3590ea489e2e45e876ab107ee7d2 |
vp4vfw.dll | On2 VP4 Personal Codec (VP40) | fc5480a482ccc594c2898dcc4188b58f |
vp6vfw.dll | On2 VP6 Personal Codec (VP60) | 04d635a364243013898fd09484f913fb |
vp7vfw.dll | On2 VP7 Personal Codec (VP70) | cb4cc3d4ea7c94a35f1d81c3d750bc8d | wrong FourCC?
ViVD2.dll | SoftMedia ViVD V2 codec VfW (GXVE) | a7b4bf5cac630bb9262c3f80d8a773a1 |
msulvc06.DLL | MSU Lossless codec (MSUD) | 294bf9288f2f127bb86f00bfcc9ccdda | Decodable by Windows Media Player, not MPlayer (yet).
camcodec.dll | CamStudio lossless video codec (CSCD) | 0efe97ce08bb0e40162ab15ef3b45615 | sf.net/projects/camstudio

The first column contains the codec names that should be passed after the codec parameter, like: The FourCC code used by each codec is given in the parentheses.
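The md5sum column is presumably there so that you can confirm a DLL on disk is the same build as the one that was tested. A minimal sketch of such a check follows; it assumes the GNU coreutils md5sum tool is installed, and check_codec_md5 is a hypothetical helper name, not something shipped with MPlayer.

```shell
#!/bin/sh
# Sketch: compare a codec DLL's MD5 checksum with the value from the table.
# Assumes GNU coreutils `md5sum`; `check_codec_md5` is a hypothetical helper.
check_codec_md5() {
    file=$1
    expected=$2
    # md5sum prints "<hash>  <filename>"; keep only the hash field.
    actual=$(md5sum "$file" | cut -d ' ' -f 1)
    if [ "$actual" = "$expected" ]; then
        echo "match"
    else
        echo "mismatch"
        return 1
    fi
}
```

For example, check_codec_md5 vp6vfw.dll 04d635a364243013898fd09484f913fb prints "match" only if the local file has the checksum listed for the VP6 codec.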
An example to convert an ISO DVD trailer to a VP6 flash video file using compdata bitrate settings:

    mencoder -dvd-device zeiram.iso dvd://7 -o trailer.flv \
    -ovc vfw -xvfwopts codec=vp6vfw.dll:compdata=onepass.mcf -oac mp3lame \
    -lameopts cbr:br=64 -af lavcresample=22050 -vf yadif,scale=320:240,flip \
    -of lavf -lavfopts i_certify_that_my_video_stream_does_not_use_b_frames

Using vfw2menc to create a codec settings file.

To encode with the Video for Windows codecs, you will need to set bitrate and other options. This is known to work on x86 on both *NIX and Windows.

First you must build the vfw2menc program. It is located in the TOOLS subdirectory of the MPlayer source tree. To build on Linux, this can be done using Wine:

    winegcc vfw2menc.c -o vfw2menc -lwinmm -lole32

To build on Windows in MinGW or Cygwin use:

    gcc vfw2menc.c -o vfw2menc.exe -lwinmm -lole32

To build on MSVC you will need getopt. Getopt can be found in the original vfw2menc archive available at: The MPlayer on win32 project.

Below is an example with the VP6 codec.

    vfw2menc -f VP62 -d vp6vfw.dll -s firstpass.mcf

This will open the VP6 codec dialog window. Repeat this step for the second pass and use . Windows users can use to have the codec dialog display before encoding starts.

Using <application>MEncoder</application> to create <application>QuickTime</application>-compatible files

Why would one want to produce <application>QuickTime</application>-compatible Files?

There are several reasons why producing QuickTime-compatible files can be desirable.

You want any computer illiterate to be able to watch your encode on any major platform (Windows, Mac OS X, Unices …).

QuickTime is able to take advantage of more hardware and software acceleration features of Mac OS X than platform-independent players like MPlayer or VLC. That means that your encodes have a chance to be played smoothly by older G4-powered machines.
QuickTime 7 supports the next-generation codec H.264, which yields significantly better picture quality than previous codec generations (MPEG-2, MPEG-4 …). <application>QuickTime</application> 7 limitations QuickTime 7 supports H.264 video and AAC audio, but it does not support them muxed in the AVI container format. However, you can use MEncoder to encode the video and audio, and then use an external program such as mp4creator (part of the MPEG4IP suite) to remux the video and audio tracks into an MP4 container. QuickTime's support for H.264 is limited, so you will need to drop some advanced features. If you encode your video with features that QuickTime 7 does not support, QuickTime-based players will show you a pretty white screen instead of your expected video. B-frames: QuickTime 7 supports a maximum of 1 B-frame, i.e. . This means that and will have no effect, since they require to be greater than 1. Macroblocks: QuickTime 7 does not support 8x8 DCT macroblocks. This option () is off by default, so just be sure not to explicitly enable it. This also means that the option will have no effect, since it requires . Aspect ratio: QuickTime 7 does not support SAR (sample aspect ratio) information in MPEG-4 files; it assumes that SAR=1. Read the section on scaling for a workaround. Cropping Suppose you want to rip your freshly bought copy of "The Chronicles of Narnia". Your DVD is region 1, which means it is NTSC. The example below would still apply to PAL, except you would omit and use slightly different and dimensions. After running , you follow the process detailed in the section How to deal with telecine and interlacing in NTSC DVDs and discover that it is 24000/1001 fps progressive video. This simplifies the process somewhat, since you do not need to use an inverse telecine filter such as or a deinterlacing filter such as . Next, you need to crop out the black bars from the top and bottom of the video, as detailed in this previous section. 
Scaling

The next step is truly heartbreaking. QuickTime 7 does not support MPEG-4 videos with a sample aspect ratio other than 1, so you will need to upscale (which wastes a lot of disk space) or downscale (which loses some details of the source) the video to square pixels. Either way you do it, this is highly inefficient, but simply cannot be avoided if you want your video to be playable by QuickTime 7. MEncoder can apply the appropriate upscaling or downscaling by specifying respectively or . This will scale your video to the correct width for the cropped height, rounded to the closest multiple of 16 for optimal compression. Remember that if you are cropping, you should crop first, then scale:

    -vf crop=720:352:0:62,scale=-10:-1

A/V sync

Because you will be remuxing into a different container, you should always use the option to ensure that duplicated frames are actually duplicated in the video output. Without this option, MEncoder will simply put a marker in the video stream that a frame was duplicated, and rely on the client software to show the same frame twice. Unfortunately, this "soft duplication" does not survive remuxing, so the audio would slowly lose sync with the video.

The final filter chain looks like this:

    -vf crop=720:352:0:62,scale=-10:-1,harddup

Bitrate

As always, the selection of bitrate is a matter of the technical properties of the source, as explained here, as well as a matter of taste. This movie has a fair bit of action and lots of detail, but H.264 video looks good at much lower bitrates than XviD or other MPEG-4 codecs. After much experimentation, the author of this guide chose to encode this movie at 900kbps, and thought that it looked very good. You may decrease bitrate if you need to save more space, or increase it if you need to improve quality.

Encoding example

You are now ready to encode the video. Since you care about quality, of course you will be doing a two-pass encode.
To shave off some encoding time, you can specify the option on the first pass; this reduces and to 1. To save some disk space, you can use the option to strip off the first few seconds of the video. (I found that this particular movie has 32 seconds of credits and logos.) can be 0 or 1. The other options are documented in Encoding with the x264 codec and the man page.

    mencoder dvd://1 -o /dev/null -ss 32 -ovc x264 \
    -x264encopts pass=1:turbo:bitrate=900:bframes=1:\
    me=umh:partitions=all:trellis=1:qp_step=4:qcomp=0.7:direct_pred=auto:keyint=300 \
    -vf crop=720:352:0:62,scale=-10:-1,harddup \
    -oac faac -faacopts br=192:mpeg=4:object=2 -channels 2 -srate 48000 \
    -ofps 24000/1001

If you have a multi-processor machine, don't miss the opportunity to dramatically speed up encoding by enabling x264's multi-threading mode by adding to your command-line.

The second pass is the same, except that you specify the output file and set .

    mencoder dvd://1 -o narnia.avi -ss 32 -ovc x264 \
    -x264encopts pass=2:turbo:bitrate=900:frameref=5:bframes=1:\
    me=umh:partitions=all:trellis=1:qp_step=4:qcomp=0.7:direct_pred=auto:keyint=300 \
    -vf crop=720:352:0:62,scale=-10:-1,harddup \
    -oac faac -faacopts br=192:mpeg=4:object=2 -channels 2 -srate 48000 \
    -ofps 24000/1001

The resulting AVI should play perfectly in MPlayer, but of course QuickTime cannot play it because it does not support H.264 muxed in AVI. So the next step is to remux the video into an MP4 container.

Remuxing as MP4

There are several ways to remux AVI files to MP4. You can use mp4creator, which is part of the MPEG4IP suite.

First, demux the AVI into separate audio and video streams using MPlayer.

    mplayer narnia.avi -dumpaudio -dumpfile narnia.aac
    mplayer narnia.avi -dumpvideo -dumpfile narnia.h264

The filenames are important; mp4creator requires that AAC audio streams be named .aac and H.264 video streams be named .h264.

Now use mp4creator to create a new MP4 file out of the audio and video streams.
    mp4creator -create=narnia.aac narnia.mp4
    mp4creator -create=narnia.h264 -rate=23.976 narnia.mp4

Unlike the encoding step, you must specify the framerate as a decimal (such as 23.976), not a fraction (such as 24000/1001).

This narnia.mp4 file should now be playable with any QuickTime 7 application, such as QuickTime Player or iTunes. If you are planning to view the video in a web browser with the QuickTime plugin, you should also hint the movie so that the QuickTime plugin can start playing it while it is still downloading. mp4creator can create these hint tracks:

    mp4creator -hint=1 narnia.mp4
    mp4creator -hint=2 narnia.mp4
    mp4creator -optimize narnia.mp4

You can check the final result to ensure that the hint tracks were created successfully:

    mp4creator -list narnia.mp4

You should see a list of tracks: 1 audio, 1 video, and 2 hint tracks.

    Track   Type    Info
    1       audio   MPEG-4 AAC LC, 8548.714 secs, 190 kbps, 48000 Hz
    2       video   H264 Main@5.1, 8549.132 secs, 899 kbps, 848x352 @ 23.976001 fps
    3       hint    Payload mpeg4-generic for track 1
    4       hint    Payload H264 for track 2

Adding metadata tags

If you want to add tags to your video that show up in iTunes, you can use AtomicParsley.

    AtomicParsley narnia.mp4 --metaEnema --title "The Chronicles of Narnia" --year 2005 --stik Movie --freefree --overWrite

The option removes any existing metadata (mp4creator inserts its name in the "encoding tool" tag), and reclaims the space from the deleted metadata. The option sets the type of video (such as Movie or TV Show), which iTunes uses to group related video files. The option overwrites the original file; without it, AtomicParsley creates a new auto-named file in the same directory and leaves the original file untouched.

Using <application>MEncoder</application> to create VCD/SVCD/DVD-compliant files

Format Constraints

MEncoder is capable of creating VCD, SVCD and DVD format MPEG files using the libavcodec library.
These files can then be used in conjunction with vcdimager or dvdauthor to create discs that will play on a standard set-top player.

The DVD, SVCD, and VCD formats are subject to heavy constraints. Only a small selection of encoded picture sizes and aspect ratios are available. If your movie does not already meet these requirements, you may have to scale, crop or add black borders to the picture to make it compliant.

Format Constraints

Format | Resolution | V. Codec | V. Bitrate | Sample Rate | A. Codec | A. Bitrate | FPS | Aspect
NTSC DVD | 720x480, 704x480, 352x480, 352x240 | MPEG-2 | 9800 kbps | 48000 Hz | AC-3,PCM | 1536 kbps (max) | 30000/1001, 24000/1001 | 4:3, 16:9 (only for 720x480)
NTSC DVD | 352x240 [a] | MPEG-1 | 1856 kbps | 48000 Hz | AC-3,PCM | 1536 kbps (max) | 30000/1001, 24000/1001 | 4:3, 16:9
NTSC SVCD | 480x480 | MPEG-2 | 2600 kbps | 44100 Hz | MP2 | 384 kbps (max) | 30000/1001 | 4:3
NTSC VCD | 352x240 | MPEG-1 | 1150 kbps | 44100 Hz | MP2 | 224 kbps | 24000/1001, 30000/1001 | 4:3
PAL DVD | 720x576, 704x576, 352x576, 352x288 | MPEG-2 | 9800 kbps | 48000 Hz | MP2,AC-3,PCM | 1536 kbps (max) | 25 | 4:3, 16:9 (only for 720x576)
PAL DVD | 352x288 | MPEG-1 | 1856 kbps | 48000 Hz | MP2,AC-3,PCM | 1536 kbps (max) | 25 | 4:3, 16:9
PAL SVCD | 480x576 | MPEG-2 | 2600 kbps | 44100 Hz | MP2 | 384 kbps (max) | 25 | 4:3
PAL VCD | 352x288 | MPEG-1 | 1152 kbps | 44100 Hz | MP2 | 224 kbps | 25 | 4:3

[a] These resolutions are rarely used for DVDs because they are fairly low quality.

If your movie has 2.35:1 aspect (most recent action movies), you will have to add black borders or crop the movie down to 16:9 to make a DVD or VCD. If you add black borders, try to align them at 16-pixel boundaries in order to minimize the impact on encoding performance. Thankfully DVD has sufficiently excessive bitrate that you do not have to worry too much about encoding efficiency, but SVCD and VCD are highly bitrate-starved and require effort to obtain acceptable quality.

GOP Size Constraints

DVD, VCD, and SVCD also constrain you to relatively low GOP (Group of Pictures) sizes.
For 30 fps material the largest allowed GOP size is 18. For 25 or 24 fps, the maximum is 15. The GOP size is set using the option.

Bitrate Constraints

VCD video is required to be CBR at 1152 kbps. This highly limiting constraint also comes along with an extremely low vbv buffer size of 327 kilobits. SVCD allows varying video bitrates up to 2500 kbps, and a somewhat less restrictive vbv buffer size of 917 kilobits is allowed. DVD video bitrates may range anywhere up to 9800 kbps (though typical bitrates are about half that), and the vbv buffer size is 1835 kilobits.

Output Options

MEncoder has options to control the output format. Using these options we can instruct it to create the correct type of file.

The options for VCD and SVCD are called xvcd and xsvcd, because they are extended formats. They are not strictly compliant, mainly because the output does not contain scan offsets. If you need to generate an SVCD image, you should pass the output file to vcdimager.

VCD:
    -of mpeg -mpegopts format=xvcd

SVCD:
    -of mpeg -mpegopts format=xsvcd

DVD (with timestamps on every frame, if possible):
    -of mpeg -mpegopts format=dvd:tsaf

DVD with NTSC Pullup:
    -of mpeg -mpegopts format=dvd:tsaf:telecine -ofps 24000/1001

This allows 24000/1001 fps progressive content to be encoded at 30000/1001 fps whilst maintaining DVD-compliance.

Aspect Ratio

The aspect argument of is used to encode the aspect ratio of the file. During playback the aspect ratio is used to restore the video to the correct size.
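For sources wider than 16:9, such as 2.35:1 material, the picture is scaled to a reduced height and padded with black borders to fill the frame. A minimal sketch of that height calculation follows. It assumes square-pixel widths of 854 (NTSC) and 1024 (PAL) for a 16:9 frame and rounds to the nearest multiple of 16; the function name letterbox_height is hypothetical and not part of MEncoder.

```shell
#!/bin/sh
# Sketch: height of a letterboxed picture inside a 16:9 frame.
# Assumptions (not MEncoder features): the helper name `letterbox_height`,
# integer arithmetic, and rounding to the nearest multiple of 16.
# $1 = square-pixel width of the 16:9 frame (854 for NTSC, 1024 for PAL)
# $2 = source aspect ratio multiplied by 100 (235 for 2.35:1)
letterbox_height() {
    width=$1
    aspect_x100=$2
    # Undistorted height before rounding (integer division)
    raw=$(( width * 100 / aspect_x100 ))
    # Round to the nearest multiple of 16
    echo $(( (raw + 8) / 16 * 16 ))
}

letterbox_height 854 235    # NTSC: prints 368
letterbox_height 1024 235   # PAL: prints 432
```

The two results are the heights used in the scale=720:368 and scale=720:432 filters for Cinemascope material; the expand filter then pads the picture to the full 480 or 576 lines.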
16:9 or "Widescreen"
    -lavcopts aspect=16/9

4:3 or "Fullscreen"
    -lavcopts aspect=4/3

2.35:1 or "Cinemascope" NTSC
    -vf scale=720:368,expand=720:480 -lavcopts aspect=16/9
To calculate the correct scaling size, use the expanded NTSC width: 854/2.35 ≈ 363, rounded to 368 (a multiple of 16).

2.35:1 or "Cinemascope" PAL
    -vf scale=720:432,expand=720:576 -lavcopts aspect=16/9
To calculate the correct scaling size, use the expanded PAL width: 1024/2.35 ≈ 436, rounded to 432 (a multiple of 16).

Maintaining A/V sync

In order to maintain audio/video synchronization throughout the encode, MEncoder has to drop or duplicate frames. This works rather well when muxing into an AVI file, but is almost guaranteed to fail to maintain A/V sync with other muxers such as MPEG. This is why it is necessary to append the video filter at the end of the filter chain to avoid this kind of problem. You can find more technical information about in the section Improving muxing and A/V sync reliability or in the manual page.

Sample Rate Conversion

If the audio sample rate in the original file is not the same as required by the target format, sample rate conversion is required. This is achieved using the option and the audio filter together.

DVD:
    -srate 48000 -af lavcresample=48000

VCD and SVCD:
    -srate 44100 -af lavcresample=44100

Using libavcodec for VCD/SVCD/DVD Encoding

Introduction

libavcodec can be used to create VCD/SVCD/DVD compliant video by using the appropriate options.

lavcopts

This is a list of fields in that you may be required to change in order to make a compliant movie for VCD, SVCD, or DVD:

acodec: for VCD, SVCD, or PAL DVD; is most commonly used for DVD. PCM audio may also be used for DVD, but this is mostly a big waste of space. Note that MP3 audio is not compliant for any of these formats, but players often have no problem playing it anyway.

abitrate: 224 for VCD; up to 384 for SVCD; up to 1536 for DVD, but commonly used values range from 192 kbps for stereo to 384 kbps for 5.1 channel sound.
vcodec: for VCD; for SVCD; is usually used for DVD but you may also use for CIF resolutions.

keyint: Used to set the GOP size. 18 for 30fps material, or 15 for 25/24 fps material. Commercial producers seem to prefer keyframe intervals of 12. It is possible to make this much larger and still retain compatibility with most players. A of 25 should never cause any problems.

vrc_buf_size: 327 for VCD, 917 for SVCD, and 1835 for DVD.

vrc_minrate: 1152 for VCD. May be left alone for SVCD and DVD.

vrc_maxrate: 1152 for VCD; 2500 for SVCD; 9800 for DVD. For SVCD and DVD, you might wish to use lower values depending on your own personal preferences and requirements.

vbitrate: 1152 for VCD; up to 2500 for SVCD; up to 9800 for DVD. For the latter two formats, vbitrate should be set based on personal preference. For instance, if you insist on fitting 20 or so hours on a DVD, you could use vbitrate=400. The resulting video quality would probably be quite bad. If you are trying to squeeze out the maximum possible quality on a DVD, use vbitrate=9800, but be warned that this could constrain you to less than an hour of video on a single-layer DVD.

vstrict: =0 should be used to create DVDs. Without this option, MEncoder creates a stream that cannot be correctly decoded by some standalone DVD players.

Examples

This is a typical minimum set of for encoding video:

VCD:
    -lavcopts vcodec=mpeg1video:vrc_buf_size=327:vrc_minrate=1152:\
    vrc_maxrate=1152:vbitrate=1152:keyint=15:acodec=mp2

SVCD:
    -lavcopts vcodec=mpeg2video:vrc_buf_size=917:vrc_maxrate=2500:vbitrate=1800:\
    keyint=15:acodec=mp2

DVD:
    -lavcopts vcodec=mpeg2video:vrc_buf_size=1835:vrc_maxrate=9800:vbitrate=5000:\
    keyint=15:vstrict=0:acodec=ac3

Advanced Options

For higher quality encoding, you may also wish to add quality-enhancing options to lavcopts, such as , , and others. Note that and , while often useful with MPEG-4, are not usable with MPEG-1 or MPEG-2.
Also, if you are trying to make a very high quality DVD encode, it may be useful to add to lavcopts. Doing so may help reduce the appearance of blocks in flat-colored areas. Putting it all together, this is an example of a set of lavcopts for a higher quality DVD:

    -lavcopts vcodec=mpeg2video:vrc_buf_size=1835:vrc_maxrate=9800:vbitrate=8000:\
    keyint=15:trell:mbd=2:precmp=2:subcmp=2:cmp=2:dia=-10:predia=-10:cbp:mv0:\
    vqmin=1:lmin=1:dc=10:vstrict=0

Encoding Audio

VCD and SVCD support MPEG-1 layer II audio, using one of toolame, twolame, or libavcodec's MP2 encoder. The libavcodec MP2 is far from being as good as the other two libraries, however it should always be available to use. VCD only supports constant bitrate audio (CBR) whereas SVCD supports variable bitrate (VBR), too. Be careful when using VBR because some bad standalone players might not support it too well.

For DVD audio, libavcodec's AC-3 codec is used.

toolame

For VCD and SVCD:
    -oac toolame -toolameopts br=224

twolame

For VCD and SVCD:
    -oac twolame -twolameopts br=224

libavcodec

For DVD with 2 channel sound:
    -oac lavc -lavcopts acodec=ac3:abitrate=192

For DVD with 5.1 channel sound:
    -channels 6 -oac lavc -lavcopts acodec=ac3:abitrate=384

For VCD and SVCD:
    -oac lavc -lavcopts acodec=mp2:abitrate=224

Putting it all Together

This section shows some complete commands for creating VCD/SVCD/DVD compliant videos.
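The minimum lavcopts sets listed under Examples differ only in a handful of values, so they can be collected in one place before being spliced into a full command. The following is only a sketch: lavcopts_for is a hypothetical helper, and the optional second argument (the GOP size, 15 by default, 18 for 30 fps material) is an illustrative convention rather than anything MEncoder provides.

```shell
#!/bin/sh
# Sketch: print the minimum lavcopts string for a target disc format,
# using the values from the Examples section of this guide.
# `lavcopts_for` is a hypothetical helper, not an MEncoder feature.
# $1 = vcd | svcd | dvd
# $2 = GOP size (optional; 15 by default, use 18 for 30 fps material)
lavcopts_for() {
    keyint=${2:-15}
    case $1 in
        vcd)
            echo "vcodec=mpeg1video:vrc_buf_size=327:vrc_minrate=1152:vrc_maxrate=1152:vbitrate=1152:keyint=$keyint:acodec=mp2" ;;
        svcd)
            echo "vcodec=mpeg2video:vrc_buf_size=917:vrc_maxrate=2500:vbitrate=1800:keyint=$keyint:acodec=mp2" ;;
        dvd)
            echo "vcodec=mpeg2video:vrc_buf_size=1835:vrc_maxrate=9800:vbitrate=5000:keyint=$keyint:vstrict=0:acodec=ac3" ;;
        *)
            echo "unknown format: $1" >&2
            return 1 ;;
    esac
}
```

A call such as -lavcopts "$(lavcopts_for dvd 18):abitrate=192:aspect=16/9" would then reproduce the option string of an NTSC DVD encode.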
PAL DVD

mencoder -oac lavc -ovc lavc -of mpeg -mpegopts format=dvd:tsaf \
  -vf scale=720:576,harddup -srate 48000 -af lavcresample=48000 \
  -lavcopts vcodec=mpeg2video:vrc_buf_size=1835:vrc_maxrate=9800:vbitrate=5000:\
keyint=15:vstrict=0:acodec=ac3:abitrate=192:aspect=16/9 -ofps 25 \
  -o movie.mpg movie.avi

NTSC DVD

mencoder -oac lavc -ovc lavc -of mpeg -mpegopts format=dvd:tsaf \
  -vf scale=720:480,harddup -srate 48000 -af lavcresample=48000 \
  -lavcopts vcodec=mpeg2video:vrc_buf_size=1835:vrc_maxrate=9800:vbitrate=5000:\
keyint=18:vstrict=0:acodec=ac3:abitrate=192:aspect=16/9 -ofps 30000/1001 \
  -o movie.mpg movie.avi

PAL AVI Containing AC-3 Audio to DVD

If the source already has AC-3 audio, use -oac copy instead of re-encoding it.

mencoder -oac copy -ovc lavc -of mpeg -mpegopts format=dvd:tsaf \
  -vf scale=720:576,harddup -ofps 25 \
  -lavcopts vcodec=mpeg2video:vrc_buf_size=1835:vrc_maxrate=9800:vbitrate=5000:\
keyint=15:vstrict=0:aspect=16/9 -o movie.mpg movie.avi

NTSC AVI Containing AC-3 Audio to DVD

If the source already has AC-3 audio, and is NTSC @ 24000/1001 fps:

mencoder -oac copy -ovc lavc -of mpeg -mpegopts format=dvd:tsaf:telecine \
  -vf scale=720:480,harddup -lavcopts vcodec=mpeg2video:vrc_buf_size=1835:\
vrc_maxrate=9800:vbitrate=5000:keyint=15:vstrict=0:aspect=16/9 -ofps 24000/1001 \
  -o movie.mpg movie.avi

PAL SVCD

mencoder -oac lavc -ovc lavc -of mpeg -mpegopts format=xsvcd \
  -vf scale=480:576,harddup -srate 44100 -af lavcresample=44100 \
  -lavcopts vcodec=mpeg2video:mbd=2:keyint=15:vrc_buf_size=917:vrc_minrate=600:\
vbitrate=2500:vrc_maxrate=2500:acodec=mp2:abitrate=224 -ofps 25 \
  -o movie.mpg movie.avi

NTSC SVCD

mencoder -oac lavc -ovc lavc -of mpeg -mpegopts format=xsvcd \
  -vf scale=480:480,harddup -srate 44100 -af lavcresample=44100 \
  -lavcopts vcodec=mpeg2video:mbd=2:keyint=18:vrc_buf_size=917:vrc_minrate=600:\
vbitrate=2500:vrc_maxrate=2500:acodec=mp2:abitrate=224 -ofps 30000/1001 \
  -o movie.mpg movie.avi

PAL VCD

mencoder -oac lavc -ovc lavc -of mpeg -mpegopts format=xvcd \
  -vf scale=352:288,harddup -srate 44100 -af lavcresample=44100 \
  -lavcopts vcodec=mpeg1video:keyint=15:vrc_buf_size=327:vrc_minrate=1152:\
vbitrate=1152:vrc_maxrate=1152:acodec=mp2:abitrate=224 -ofps 25 \
  -o movie.mpg movie.avi

NTSC VCD

mencoder -oac lavc -ovc lavc -of mpeg -mpegopts format=xvcd \
  -vf scale=352:240,harddup -srate 44100 -af lavcresample=44100 \
  -lavcopts vcodec=mpeg1video:keyint=18:vrc_buf_size=327:vrc_minrate=1152:\
vbitrate=1152:vrc_maxrate=1152:acodec=mp2:abitrate=224 -ofps 30000/1001 \
  -o movie.mpg movie.avi
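The re-encoding commands above differ only in a handful of values: the -mpegopts format, the frame size, the audio sample rate, the rate control settings, and the keyint and -ofps pair that follows the TV standard. As a minimal sketch, the shell function below assembles the same command lines from a format name and a standard. The name mpeg_disc_cmd and its argument order are hypothetical; all option values are copied from the examples in this section. It only prints the command so that it can be inspected before running.

```shell
#!/bin/sh
# Print (do not run) a mencoder command for a disc format and TV standard.
# mpeg_disc_cmd is a hypothetical helper; the option values are the ones
# used in the example commands of this section.
mpeg_disc_cmd() {
    format=$1; standard=$2; input=$3; output=$4

    # The TV standard decides GOP size, output frame rate and frame height.
    case $standard in
        pal)  keyint=15; ofps=25;         full=576; cif=288 ;;
        ntsc) keyint=18; ofps=30000/1001; full=480; cif=240 ;;
        *)    echo "unknown standard: $standard" >&2; return 1 ;;
    esac

    # The disc format decides codec, rate control, audio and frame width.
    case $format in
        vcd)
            mpegopts=format=xvcd; scale=352:$cif; srate=44100
            lavcopts="vcodec=mpeg1video:keyint=$keyint:vrc_buf_size=327:vrc_minrate=1152"
            lavcopts="$lavcopts:vbitrate=1152:vrc_maxrate=1152:acodec=mp2:abitrate=224" ;;
        svcd)
            mpegopts=format=xsvcd; scale=480:$full; srate=44100
            lavcopts="vcodec=mpeg2video:mbd=2:keyint=$keyint:vrc_buf_size=917:vrc_minrate=600"
            lavcopts="$lavcopts:vbitrate=2500:vrc_maxrate=2500:acodec=mp2:abitrate=224" ;;
        dvd)
            mpegopts=format=dvd:tsaf; scale=720:$full; srate=48000
            lavcopts="vcodec=mpeg2video:vrc_buf_size=1835:vrc_maxrate=9800:vbitrate=5000"
            lavcopts="$lavcopts:keyint=$keyint:vstrict=0:acodec=ac3:abitrate=192:aspect=16/9" ;;
        *)
            echo "unknown format: $format" >&2; return 1 ;;
    esac

    echo "mencoder -oac lavc -ovc lavc -of mpeg -mpegopts $mpegopts" \
        "-vf scale=$scale,harddup -srate $srate -af lavcresample=$srate" \
        "-lavcopts $lavcopts -ofps $ofps -o $output $input"
}

# Example: print the PAL DVD command for movie.avi.
mpeg_disc_cmd dvd pal movie.avi movie.mpg
```

The sketch covers only the six re-encoding cases; the -oac copy variants for sources that already carry AC-3 audio are left out. File names containing spaces would need quoting before the printed command is pasted into a shell.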