DVD Video Authoring

by Irene Koo

February 6, 1997.

Introduction

Authoring is generally defined as the process of preparing content, encoding video and audio, and creating the final DVD title image. The terms "pre-mastering" and "authoring" are sometimes used interchangeably. Figure 1 illustrates the flow of the pre-mastering process. Strictly speaking, however, authoring is the process of laying out multiple audio tracks and a video track; generating sub-titles, menu pages, parental lock-out features, and interactive functions such as program search, time search, seamless play, and pause; and performing the final MPEG editing of video and audio. That is, authoring corresponds to the "Multiplexing" step in pre-mastering. Because authoring is always performed along with encoding and disc formatting, the term is often used to refer to the entire pre-mastering process. In either case, content providers can use sophisticated authoring tools to create high-quality interactive DVD titles.

Figure 1. The Pre-mastering Process Flow

Material Preparations

The first step in authoring is the collection of materials. These materials include video, audio, still images, and sub-pictures. The video source is CCIR-601 video originating from film, usually at 30 frames/sec. Audio includes the surround track and up to 8 different language tracks of the title. All language tracks must be compared for level, mix, and equalization so that switching between languages is seamless. Still images provide break points in the title, so that search functions and other interactive functions can be implemented. Preparing still images involves identifying break points in the video, defining how long each image is displayed, and generating the images either from the video source or from artwork supplied by graphic artists. Sub-pictures are bitmaps overlaid on video frames. They include menus, sub-titles, graphics, and simple animation. Each sub-picture is created using only 4 of the 16 colors in the palette defined for the title. Sub-pictures must be supplied in a standard computer image format such as TIFF, GIF, or BMP. Once created, their start and stop times must be defined so that they can be synchronized with their associated video and audio elements. The maximum number of sub-pictures in the title is 32. Sub-pictures are categorized as foreground, background, emphasis-1, and emphasis-2. Once all media elements are prepared, they are catalogued and made ready for the next phase.

Techniques and Parameters Determinations

The cataloguing process defines the media elements to be used in the main feature, previews, trailers, director's cuts, and differently rated versions of the feature [1]. Knowing how the various elements will be used in constructing the title is the key to choosing parameters intelligently. The parameters are chosen to trade off picture quality, length of program, number and quality of audio channels, number of subtitles, and level of interactivity [2]. The choices are largely a matter of judgment and depend on the experience of the production engineer. The following is a list of some of the basic parameters that need to be determined for the title. It is by no means complete.

Basic Parameters of a DVD video title:

The next step in parameter determination is the average bit rate calculation. This step ensures that the average video bit rate neither exceeds the maximum average of 3.5 Mbps nor falls too far below it. The maximum program rate (i.e., video + audio + sub-pictures) is 10.08 Mbps, and the maximum video rate is 9.8 Mbps. Because MPEG encodes frames with a high degree of activity and change at higher bit rates, and because interactive jump points require additional bandwidth, a maximum average bit rate of 3.5 Mbps is specified to leave headroom for these peaks. The following average-bit-rate calculation example is taken directly from [2]. In it, the average video bit rate works out to 3.14 Mbps.

A DVD Video title is to be created using the following parameters:

4% of the total disc capacity (4.7 Gbytes for a single-sided, single-layer DVD disc) is always reserved for backup of program control data and for additional information to be added after editing. The total run length for the differently rated features, previews, and trailers is 127.5 minutes. Table 1 shows the storage required for each media element.

Table 1. Storage Requirements for Each Media Element in Average-Bit-Rate Calculation Example

Media Element           Total Run Length   Average Bit Rate          Total Storage Requirements
4 Language Tracks       127.5 minutes      0.384 Mbps per language   4*127.5*60*0.384 Mbps/8 = 1468 Mbytes
4 Sub-picture streams   127.5 minutes      0.01 Mbps per language    4*127.5*60*0.01 Mbps/8 = 38 Mbytes
Reserved                                   4% of 4.7 Gbytes          188 Mbytes
                                                                     subtotal: 1694 Mbytes
Video                   127.5 minutes                                3006 Mbytes
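
The arithmetic behind Table 1 can be reproduced with a short script. The following Python sketch is illustrative only: the function and parameter names are hypothetical, it assumes 1 Gbyte = 1000 Mbytes, and it does not truncate intermediate values, so its figures differ from the table by a Mbyte or two.

```python
def average_video_bit_rate(capacity_mbytes=4700.0, reserved_fraction=0.04,
                           run_minutes=127.5,
                           audio_tracks=4, audio_mbps=0.384,
                           subpicture_streams=4, subpicture_mbps=0.01):
    """Return (video storage in Mbytes, average video bit rate in Mbps)."""
    seconds = run_minutes * 60
    # Mbits per second * seconds / 8 gives Mbytes.
    audio_mbytes = audio_tracks * seconds * audio_mbps / 8
    subpicture_mbytes = subpicture_streams * seconds * subpicture_mbps / 8
    reserved_mbytes = capacity_mbytes * reserved_fraction
    # Whatever is left after audio, sub-pictures, and the reserve goes to video.
    video_mbytes = capacity_mbytes - (audio_mbytes + subpicture_mbytes + reserved_mbytes)
    video_mbps = video_mbytes * 8 / seconds
    return video_mbytes, video_mbps


video_mbytes, video_mbps = average_video_bit_rate()
print(f"video storage: {video_mbytes:.0f} Mbytes, average rate: {video_mbps:.2f} Mbps")
```

With the default values this gives roughly 3005 Mbytes for video and an average video bit rate of about 3.14 Mbps, below the 3.5 Mbps ceiling discussed above.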

Encoding

High-quality compression is the heart of DVD technology. For example, a 127.5-minute uncompressed movie requires approximately 180 Gbytes of storage space, without accounting for audio. The example in Techniques and Parameters Determinations shows that the storage space available for video is about 3 Gbytes. That is, the video must be compressed by a ratio of about 60:1 so that the video track, audio tracks, and sub-pictures fit together into 4.7 Gbytes of storage space.
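
As a quick check of the ratio quoted above, the following Python sketch divides the uncompressed size by the video budget from the earlier example. The function name is hypothetical, and both input figures are simply the approximate values given in the text.

```python
def compression_ratio(uncompressed_gbytes, available_gbytes):
    """Ratio by which the video must shrink to fit the available space."""
    return uncompressed_gbytes / available_gbytes


# About 180 Gbytes uncompressed, about 3.006 Gbytes available for video.
ratio = compression_ratio(180.0, 3.006)
print(f"required compression ratio is about {ratio:.0f}:1")
```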

Video Encoding

According to the DVD specification, video encoding is done using MPEG-2 compression technology. MPEG-2 exploits the temporal and spatial redundancy between video frames. It compares changes from frame to frame and stores only the differences. MPEG-2 encoding allocates more bits to frames with a high degree of activity and fewer bits to frames with less motion. MPEG-2 is thus used here as a variable-bit-rate encoding scheme. The maximum bit rate for video is 9.8 Mbps, within a maximum program rate of 10.08 Mbps; these limits take into account the large bandwidth needed for complex scenes and for branching to different locations in the video stream.

Before MPEG-2 compression begins, two additional steps can be performed on the video source to achieve better compression performance: noise reduction and inverse telecine. Noise can be introduced while transferring video from film to tape, editing, or dubbing. The noise reduction system removes high-frequency noise, thereby reducing the random information in the video. Telecine is the process of converting 24 frames/sec film to the 30 frames/sec required by the NTSC standard. The conversion is accomplished by duplicating frames at regular intervals. The inverse telecine process removes the duplicated frames, so that no bits are spent on redundant pictures and more bandwidth can be allocated to the remaining video.
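
The following Python sketch illustrates the idea of telecine and its inverse. It is a toy model under stated assumptions, not a production algorithm: it uses the common field-level 2:3 pulldown pattern (the text above describes the duplication only in general terms), it represents each video frame as a (top field, bottom field) pair, and it assumes the cadence is known and starts on a group boundary. The function names are hypothetical.

```python
def telecine(film_frames):
    """Map each group of 4 film frames to 5 video frames (2:3 pulldown).

    Each video frame is a (top_field, bottom_field) pair. Frames that do
    not fill a complete group of 4 are ignored in this sketch.
    """
    video = []
    usable = len(film_frames) - len(film_frames) % 4
    for i in range(0, usable, 4):
        a, b, c, d = film_frames[i:i + 4]
        # A and D appear once per field pair; B and C are spread over
        # three fields each, producing two mixed frames.
        video += [(a, a), (b, b), (b, c), (c, d), (d, d)]
    return video


def inverse_telecine(video_frames):
    """Recover the 4 original film frames from each group of 5 video frames."""
    film = []
    usable = len(video_frames) - len(video_frames) % 5
    for i in range(0, usable, 5):
        group = video_frames[i:i + 5]
        # Frames 0, 1, and 4 are clean; C is the bottom field of frame 2.
        film += [group[0][0], group[1][0], group[2][1], group[4][0]]
    return film


film = list("ABCDEFGH")
video = telecine(film)
assert len(video) == 10                 # 8 film frames become 10 video frames
assert inverse_telecine(video) == film  # the round trip restores the film frames
```

A real inverse telecine stage also has to detect the cadence and cope with edits that break it; this sketch leaves that out.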

MPEG-2 encoding is a two-pass process. During the first pass, the MPEG-2 encoding system scans the video source, detects scene changes, and determines the optimal bit rate for each frame. The output of the first pass is an Encoding Decision List (EDL) that contains all encoding parameters for the video. The production engineer reviews the list and can modify the parameters where necessary.

The second pass is the actual encoding of the video using the parameters listed in the EDL. The encoding is done in real time. The production engineer can encode and decode the video stream simultaneously to check the result. Figure 2 illustrates the two-pass video encoding process.

Figure 2. Two Pass MPEG-2 Video Encoding Process
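
The two-pass idea can be sketched in a few lines of Python. This is a deliberately simplified model, not how any particular encoder works: it assumes the first pass has already produced one complexity score per scene, it allocates bit rate in proportion to that score around a target average, and it caps each scene at the maximum video rate. The function names, the complexity scores, and the `encode_scene` callback are all hypothetical.

```python
def first_pass(complexities, avg_mbps=3.14, max_mbps=9.8):
    """Build a toy encoding decision list: one target bit rate per scene.

    Rates are proportional to complexity so that their mean equals
    avg_mbps, then capped at max_mbps. Capping lowers the mean, which a
    real encoder would redistribute; this sketch does not.
    """
    mean_complexity = sum(complexities) / len(complexities)
    return [min(max_mbps, avg_mbps * c / mean_complexity) for c in complexities]


def second_pass(scenes, decision_list, encode_scene):
    """Encode each scene at the rate chosen for it in the first pass."""
    return [encode_scene(scene, rate) for scene, rate in zip(scenes, decision_list)]


decision_list = first_pass([1.0, 1.0, 2.0, 4.0])
# The production engineer could inspect or edit decision_list here.
encoded = second_pass(["s1", "s2", "s3", "s4"], decision_list,
                      lambda scene, rate: (scene, round(rate, 2)))
print(encoded)
```

Keeping the decision list as plain data between the two passes mirrors the workflow above, in which the list is reviewed and possibly modified before the real encode.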

Audio Encoding

The DVD Book C specifies several techniques for audio encoding: Dolby AC-3, MPEG audio, and Linear PCM. The specification further states that NTSC (525/60) video must use Dolby AC-3 and/or Linear PCM, with MPEG audio as an option, and that PAL (625/50) video must use MPEG audio and/or Linear PCM, with Dolby AC-3 as an option. The DVD Book C also specifies the sampling frequency, transfer rate, and number of channels for each of the three audio encoding techniques. Table 2 lists the audio data specifications described in [3].

Table 2. Audio Data Specifications

                     Linear PCM        Dolby AC-3      MPEG Audio
Sampling Frequency   48K, 96K          48K             48K
Number of Bits       16/20/24 bits     compressed      compressed
Transfer Rate        max. 6.144 Mbps   max. 448 kbps   max. 640 kbps
Number of Channels   max. 8            max. 5.1        max. 7.1

A 5.1-channel configuration comprises 5 surround channels plus a low-frequency channel (sub-woofer). The possible audio encoding techniques are listed below:

  • Dolby AC-3 Stereo
  • Dolby AC-3 5.1 Surround Sound
  • MPEG-2 Stereo
  • MPEG-2 7.1 Surround Sound
  • Linear PCM, 16 Bits, 48 kHz Stereo
  • Linear PCM, 24 Bits, 96 kHz Stereo
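
Because Linear PCM is uncompressed, its bit rate is simply sampling frequency times bits per sample times number of channels. The Python sketch below (function names are hypothetical) uses that relation to show why not every combination of the Linear PCM values in Table 2 fits within the 6.144 Mbps maximum transfer rate.

```python
MAX_LPCM_BPS = 6_144_000  # maximum Linear PCM transfer rate from Table 2


def lpcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of an uncompressed PCM stream in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def fits_lpcm_limit(sample_rate_hz, bits_per_sample, channels):
    """True if the stream stays within the maximum transfer rate."""
    return lpcm_bit_rate(sample_rate_hz, bits_per_sample, channels) <= MAX_LPCM_BPS


for rate, bits, channels in [(48_000, 16, 2), (96_000, 24, 2),
                             (48_000, 16, 8), (96_000, 24, 8)]:
    mbps = lpcm_bit_rate(rate, bits, channels) / 1_000_000
    verdict = "fits" if fits_lpcm_limit(rate, bits, channels) else "exceeds limit"
    print(f"{rate} Hz, {bits} bits, {channels} ch: {mbps:.3f} Mbps ({verdict})")
```

Eight channels at 48 kHz and 16 bits land exactly on the limit, while eight channels at 96 kHz and 24 bits exceed it, so the maximum channel count cannot be combined with the highest sample rate and word length.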

Figure 3 illustrates the audio encoding process.

Figure 3. Audio Encoding Process

Sub-Pictures and Still Images Encoding

Sub-pictures are saved in a standard computer image format when they are created. During encoding, they are converted to run-length encoded bitmaps at 2 bits/pixel. Each run-length coded line can contain at most 1440 bits. Still images are encoded as MPEG full reference frames (i.e., I-frames) and are incorporated into the video stream.
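
The principle of run-length encoding a 2 bits/pixel line can be sketched as follows. This Python example shows generic run-length coding only; it does not reproduce the actual DVD sub-picture bitstream syntax, and the function names are hypothetical. Pixel values 0 to 3 stand for the four values that fit in 2 bits.

```python
from itertools import groupby


def rle_encode_line(pixels):
    """Collapse a line of 2-bit pixel values into (run_length, value) pairs."""
    if any(p not in (0, 1, 2, 3) for p in pixels):
        raise ValueError("pixel values must fit in 2 bits (0..3)")
    return [(len(list(run)), value) for value, run in groupby(pixels)]


def rle_decode_line(runs):
    """Expand (run_length, value) pairs back into a line of pixels."""
    return [value for length, value in runs for _ in range(length)]


line = [0] * 10 + [1] * 3 + [2] + [0] * 6
runs = rle_encode_line(line)
print(runs)
assert rle_decode_line(runs) == line  # encoding is lossless
```

Sub-titles and menus consist mostly of long runs of a single value, which is why run-length coding suits them well.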

Multiplexing

The multiplexing process defines the program flow of the DVD title. It specifies how each media element is presented to users and how users can interact with the program. Program flow specifications are translated into navigation commands that are incorporated into program cells and program chains. Figure 4 shows a pictorial description of a program cell.

Figure 4. A Program Cell

A Nav pack is a button command. A cell can contain up to 36 buttons, with each button containing one command. Each command can combine at most three instructions. The available instructions, from [4], are:

  • GoTo: branch between commands
  • Link: transfer between the same domain
  • Jump: transfer between each domain
  • Compare: recognition of parameter value
  • SetSystem: player system setting
  • Set: calculate GPRM values
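
The limits stated above (at most 36 buttons per cell, one command per button, at most three instructions per command) lend themselves to a simple validation step in an authoring tool. The Python sketch below is an assumption about how such a check might look; the `Button` class and `validate_cell_buttons` function are hypothetical and do not correspond to any real authoring API.

```python
from dataclasses import dataclass

INSTRUCTIONS = {"GoTo", "Link", "Jump", "Compare", "SetSystem", "Set"}
MAX_BUTTONS_PER_CELL = 36
MAX_INSTRUCTIONS_PER_COMMAND = 3


@dataclass
class Button:
    # Names of the instructions that make up this button's single command.
    instructions: list


def validate_cell_buttons(buttons):
    """Return a list of human-readable problems; empty if the cell is valid."""
    problems = []
    if len(buttons) > MAX_BUTTONS_PER_CELL:
        problems.append(f"{len(buttons)} buttons exceed the limit of {MAX_BUTTONS_PER_CELL}")
    for index, button in enumerate(buttons):
        if len(button.instructions) > MAX_INSTRUCTIONS_PER_COMMAND:
            problems.append(f"button {index}: more than {MAX_INSTRUCTIONS_PER_COMMAND} instructions")
        unknown = [name for name in button.instructions if name not in INSTRUCTIONS]
        if unknown:
            problems.append(f"button {index}: unknown instructions {unknown}")
    return problems


menu = [Button(["Compare", "Link"]), Button(["Jump"])]
print(validate_cell_buttons(menu))  # an empty list means no problems were found
```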

Several cells and their cell commands together form a program (PG). Each cell can have one cell command, and the total number of cell commands, pre-commands, and post-commands in a program chain must not exceed 128. Figure 5 shows a program.

Figure 5. A Program

Programs and video objects together form a program chain (PGC). The maximum number of programs in a program chain is 99. The programs in a program chain can contain up to 255 cells. Figure 6 describes the structure of a program chain.

Figure 6. Program Chain Structure
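
The structural limits quoted in this section (at most 99 programs and 255 cells per program chain, one cell command per cell, and at most 128 commands in total) can be checked mechanically. The Python sketch below is illustrative; the function and its parameters are hypothetical, and it models a program chain only as a list of cell counts per program.

```python
MAX_PROGRAMS_PER_PGC = 99
MAX_CELLS_PER_PGC = 255
MAX_COMMANDS_PER_PGC = 128


def check_program_chain(cells_per_program, pre_commands, post_commands, cell_commands):
    """Return a list of violated limits for one program chain (empty if none)."""
    problems = []
    total_cells = sum(cells_per_program)
    if len(cells_per_program) > MAX_PROGRAMS_PER_PGC:
        problems.append("too many programs")
    if total_cells > MAX_CELLS_PER_PGC:
        problems.append("too many cells")
    # Each cell can carry at most one cell command.
    if cell_commands > total_cells:
        problems.append("more cell commands than cells")
    if pre_commands + post_commands + cell_commands > MAX_COMMANDS_PER_PGC:
        problems.append("too many commands")
    return problems


# Three programs with 10, 20, and 5 cells; 2 pre-, 1 post-, and 30 cell commands.
print(check_program_chain([10, 20, 5], 2, 1, 30))
```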

A DVD title can have only one program chain (a one_sequential_PGC_title), or it can have multiple program chains (a multi_PGC_title). Figure 7 shows a multi_PGC_title. Interactive functions such as part_of_title searches, director's cuts, and parental lock-outs can be achieved by creating the title as a multi_PGC_title, with different director's cuts and differently rated versions on different program chains.

Figure 7. A Multi-PGC-Title

Simulation and Verification

After all media elements and control information are multiplexed into one stream, the stream is tested in simulation to verify that the presentation is acceptable. The audio, video, and sub-pictures in the stream must be synchronized; otherwise, the content must be re-edited or re-encoded. Besides synchronization, interactive functions may also be simulated and verified.

References

  1. Kilroy Hughes, Frequently Asked Questions About DVD, The CD-Info Company, Inc.
  2. Sonic DVD Creator: Blueprint for DVD Premastering, Sonic Solutions
  3. DVD-Video Authoring, NB Digital Solutions, Inc.
  4. DVD Forum - Interactive Functions, Hitachi Ltd.
  5. Philip Nemec, A Day at the DVD Forum: technical notes
  6. Technical Papers from DVD Forum
  7. DVD authoring lab, Developer Relations Group, Intel Corp.