Authoring is generally defined as the process of preparing content, encoding video and audio, and creating the final DVD title image. The terms "pre-mastering" and "authoring" are sometimes used interchangeably. Figure 1 illustrates the flow of the pre-mastering process. Strictly speaking, however, authoring is the process of laying out multiple audio tracks and a video track; generating sub-titles, menu pages, parental lock-out features, and interactive functions such as program search, time search, seamless play, and pause; and performing the final MPEG editing of video and audio. That is, authoring is the "Multiplexing" process in pre-mastering. Since authoring is always performed along with encoding and disc formatting, it is, in many cases, referred to as the entire pre-mastering process. In any case, content providers using sophisticated authoring tools can create high quality interactive DVD titles.
Figure 1. The Pre-mastering Process Flow
The first step in authoring is the collection of materials. These materials include video, audio, still images, and sub-pictures. The video source is CCIR-601 video originating from film, usually at 30 frames/sec. Audio includes the surround track and up to 8 different language tracks for the title. All language tracks must be compared for level, mix, and equalization so that switching between languages is seamless. Still images provide break points in the title so that search functions and other interactive functions can be implemented. Preparing still images involves identifying breakpoints in the video, defining the display duration of each image, and generating the images either from the video source or from graphic artists. Sub-pictures are bitmaps to be overlaid on video frames. They include menus, sub-titles, graphics, and simple animation. Each sub-picture is created using only 4 of the 16 colors in the palette defined for the title. Sub-pictures must be supplied in a standard computer image format such as TIFF, GIF, or BMP. Once created, their start and stop times must be defined so that they are synchronized with their associated video and audio elements. The maximum number of sub-pictures in the title is 32. Sub-pictures are categorized as foreground, background, emphasis-1, and emphasis-2. Once all media elements are prepared, they are catalogued and made ready for the next phase.
The cataloguing process defines the media elements to be used in the main feature, previews, trailers, director's cuts, and differently rated versions of the feature [1]. Knowing how the various elements will be used in constructing the title is the key to choosing parameters intelligently. The parameters are chosen to trade off picture quality, length of program, number and quality of audio channels, number of subtitles, and level of interactivity [2]. The choice of parameters is almost arbitrary and depends on the experience of the production engineer. The following is a list of some of the basic parameters that need to be determined for the title. It is by no means complete.
Basic Parameters of a DVD video title:
The next step in parameter determination is the average bit rate calculation. This step ensures that the average video bit rate neither exceeds nor falls too far below the maximum average of 3.5 Mbps. The maximum program rate (i.e., video + audio + sub-pictures) is 10.08 Mbps, and the maximum video rate is 9.8 Mbps. Because MPEG encodes video frames with a high degree of activity and difference at higher bit rates, and because interactive jump points require additional bandwidth, a maximum average bit rate of 3.5 Mbps is specified so that headroom remains for these peaks. The following average-bit-rate calculation example is taken directly from [2]. The average video bit rate is calculated to be 3.14 Mbps.
A DVD Video title is to be created using the following parameters:
4% of the total disc capacity (4.7 Gbytes for a single-sided, single-layer DVD disc) is always reserved for backup of program control data and for additional information to be added after editing. The total run length for the differently rated features, previews, and trailers is 127.5 minutes. Table 1 shows the storage requirements for each media element.
Table 1. Storage Requirements for Each Media Element in Average-Bit-Rate Calculation Example
Media Element | Total Run Length | Average Bit Rate | Total Storage Requirements |
4 Language Tracks | 127.5 minutes | 0.384 Mbps per language | 4*127.5*60*0.384 Mbps/8 = 1468 Mbytes |
4 Sub-picture Streams | 127.5 minutes | 0.01 Mbps per language | 4*127.5*60*0.01 Mbps/8 = 38 Mbytes |
Reserved (4% of 4.7 Gbytes) | | | 188 Mbytes |
Subtotal | | | 1694 Mbytes |
Video | 127.5 minutes | 3006*8/(127.5*60) = 3.14 Mbps | 4700 - 1694 = 3006 Mbytes |
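The arithmetic behind Table 1 can be reproduced with a short script. This is only a sketch of the calculation in the example: the function and constant names are illustrative, and decimal units are assumed (1 Mbyte = 8 Mbits, 4.7 Gbytes = 4700 Mbytes).

```python
# Average video bit rate for the Table 1 example.
# Assumption: decimal units, i.e. 4.7 Gbytes = 4700 Mbytes and 1 Mbyte = 8 Mbits.

DISC_CAPACITY_MBYTES = 4700      # single-sided, single-layer disc
RESERVED_FRACTION = 0.04         # reserved for control data backup
RUN_LENGTH_MINUTES = 127.5
MAX_AVERAGE_VIDEO_MBPS = 3.5     # ceiling quoted in the text


def stream_mbytes(count, mbps_each, minutes):
    """Storage in Mbytes for `count` streams at `mbps_each` over `minutes`."""
    return count * minutes * 60 * mbps_each / 8


def video_budget(audio_tracks=4, audio_mbps=0.384,
                 subpicture_streams=4, subpicture_mbps=0.01,
                 minutes=RUN_LENGTH_MINUTES):
    """Return (Mbytes left for video, resulting average video rate in Mbps)."""
    audio = stream_mbytes(audio_tracks, audio_mbps, minutes)
    subpictures = stream_mbytes(subpicture_streams, subpicture_mbps, minutes)
    reserved = DISC_CAPACITY_MBYTES * RESERVED_FRACTION
    video_mbytes = DISC_CAPACITY_MBYTES - (audio + subpictures + reserved)
    return video_mbytes, video_mbytes * 8 / (minutes * 60)


video_mbytes, video_mbps = video_budget()
print(f"video budget: {video_mbytes:.0f} Mbytes, average rate: {video_mbps:.2f} Mbps")
print("within ceiling:", video_mbps <= MAX_AVERAGE_VIDEO_MBPS)
```

Without intermediate rounding the video budget comes out about 1 Mbyte below the 3006 Mbytes in Table 1, because the table truncates each row before subtracting; the average rate still rounds to 3.14 Mbps.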
High quality compression is the heart of DVD technology. For example, a 127.5-minute uncompressed movie requires approximately 180 Gbytes of storage space, without accounting for audio. The example in Techniques and Parameters Determinations shows that the storage space available for video is about 3 Gbytes. That is, a compression ratio of about 60:1 is required in order to fit the video track, audio tracks, and sub-pictures in 4.7 Gbytes of storage space.
According to the DVD specification, video encoding
is done using MPEG-2 compression technology. MPEG-2 exploits the
temporal and spatial redundancy between video frames. It compares
successive frames and stores only the differences between them.
MPEG-2 encoding allocates more bits per frame to frames that
have a high degree of activity and fewer bits per frame
to frames with less motion. Thus, MPEG-2 is a variable bit rate
encoding scheme. The maximum bit rate for video is 9.8 Mbps, within
a maximum program rate of 10.08 Mbps, which takes into account the
need for large bandwidth for complex scenes and for branching to
different locations in the video stream.
Prior to the start of MPEG-2 compression,
two additional steps can be performed on the video source to achieve
better compression performance: noise reduction and inverse telecine.
Noise can be introduced while transferring video from film to tape,
editing, or dubbing. The noise reduction system removes high frequency
noise, thereby reducing the random information in the video. Telecine
is the process of converting 24 frames/sec film to the 30 frames/sec
required by the NTSC standard. The conversion is accomplished by duplicating
frames at regular intervals. The inverse telecine process removes
the duplicated frames, thus allowing more bandwidth to be allocated
to the remaining video frames.
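The frame-duplication idea can be illustrated with a deliberately simplified model. In the sketch below, frames are just labels, every fourth film frame is repeated to turn 24 frames into 30, and the inverse step drops the repeat. Real telecine equipment works on interlaced fields and must detect the cadence, so this is an assumption-laden illustration, not a description of any particular system; the function names are hypothetical.

```python
def telecine(frames):
    """24 -> 30 frames/sec in a simplified model: repeat every 4th frame."""
    out = []
    for i, frame in enumerate(frames):
        out.append(frame)
        if i % 4 == 3:
            out.append(frame)   # duplicate inserted at a regular interval
    return out


def inverse_telecine(frames):
    """Drop the duplicate that closes each group of 5 frames."""
    return [frame for i, frame in enumerate(frames) if i % 5 != 4]


film = list(range(24))              # one second of film, frames labelled 0..23
video = telecine(film)              # one second of 30 frames/sec video
restored = inverse_telecine(video)  # back to the 24 original frames

print(len(film), len(video), len(restored))
print("round trip exact:", restored == film)
```

In this model one frame in five carries no new information, which is the redundancy the encoder no longer has to spend bits on after inverse telecine.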
MPEG-2 encoding is a two-pass process. During the
first pass, the MPEG-2 encoding system scans the video source,
detects scene changes, and determines the optimal bit rate
for each frame. The output of the first pass is an Encoding Decision
List (EDL) that contains all encoding parameters for the video.
The production engineer reviews the list and can modify
parameters where necessary.
The second pass is the actual encoding of the video using the parameters listed in the EDL. Encoding is done in real time, and the production engineer can simultaneously encode and decode the video stream. Figure 2 illustrates the two-pass video encoding process.
Figure 2. Two Pass MPEG-2 Video Encoding Process
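The two-pass idea can be sketched in a few lines. This is a toy model only: frame "activity" is taken to be the mean absolute pixel difference from the previous frame, bits are shared in proportion to activity, and the threshold, field names, and the stand-in second pass are all assumptions made for illustration. A real MPEG-2 encoder's rate control is far more involved.

```python
def mean_abs_diff(a, b):
    """Average absolute difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def first_pass(frames, avg_bits, max_bits, scene_threshold=50.0):
    """Scan the source and return one decision per frame (a stand-in for the EDL)."""
    activity = [scene_threshold]          # treat the first frame as a scene start
    for prev, cur in zip(frames, frames[1:]):
        activity.append(mean_abs_diff(prev, cur))
    weights = [1.0 + a for a in activity]  # every frame gets a non-zero share
    budget = avg_bits * len(frames)
    total_weight = sum(weights)
    return [
        {
            "frame": i,
            "scene_change": a >= scene_threshold,
            # Capping at max_bits may leave part of the budget unused.
            "bits": min(max_bits, round(budget * w / total_weight)),
        }
        for i, (a, w) in enumerate(zip(activity, weights))
    ]


def second_pass(frames, decisions):
    """Placeholder for the real encoder: pair each frame index with its bit target."""
    return [(d["frame"], d["bits"]) for d in decisions]


# Toy source: a static scene, then a cut to a different static scene.
frames = [[10] * 8] * 3 + [[200] * 8] * 3
edl = first_pass(frames, avg_bits=100_000, max_bits=300_000)
edl[0]["bits"] = 120_000   # the engineer may adjust an entry before pass two
encoded = second_pass(frames, edl)

for entry in edl:
    print(entry)
```

The point of separating the passes is that the list is plain data: it can be inspected and edited between the scan and the actual encode.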
The DVD Book C specifies several techniques for audio coding: Dolby AC-3, MPEG audio, and Linear PCM. The specification further states that NTSC (525/60) video must use Dolby AC-3 and/or Linear PCM, with MPEG audio as an option, and that PAL (625/50) video must use MPEG audio and/or Linear PCM, with Dolby AC-3 as an option. The DVD Book C also specifies the sampling frequency, transfer rate, and number of channels for each of the three audio coding techniques. Table 2 lists the audio data specifications described in [3].
Table 2. Audio Data Specifications
Parameter | Linear PCM | Dolby AC-3 | MPEG Audio |
Sampling Frequency | 48 kHz, 96 kHz | 48 kHz | 48 kHz |
Number of Bits | 16/20/24 bits | compressed | compressed |
Transfer Rate | max. 6.144 Mbps | max. 448 kbps | max. 640 kbps |
Number of Channels | max. 8 | max. 5.1 | max. 7.1 |
5.1 channels include 5 surround channels plus a low frequency
channel (sub-woofer). The possible audio encoding techniques are
listed below:
Figure 3 illustrates the audio encoding process.
Figure 3. Audio Encoding Process
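The limits in Table 2 and the NTSC/PAL format rules lend themselves to a mechanical check before encoding. The sketch below transcribes the table into a dictionary and validates a proposed track against it; the function names and the dictionary layout are hypothetical, and the channel counts "5.1" and "7.1" are kept as plain numbers purely to mirror the table's notation.

```python
# Limits transcribed from Table 2; "5.1" and "7.1" follow the table's notation.
AUDIO_LIMITS = {
    "Linear PCM": {"rates_khz": {48, 96}, "max_kbps": 6144, "max_channels": 8},
    "Dolby AC-3": {"rates_khz": {48}, "max_kbps": 448, "max_channels": 5.1},
    "MPEG Audio": {"rates_khz": {48}, "max_kbps": 640, "max_channels": 7.1},
}

# Formats of which at least one must be present, per video standard.
MANDATORY = {
    "NTSC": {"Dolby AC-3", "Linear PCM"},
    "PAL": {"MPEG Audio", "Linear PCM"},
}


def lpcm_kbps(rate_khz, bits, channels):
    """Uncompressed Linear PCM rate: sampling rate x sample size x channels."""
    return rate_khz * bits * channels


def check_track(fmt, rate_khz, kbps, channels):
    """Return a list of Table 2 violations for one track (empty if none)."""
    limits = AUDIO_LIMITS[fmt]
    problems = []
    if rate_khz not in limits["rates_khz"]:
        problems.append(f"{fmt}: {rate_khz} kHz sampling is not listed")
    if kbps > limits["max_kbps"]:
        problems.append(f"{fmt}: {kbps} kbps exceeds {limits['max_kbps']} kbps")
    if channels > limits["max_channels"]:
        problems.append(f"{fmt}: {channels} channels exceeds {limits['max_channels']}")
    return problems


def has_mandatory_format(video_standard, formats):
    """True if at least one mandated format for the standard is present."""
    return bool(MANDATORY[video_standard] & set(formats))


print(check_track("Dolby AC-3", 48, 384, 5.1))
print(check_track("Linear PCM", 96, lpcm_kbps(96, 24, 4), 4))
print(has_mandatory_format("NTSC", ["MPEG Audio"]))
```

Note that 48 kHz x 16 bits x 8 channels gives exactly 6144 kbps, the Linear PCM maximum in Table 2, so higher sampling rates or sample sizes force a lower channel count.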
Sub-pictures are saved in a standard computer image
format when they are created. During encoding, they are converted to
run-length encoded bitmaps of 2 bits/pixel. The maximum number
of bits in each run-length coded line is 1440. Still images
are encoded as MPEG full reference frames (i.e., I-frames) and
are incorporated into the video stream.
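Run-length encoding of a 2-bit bitmap line can be sketched as follows. This illustrates the general technique only: the actual DVD sub-picture bitstream uses its own code layout, which is not reproduced here, and the fixed 16-bit cost per run used for the line-budget check is an assumption chosen to keep the example simple.

```python
MAX_BITS_PER_LINE = 1440   # per-line limit quoted in the text
BITS_PER_RUN = 16          # assumption: every (length, value) pair costs 16 bits


def rle_encode_line(pixels):
    """Collapse one scan line of 2-bit pixel values into (run_length, value) pairs."""
    runs = []
    for value in pixels:
        if not 0 <= value <= 3:
            raise ValueError("sub-picture pixels are 2 bits wide (0..3)")
        if runs and runs[-1][1] == value:
            runs[-1] = (runs[-1][0] + 1, value)   # extend the current run
        else:
            runs.append((1, value))               # start a new run
    return runs


def rle_decode_line(runs):
    """Expand (run_length, value) pairs back into a list of pixel values."""
    return [value for length, value in runs for _ in range(length)]


def fits_line_budget(runs):
    """Check the encoded line against the per-line bit limit (assumed run cost)."""
    return len(runs) * BITS_PER_RUN <= MAX_BITS_PER_LINE


# A 720-pixel line: background, an outlined block of text colour, background.
line = [0] * 300 + [1] * 40 + [2] * 80 + [1] * 40 + [0] * 260
runs = rle_encode_line(line)

print(runs)
print("round trip exact:", rle_decode_line(runs) == line)
print("fits budget:", fits_line_budget(runs))
```

Lines made of a few long runs, as is typical for subtitles and menus, compress very well; a line that alternates value on every pixel is the worst case and is what a per-line limit guards against.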
The multiplexing process defines the program flow
of the DVD title. It specifies how each of the media elements
is presented to users and how users can interact with the program.
Program flow specifications are translated into navigation commands
that are incorporated into program cells and program chains.
Figure 4 shows a pictorial description of a program cell.
Figure 4. A Program Cell
A Nav pack is a button-command. A cell can contain
up to 36 buttons, with each button containing one command. Each
command can consist of at most three combined instructions. Here
is a list of available instructions from [4]:
Several cells and cell commands together form a program (PG). Each
cell can have one cell command, and the total number of cell commands,
pre-commands, and post-commands in a program chain must be less
than or equal to 128. Figure 5 shows a program.
Figure 5. A Program
Programs and video objects together form a program
chain (PGC). The maximum number of programs in a program chain
is 99. The programs in a program chain can contain up to 255 cells.
Figure 6 describes the structure of a program chain.
Figure 6. Program Chain Structure
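The containment rules and limits described above (buttons per cell, commands per program chain, programs and cells per chain) can be captured in a small data model. The class and field names below are hypothetical and the commands are represented as plain strings; only the numeric limits come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_PROGRAMS_PER_PGC = 99
MAX_CELLS_PER_PGC = 255
MAX_COMMANDS_PER_PGC = 128   # cell commands + pre-commands + post-commands
MAX_BUTTONS_PER_CELL = 36


@dataclass
class Cell:
    buttons: int = 0
    cell_command: Optional[str] = None   # at most one cell command per cell


@dataclass
class Program:
    cells: List[Cell] = field(default_factory=list)


@dataclass
class ProgramChain:
    programs: List[Program] = field(default_factory=list)
    pre_commands: List[str] = field(default_factory=list)
    post_commands: List[str] = field(default_factory=list)

    def problems(self):
        """Return a list of limit violations (empty if the chain is valid)."""
        cells = [c for p in self.programs for c in p.cells]
        cell_commands = sum(1 for c in cells if c.cell_command is not None)
        commands = cell_commands + len(self.pre_commands) + len(self.post_commands)
        found = []
        if len(self.programs) > MAX_PROGRAMS_PER_PGC:
            found.append(f"{len(self.programs)} programs (max {MAX_PROGRAMS_PER_PGC})")
        if len(cells) > MAX_CELLS_PER_PGC:
            found.append(f"{len(cells)} cells (max {MAX_CELLS_PER_PGC})")
        if commands > MAX_COMMANDS_PER_PGC:
            found.append(f"{commands} commands (max {MAX_COMMANDS_PER_PGC})")
        if any(c.buttons > MAX_BUTTONS_PER_CELL for c in cells):
            found.append(f"a cell has more than {MAX_BUTTONS_PER_CELL} buttons")
        return found


# A small chain: two programs of three cells each, one cell carrying a command.
chapter = Program(cells=[Cell(), Cell(buttons=4, cell_command="Link"), Cell()])
pgc = ProgramChain(programs=[chapter, Program(cells=[Cell(), Cell(), Cell()])])
print(pgc.problems())

too_long = ProgramChain(programs=[Program(cells=[Cell()]) for _ in range(100)])
print(too_long.problems())
```

Collecting all violations in one pass, rather than raising on the first, suits an authoring workflow where the engineer wants the full list before restructuring the title.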
A DVD title can have only one program chain (a one_sequential_PGC_title),
or it can have multiple program chains (a multi_PGC_title). Figure
7 shows a multi_PGC_title. Interactive functions such as part_of_title
searches, director's cuts, and parental lock-outs can be achieved
by creating the title as a multi_PGC_title, with different director's
cuts and differently rated versions on different program chains.
Figure 7. A Multi-PGC-Title
After all media elements and control information
are multiplexed into one stream, simulation testing is performed
on the stream to verify that the presentation is acceptable. The
stream must guarantee that audio, video, and sub-pictures are
synchronized; otherwise, the content must be re-edited or re-encoded.
Besides synchronization, interactive functions may also be simulated
and verified.
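A synchronization check of this kind might be sketched as below. The representation is hypothetical: each stream is reduced to a list of presentation times (in seconds) at shared sync points, sub-pictures are reduced to (start, stop) display windows, and the tolerance of one frame period at 30 frames/sec is an assumption rather than a figure from the specification.

```python
FRAME_TIME = 1 / 30   # assumption: tolerate up to one frame period of drift


def out_of_sync(video_times, other_times, tolerance=FRAME_TIME):
    """Return indices of sync points where two streams drift beyond tolerance."""
    return [
        i for i, (v, o) in enumerate(zip(video_times, other_times))
        if abs(v - o) > tolerance
    ]


def invalid_subpictures(windows, video_duration):
    """Return indices of (start, stop) windows that are empty or outside the video."""
    return [
        i for i, (start, stop) in enumerate(windows)
        if not 0 <= start < stop <= video_duration
    ]


video = [0.0, 10.0, 20.0]
audio = [0.0, 10.01, 20.2]                  # last sync point has drifted by 0.2 s
subtitles = [(1.0, 3.0), (5.0, 4.0)]        # second window stops before it starts

print("audio drift at sync points:", out_of_sync(video, audio))
print("bad sub-picture windows:", invalid_subpictures(subtitles, video_duration=60.0))
```

Any index reported by either check would send the corresponding element back for re-editing or re-encoding.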
Sub-Pictures and Still Images Encoding
Multiplexing
GoTo
branch between commands
Link
transfer within the same domain
Jump
transfer between domains
Compare
recognition of parameter value
SetSystem
player system setting
Set
calculate GPRM values
Simulation and Verification
References