Video Capture, Transcoding, and Authoring/Video Standards

This is an overview of the complex topic of video standards, with emphasis on the topics needed to understand the video editing, transcoding, and authoring process. Wikipedia has much more reference material, as do other online and printed works.

Analog Broadcast

The NTSC analog broadcast standard is used in the United States, Canada, Japan, South Korea, the Philippines, and some other countries. NTSC stands for "National Television System Committee." It is sometimes humorously called "Never Twice the Same Color," because the system can have difficulty reproducing color accurately.

NTSC content consists of frames broadcast at 29.97 (more precisely, 30/1.001) frames per second. The rate at which an NTSC television is scanned from top to bottom, however, is double that: 59.94 fields per second. Each field consists of half of the lines from a frame, the first field carrying the even-numbered lines and the second carrying the odd-numbered lines. This alternation of even and odd lines is called interlacing. The refresh rate of roughly 60 fields per second was chosen so that the human eye would not perceive flicker while watching a broadcast. The decision to use interlaced rather than progressive (non-interlaced) scan was based on practical concerns, the most important of which is that interlacing needs only about half the bandwidth of a progressive scan broadcast at the same refresh rate, while its apparent resolution is often similar. This allowed more efficient use of the television broadcast spectrum, and to some extent simplified the construction of televisions themselves.
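The relationship between frames and fields can be illustrated with a minimal Python sketch. It is only an illustration, not part of any capture tool: the function names `split_fields` and `weave` are hypothetical, and lines are counted from 0, which is an assumption of this sketch rather than a broadcast convention.

```python
from fractions import Fraction

# NTSC frame rate as an exact ratio: 30/1.001 is the same as 30000/1001.
NTSC_FRAME_RATE = Fraction(30000, 1001)
NTSC_FIELD_RATE = 2 * NTSC_FRAME_RATE  # two fields make up one frame


def split_fields(frame):
    """Split a frame (a list of scan lines) into its two fields.

    Lines are counted from 0 here; this numbering is an assumption
    made for the sketch.
    """
    even_field = frame[0::2]  # lines 0, 2, 4, ...
    odd_field = frame[1::2]   # lines 1, 3, 5, ...
    return even_field, odd_field


def weave(even_field, odd_field):
    """Interleave two fields back into a single full frame."""
    frame = []
    for even_line, odd_line in zip(even_field, odd_field):
        frame.extend([even_line, odd_line])
    # A frame with an odd line count leaves one extra line in the even field.
    if len(even_field) > len(odd_field):
        frame.append(even_field[-1])
    return frame


if __name__ == "__main__":
    print("frames per second:", float(NTSC_FRAME_RATE))
    print("fields per second:", float(NTSC_FIELD_RATE))
    even, odd = split_fields([f"line {n}" for n in range(6)])
    print(even)
    print(odd)
```

Each field holds half of the lines, which is why an interlaced signal needs roughly half the bandwidth of a progressive signal with the same refresh rate.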

Interlacing is still present in many broadcasts. For all practical purposes, all NTSC content is interlaced and broadcast at 29.97 frames per second. Some digital broadcast formats are also interlaced. In video processing, interlacing is at best a nuisance, and at worst a source of difficult problems.

NTSC has a nominal picture content of 486 visible scan lines, which sets its vertical resolution. There is no standard for horizontal resolution. Most televisions overscan and will not display the entire broadcast frame. In the days when most recording equipment and televisions were analog, the horizontal resolution of a television signal could vary greatly depending on the source. An analog VHS cassette is capable of only about 240 lines of horizontal resolution, somewhat less than the roughly 330 lines of broadcast NTSC. S-VHS and LaserDisc are capable of about 400 lines, although all else being equal, LaserDisc generally produces a better picture than S-VHS. DVDs support a number of resolutions, but the most common (and highest) for NTSC material is 720 horizontal pixels by 480 vertical pixels. Many older televisions are incapable of displaying images that sharp; newer ones fare better.

Today, NTSC content is typically digitally captured and played back at a resolution of 720 horizontal pixels by 480 vertical pixels, at a rate of 59.94 fields per second. As mentioned above, there are other resolutions that are supported by various digital standards.

PAL ("Phase Alternating Line") is an analog television broadcast standard used in most of Europe and much of the rest of the world. France, large portions of the former Soviet Union, and a few other countries are exceptions; they use a system called SECAM. We will treat PAL and SECAM as identical for the purposes of this discussion. The two systems encode color differently, but they share the same line count and frame rate, and multi-format equipment has made them interoperable in Europe.

For purposes of digital video, PAL is similar to NTSC. It is an interlaced standard just like NTSC, although the odd field is transmitted first (the opposite of NTSC). The main differences between the two are in resolution and frame rate. PAL has a greater vertical resolution of 576 lines (of a total of 625 actual broadcast lines, the difference carrying no picture). PAL has a frame rate of 25 frames per second, and consequently 50 fields per second. It is typically digitized at a resolution of 720 x 576 pixels.
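The trade-off between resolution and frame rate can be checked with a little arithmetic. The sketch below uses the digitized frame sizes and frame rates quoted above; the `FORMATS` table and the function `pixels_per_second` are names invented for this illustration.

```python
from fractions import Fraction

# Digitized frame sizes and frame rates as quoted in the text.
FORMATS = {
    "NTSC": {"width": 720, "height": 480, "frame_rate": Fraction(30000, 1001)},
    "PAL": {"width": 720, "height": 576, "frame_rate": Fraction(25)},
}


def pixels_per_second(fmt):
    """Return the number of pixels delivered per second, as an exact fraction."""
    return fmt["width"] * fmt["height"] * fmt["frame_rate"]


if __name__ == "__main__":
    for name, fmt in FORMATS.items():
        fields = 2 * fmt["frame_rate"]  # interlaced: two fields per frame
        print(name, "fields/s:", float(fields),
              "pixels/s:", float(pixels_per_second(fmt)))
    ratio = pixels_per_second(FORMATS["NTSC"]) / pixels_per_second(FORMATS["PAL"])
    print("NTSC/PAL pixel rate ratio:", ratio)
```

Under these figures the two systems deliver almost exactly the same number of pixels per second: PAL spends them on more lines per frame, NTSC on more frames per second.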

The difference between NTSC and PAL frame rates makes converting NTSC material to PAL, and vice versa, tricky. This is discussed elsewhere in this book.

Digital Transport (Digital Broadcast)


Most digital broadcasts (over-the-air (OTA), cable, satellite, and so on) use some variant of the MPEG-2 standard. When MPEG-2 content is broadcast, it is packaged as what is called a transport stream. In theory, this stream can be captured and used directly as source material, although some form of transcoding or remuxing may be required. However, some broadcast streams are encrypted and can't be played back until they are decrypted.
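As a rough illustration of what a captured transport stream looks like, the sketch below decodes the fixed 4-byte header that begins every 188-byte MPEG-2 transport stream packet. The function name `parse_ts_header` and the dictionary keys are choices made for this example; a real demultiplexer does a great deal more (adaptation fields, program tables, resynchronization).

```python
TS_PACKET_SIZE = 188  # bytes in one MPEG-2 transport stream packet
TS_SYNC_BYTE = 0x47   # every packet begins with this byte


def parse_ts_header(packet: bytes) -> dict:
    """Decode the 4-byte header of a single transport stream packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("a transport stream packet must be 188 bytes long")
    if packet[0] != TS_SYNC_BYTE:
        raise ValueError("missing sync byte 0x47")
    return {
        "transport_error": bool(packet[1] & 0x80),
        "payload_unit_start": bool(packet[1] & 0x40),
        # 13-bit packet identifier: selects which elementary stream or table
        # this packet belongs to.
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],
        # A nonzero value means the payload is scrambled (encrypted).
        "scrambling_control": (packet[3] >> 6) & 0x03,
        "continuity_counter": packet[3] & 0x0F,
    }


if __name__ == "__main__":
    # A hand-built null packet (PID 0x1FFF) used purely as sample input.
    null_packet = bytes([0x47, 0x1F, 0xFF, 0x10]) + bytes(184)
    print(parse_ts_header(null_packet))
```

The scrambling control bits are one place where encryption shows up: a capture tool can inspect them to tell whether a packet's payload needs decrypting before it can be used.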

Content recorded on a semi-permanent medium like videotape could informally be considered a transport stream, although the term technically applies only to MPEG. The most common format for tape-recorded video is DV (and variants thereof), a format popularized by Sony, in part through its association with the FireWire (IEEE 1394) interface, and now used by most digital camcorders. DV is a primary source of material for editing and transcoding. DV is lossy, but only slightly so, and is readily encoded and decoded. DV can also be sent directly from a camera rather than being recorded.

Because newer encoding schemes like MPEG-4 provide much better compression than MPEG-2, some satellite broadcasters (DirecTV, for example) are delivering new high-definition channels as MPEG-4 streams. This trend is likely to continue, because the demand for high-definition content is great, while satellite bandwidth is limited.

Program (DVD, et cetera)


Much of the available digitally recorded material exists on DVDs, which are encoded using MPEG-2. When MPEG video is recorded on a permanent medium, it is called a program stream. Program streams carry the same kind of content as transport streams, but are designed for reliable storage media, with features such as random access in mind. Program streams may be encoded more efficiently than transport streams. Live video, when transmitted in a compressed digital stream, has to be encoded "on the fly" using specialized hardware. Program streams, on the other hand, can be encoded offline at whatever speed suits the application.
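When handling captured files, it can help to tell the two stream types apart. The sketch below is a simple heuristic, not a full parser: it relies on an MPEG program stream beginning with the pack start code 00 00 01 BA and on a transport stream repeating the sync byte 0x47 every 188 bytes. The function name `sniff_mpeg_container` is hypothetical, and variants that use other packet sizes are deliberately ignored.

```python
PACK_START_CODE = b"\x00\x00\x01\xba"  # start of a program stream pack header
TS_PACKET_SIZE = 188
TS_SYNC_BYTE = 0x47


def looks_like_program_stream(data: bytes) -> bool:
    """True if the data begins with a program stream pack start code."""
    return data.startswith(PACK_START_CODE)


def looks_like_transport_stream(data: bytes, packets_to_check: int = 5) -> bool:
    """True if the sync byte appears at the start of several packets in a row."""
    if len(data) < TS_PACKET_SIZE * packets_to_check:
        return False
    return all(
        data[index * TS_PACKET_SIZE] == TS_SYNC_BYTE
        for index in range(packets_to_check)
    )


def sniff_mpeg_container(data: bytes) -> str:
    """Guess the container type from the first bytes of a file."""
    if looks_like_program_stream(data):
        return "program stream"
    if looks_like_transport_stream(data):
        return "transport stream"
    return "unknown"


if __name__ == "__main__":
    # Synthetic sample data, built only to exercise the heuristic.
    fake_ps = PACK_START_CODE + bytes(100)
    fake_ts = (bytes([TS_SYNC_BYTE]) + bytes(187)) * 5
    print(sniff_mpeg_container(fake_ps))
    print(sniff_mpeg_container(fake_ts))
```

Checking several consecutive packets, rather than a single byte, reduces the chance that a stray 0x47 in unrelated data is mistaken for a transport stream.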

Other MPEG-related formats, including MPEG-4, can also be considered program streams. Newer formats like MPEG-4 provide better compression than MPEG-2, although encoding is more computationally expensive. Successor formats to the DVD, such as Blu-ray, support MPEG-4 AVC (H.264) or a similar codec in addition to MPEG-2.

In general, the object of editing, transcoding, and authoring — the topic of this book — is to produce a program stream suitable for writing to a permanent medium like a DVD.



Joseph N. Hall