JPEG - Idea and Practice/Introduction

The four distinct modes of operation

The JPEG committee intended that the method should be available in a number of variants and with a number of extensions:

The sequential DCT mode of operation, where the picture is scanned in the same way as in part one (but not in our zigzag way, column-wise from left to right, but line-wise from top to bottom, just as in reading ).
The progressive DCT mode of operation, where the picture is displayed in its entirety concurrently with the transmission of the bitstream, at first imperfect and then gradually improving.
The lossless mode of operation, where the file is only compressed, with no data lost by cosine transforms or quantization.
The hierarchical (DCT or lossless) mode of operation, where the picture is stored at multiple resolutions for different uses (low-resolution screen, high-resolution printer, etc.), in such a way that the lower-resolution images are stored with supplementary data which can be added on to produce higher-resolution images as required.

The colour values are usually measured in bytes (8-bit numbers), and in this case the precision of the (real) numbers in the calculations is set to 11 bit.

JPEG also offers extended precision, primarily intended for grey scale pictures, where the colour values instead of utilizing 8 bits use 12 bits (a range from 0 to 4095), and where the precision in the calculations is increased to 15 bit. Extended precision implies that the Huffman tables must go to size 15 (instead of 11) for the DC numbers and size to 14 (instead of 10) for the AC numbers. Furthermore, the numbers in the quantization tables can be words (from 0 to 65535) instead of bytes. As this possibility is rarely used, we will ignore it here.

For the baseline sequential DCT mode, that is, the non-extended sequential DCT mode, the method of coding is the Huffman coding with two tables for each component. For the extended modes you can choose between the Huffman coding with two or four tables for each component and the arithmetic coding.

One Mode Survives

Although four modes were intended, only the baseline sequential DCT mode has survived in widespread use.

There is not much point in the progressive and the hierarchical mode nowadays, where a JPEG picture is transmitted and displayed fast.
The benefits of the lossless mode seem too minor. Arithmetic coding can compress a little better than the Huffman coding, but it is slower and there have been patent-related problems.

Software Used in Researching this Book

Our account here, like our earlier account in part one, was accompanied by the writing of some programs. This time this was done only to ensure that we had properly understood the procedure. We will show pieces of these programs written in a Pascal-like language which should be easy for everybody to understand.

We first made a program that can convert a picture in BMP format to a grey scale picture in JPEG format. When this worked correctly, we extended it to colour pictures. Such a program, to be of use in production systems for JPEG files, must of course be written in assembly language and without making use of the co-processor (80-bit numbers) in the transforms. However, if the program is only for demonstration or if it is a part of a program producing computer graphic, it may be written in a high-level language and may use floating point operations. Our program which can read a JPEG file and draw the picture, for the baseline sequential DCT mode, was made in the same way. Since there are already many such programs, it does not need to be efficient. On the contrary, we have made it extra slow by using a "setpixel" procedure, because it is simpler - and because it gives the drawing a funny look.

The picture to the left below is made with our program in part one and the right with our program in this part. The quality is approximately the same. The first takes up 16.3 Kb and the second takes up 15.1 Kb (uncompressed they take up 228 Kb):

The demo program: 16.3 Kb
The true program: 15.1 Kb

Requirements Documents

"Digital Compression and Coding of Continuous-Tone Still Images - Requirements and Guidelines/Recommendation T.81" (1992), also called just T.81, is 180 pages long. If you are only interested in the baseline sequential DCT mode with Huffman coding, you do not have to read all 180 pages. The knowledge required of mathematics and programming is limited. You must already know the meaning of the mathematical terms, since these are not explained.

The purpose of T.81 was to set a common standard for the core of the procedure. The specifics are described separately in standards for the implementation. These are in additional documents with titles like "JPEG File Interchange Format, Version ...". The only thing in our account that is in these implementation documents is the colour space designation: the RGB → YCbCr transform.

The formulas for this colour transform shown in part one can be found in version 1.02 from 1992 (7 pages). T.81 only speaks of four components. It is implicit that only one component means that the picture is in grey scale, that three components can be the RGB components or most commonly the YCbCr components, and that the fourth component is for the possibility of transparency.