The JPEG standard (ISO/IEC 10918-1)

Vicente González Ruiz

July 28, 2022

1 Intro
2 Lossless JPEG [2]
3 Codec LS-JPEG
3.1 Compressor
3.2 Decompressor
4 Huffman encoding in LS-JPEG
5 Huffman decoding in LS-JPEG
6 Lossy JPEG
7 \([0,255]\rightarrow [-128,127]\)
8 8x8 2D-DCT
9 Basis functions of the 8x8 2D-DCT
10 Advantages of the 8x8 2D-DCT
11 Scalar quantization
12 Entropy encoding
13 Interlacing of the color components
14 Entropy encoding of the runs
15 Progressive transmission
16 The hierarchical algorithm
17 Motion JPEG (M-JPEG)
18 Lab
19 Resources

1 Intro

JPEG = Joint Photographic Experts Group.
Developed in 1992 by the ISO [1].
True color (up to 24 bits/pixel) and grayscale (up to 8 bits/pixel) images.
Selectable compression quality between \(0\) and \(100\) [3].
Visually lossless reconstructions of color images at bit-rates of about \(1\) bit/pixel (bpp).
Based on the Discrete Cosine Transform (DCT).
Several working modes: lossy, lossless and hierarchical.

2 Lossless JPEG [2]

Lossless.
Based on a spatial predictor and a 0-order static variable-length encoder (Huffman).

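[Figure: block diagram of the lossless JPEG codec]

[Figure: prediction context, showing the neighboring samples \(a\), \(b\) and \(c\) used to predict the current sample \(s\)]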

Predictors

\(P_0\)	\(\hat {s}\leftarrow 0\)
\(P_1\)	\(\hat {s}\leftarrow a\)
\(P_2\)	\(\hat {s}\leftarrow b\)
\(P_3\)	\(\hat {s}\leftarrow c\)
\(P_4\)	\(\hat {s}\leftarrow a+b-c\)
\(P_5\)	\(\hat {s}\leftarrow a+(b-c)/2\)
\(P_6\)	\(\hat {s}\leftarrow b+(a-c)/2\)
\(P_7\)	\(\hat {s}\leftarrow (a+b)/2\)
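
The predictor table can be sketched in Python as follows. This is only an illustration under some assumptions that the text does not state: \(a\) is taken as the left neighbor, \(b\) as the upper neighbor and \(c\) as the upper-left neighbor, the divisions by 2 are integer (floor) divisions, and \(P_7\) is taken as the average of \(a\) and \(b\), as in Table H.1 of T.81 [1]. The function name `predict` is hypothetical.

```python
def predict(mode, a, b, c):
    """Return the prediction s_hat for predictor P_mode.

    Assumptions: a = left, b = above, c = above-left neighbor of the
    current sample; "/2" is implemented as an integer (floor) division.
    """
    if mode == 0:
        return 0
    if mode == 1:
        return a
    if mode == 2:
        return b
    if mode == 3:
        return c
    if mode == 4:
        return a + b - c
    if mode == 5:
        return a + (b - c) // 2
    if mode == 6:
        return b + (a - c) // 2
    if mode == 7:
        return (a + b) // 2
    raise ValueError("mode must be in 0..7")
```

For example, with \(a=100\), \(b=110\) and \(c=105\), `predict(4, 100, 110, 105)` returns \(105\).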

3 Codec LS-JPEG

3.1 Compressor

Generate \(\hat {s}\).
Compute the prediction error \(e\leftarrow s - \hat {s}\).
Encode \(e\).

3.2 Decompressor

Generate \(\hat {s}\) (identical to Step 1 of the compressor).
Decode \(e\).
Compute the pixel value \(s\leftarrow e+\hat {s}\).
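
A minimal sketch of both loops, working on a single row of samples and leaving out the entropy coding of \(e\). The names `ls_compress` and `ls_decompress` are hypothetical, the predictor \(P_1\) (previous sample) is used by default, and the prediction of the first sample is assumed to be 0, which is a simplification and not what the standard prescribes.

```python
def ls_compress(row, predictor=lambda left: left):
    """Return the prediction errors e = s - s_hat of one row of samples."""
    errors = []
    left = 0                      # assumed context for the first sample
    for s in row:
        s_hat = predictor(left)   # Step 1: generate the prediction
        errors.append(s - s_hat)  # Step 2: prediction error (Step 3 would encode it)
        left = s
    return errors


def ls_decompress(errors, predictor=lambda left: left):
    """Rebuild the row from its prediction errors."""
    row = []
    left = 0                      # same assumed context as in the compressor
    for e in errors:
        s_hat = predictor(left)   # Step 1: same prediction as the compressor
        s = e + s_hat             # Step 3: recover the sample
        row.append(s)
        left = s
    return row
```

Because the decompressor repeats exactly the same predictions, `ls_decompress(ls_compress(row))` returns `row` unchanged.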

4 Huffman encoding in LS-JPEG

Search \(e\) in \(DIFF\) and select \(SSSS\) (the number of bits needed to represent \(|e|\)).
Encode \(SSSS\) following the base code.
If \(e>0\), then:
1. Encode \(e\) using a binary number of \(SSSS\) bits. The most significant bit of this number will always be 1.
Else:
1. Encode \(e-1\) using the \(SSSS\) least significant bits of its two's complement representation. The most significant of these bits will always be 0.

Example (\(e=5\))

\(SSSS=3\).
Output \(\leftarrow 100_2\).
Output \(\leftarrow 101_2\).

Example (\(e=-9\))

\(SSSS=4\).
Output \(\leftarrow 101_2\).
Output \(\leftarrow 0110_2\) (the four least significant bits of the two’s complement of \(-10_{10}\)).
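
The two examples can be reproduced with the following sketch. The base codes in `BASE_CODE` are an assumption: they are the typical luminance DC codes listed in Annex K of T.81 [1], which agree with the codes \(100_2\) and \(101_2\) used above. The function names are hypothetical.

```python
# Assumed base codes (typical luminance DC table, categories 0..11).
BASE_CODE = {0: "00", 1: "010", 2: "011", 3: "100", 4: "101", 5: "110",
             6: "1110", 7: "11110", 8: "111110", 9: "1111110",
             10: "11111110", 11: "111111110"}


def category(e):
    """SSSS: number of bits needed to represent |e|."""
    return abs(e).bit_length()


def magnitude_bits(e):
    """The SSSS extra bits: e if e > 0, else the low SSSS bits of e - 1."""
    ssss = category(e)
    if ssss == 0:
        return ""
    value = e if e > 0 else e - 1
    return format(value & ((1 << ssss) - 1), "0{}b".format(ssss))


def encode_residue(e):
    """Base code of the category followed by the magnitude bits."""
    return BASE_CODE[category(e)] + magnitude_bits(e)
```

With these definitions, `encode_residue(5)` gives `"100101"` and `encode_residue(-9)` gives `"1010110"`.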

5 Huffman decoding in LS-JPEG

Decode the \(SSSS\) category using the base code.
Decode the magnitude. Let \(x\leftarrow \) the next input bit.
If \(x\ne 0\), then:
1. \(e\leftarrow (x << (SSSS-1)) + \text {next}~(SSSS-1)~\text {bits}\).
Else:
1. \(e\leftarrow ((-1) << SSSS)~\text {OR}~((x << (SSSS-1))+\text {next}~(SSSS-1)~\text {bits}+1)\).

Example (decode \(100101\))

\(SSSS\leftarrow 3\).
\(x\leftarrow 1\).
\(e\leftarrow 2^2+\) next two input bits (01) \(=101_2=5_{10}\).

Example (decode \(1010110\))

\(SSSS\leftarrow 4\).
\(x\leftarrow 0\).
(Using 8 bits of precision) \(e\leftarrow \) \(11110000_2\) OR \((0000_2+110_2+1_2) = 11110111_2 = -9_{10}\) (all the bits above the four least significant ones are set to \(1\)).
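
A possible decoder for a single residue, given as a sketch. The category table `CATEGORY_OF` is assumed to be the typical luminance DC table of Annex K of T.81 [1], and the function name is hypothetical. For a negative value, all the bits above the \(SSSS\) least significant ones are set to 1 (a two's complement sign extension) after adding 1 to the received bits.

```python
# Assumed base codes (typical luminance DC table), mapped to their category.
CATEGORY_OF = {"00": 0, "010": 1, "011": 2, "100": 3, "101": 4, "110": 5,
               "1110": 6, "11110": 7, "111110": 8, "1111110": 9,
               "11111110": 10, "111111110": 11}


def decode_residue(bits):
    """Decode one residue from a string of '0'/'1'.

    Returns (e, number_of_bits_consumed).
    """
    prefix = ""
    pos = 0
    while prefix not in CATEGORY_OF:      # the base code is prefix-free
        prefix += bits[pos]
        pos += 1
    ssss = CATEGORY_OF[prefix]
    if ssss == 0:
        return 0, pos
    value = int(bits[pos:pos + ssss], 2)  # x followed by the next SSSS-1 bits
    pos += ssss
    if value >> (ssss - 1):               # x = 1: positive residue
        e = value
    else:                                 # x = 0: negative residue
        e = (-1 << ssss) | (value + 1)
    return e, pos
```

Here `decode_residue("100101")` returns `(5, 6)` and `decode_residue("1010110")` returns `(-9, 7)`.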

6 Lossy JPEG

For an RGB image, the baseline algorithm consists of:

Transform the image from the RGB to the YCbCr domain.
Subsample the chrominance (Cb and Cr).
Shift each color component to the range \([-128,127]\).
For each component (Y, Cb and Cr):
1. Apply the (\(8\times 8\))-DCT to each block of the component.
2. Quantize the DCT coefficients.
3. Entropy encode the quantized coefficients.
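
The first two steps could look like the sketch below. Note that the conversion coefficients are an assumption: they are the ones commonly used by JFIF files (full-range BT.601), and they are not given in this text. The chrominance is subsampled by averaging \(2\times 2\) neighborhoods, which is only one of the possible filters. Both function names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to 8-bit YCbCr (assumed JFIF coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    # Round and clamp each component to [0, 255].
    return tuple(min(255, max(0, int(round(v)))) for v in (y, cb, cr))


def subsample_2x2(plane):
    """Halve both dimensions of a plane (list of rows, even sizes assumed)."""
    return [[(plane[i][j] + plane[i][j + 1]
              + plane[i + 1][j] + plane[i + 1][j + 1] + 2) // 4
             for j in range(0, len(plane[0]), 2)]
            for i in range(0, len(plane), 2)]
```

A gray pixel, such as white \((255,255,255)\), gets neutral chrominance: `rgb_to_ycbcr(255, 255, 255)` returns `(255, 128, 128)`.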

7 \([0,255]\rightarrow [-128,127]\)

Every component should have (approximately) zero mean. For this reason, if the values of the Y, Cb and Cr components are in the range \([0,255]\), \(128\) is subtracted from every sample.
This reduces the arithmetic precision required by the next step (the DCT).

8 8x8 2D-DCT

Every component is DCT-transformed in blocks of \(8\times 8\) pixels.
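
As an illustration, a direct (slow) implementation of the level shift and of the orthonormal \(8\times 8\) 2D-DCT, computed first by rows and then by columns. This is a didactic sketch with hypothetical function names, and real codecs use fast algorithms instead.

```python
import math


def level_shift(block):
    """Map samples from [0, 255] to [-128, 127]."""
    return [[s - 128 for s in row] for row in block]


def dct_1d(v):
    """Orthonormal DCT-II of a list of samples."""
    n = len(v)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(v[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                               for i in range(n)))
    return out


def dct_2d(block):
    """Separable 2D-DCT: transform the rows, then the columns."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]   # transpose back to [u][v]
```

For a flat block in which every sample is \(130\), the level-shifted block contains only \(2\)s, so the DC coefficient is \(8\times 2=16\) and all the AC coefficients are (numerically) zero.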

9 Basis functions of the 8x8 2D-DCT

Every \(8\times 8\)-DCT coefficient represents the weight of the corresponding basis pattern in the reconstruction of the block.
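
This interpretation can be written down directly: the block is the sum of the basis patterns, each multiplied by its coefficient. The sketch below assumes the orthonormal scaling of the DCT, and the names `basis` and `reconstruct` are hypothetical.

```python
import math


def basis(u, v, n=8):
    """The n x n pattern associated with the coefficient [u][v]."""
    def c(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    return [[c(u) * c(v)
             * math.cos((2 * x + 1) * u * math.pi / (2 * n))
             * math.cos((2 * y + 1) * v * math.pi / (2 * n))
             for y in range(n)] for x in range(n)]


def reconstruct(coeffs):
    """Inverse transform as a weighted sum of basis patterns."""
    n = len(coeffs)
    block = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            if coeffs[u][v] != 0:
                pattern = basis(u, v, n)
                for x in range(n):
                    for y in range(n):
                        block[x][y] += coeffs[u][v] * pattern[x][y]
    return block
```

A block whose only non-zero coefficient is a DC of \(8\) is reconstructed as a flat block of ones, because the DC pattern is constant and equal to \(1/8\).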

10 Advantages of the 8x8 2D-DCT

A lower total computational cost:
Computing \((\frac {N}{8}\times \frac {N}{8})\) \(8\times 8\)-DCTs is faster than computing a single \(N\times N\)-DCT.
“In-line” operation:
The compressor processes the image in blocks of \(8\times 8\) pixels, independently of the size of the image.
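
As a rough check of the first advantage, assume (this is a simplification, not a figure taken from the standard) that the 2D-DCT is computed by rows and columns without any fast algorithm, so that a 1D-DCT of \(M\) samples costs \(M^2\) multiplications. Then \begin {equation} C_{N\times N} = 2N\cdot N^2 = 2N^3, \qquad C_{\text {blocks}} = \Big (\frac {N}{8}\Big )^2\cdot 2\cdot 8\cdot 8^2 = 16N^2, \end {equation} so the block-based transform needs \(N/8\) times fewer multiplications (\(64\) times fewer for \(N=512\)).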

11 Scalar quantization

The main objective of the \(8\times 8\)-DCT is the spatial compaction of the energy of the image. As a consequence, a small set of coefficients accumulates most of this energy.
After this step, the DCT coefficients are also highly decorrelated. Therefore, scalar quantization is a good choice to quantize the spectral domain in order to reduce the amount of encoded data.
The quantization generates a large number of DCT coefficients very close or equal to 0, following a Laplace probability distribution.
The quantization step has such an impact on the result that the JPEG committee studied and determined the best quantization matrices (one for the luma and another for the chroma).

Luminance

16	11	10	16	24	40	51	61
12	12	14	19	26	58	60	55
14	13	16	24	40	57	69	56
14	17	22	29	51	87	80	62
18	22	37	56	68	109	103	77
24	35	55	64	81	104	113	92
49	64	78	87	103	121	120	101
72	92	95	98	112	100	103	99

Chrominance

17	18	24	47	99	99	99	99
18	21	26	66	99	99	99	99
24	26	56	99	99	99	99	99
47	66	99	99	99	99	99	99
99	99	99	99	99	99	99	99
99	99	99	99	99	99	99	99
99	99	99	99	99	99	99	99
99	99	99	99	99	99	99	99

As can be seen, there is an overall tendency to preserve the low frequencies of the image.
The quantization is described by \begin {equation} \text {2D-DCT}'[u,v] = \text {round}\Big (\frac {8\times 8\text {-DCT}[u,v]}{\text {Z}[u,v]}\Big ) \end {equation} where \(\text {Z}[\cdot ,\cdot ]\) is the quantization matrix.
It is possible to use different quantization matrices, but they must be sent to the decompressor.

12 Entropy encoding

Subtract from the DC coefficient the DC coefficient of the previously encoded block. This computation exploits the correlation between neighboring blocks.
Scan the coefficients following the zig-zag pattern, in order to (partially) sort the coefficients by magnitude. Notice that, after a given coefficient, all the remaining ones are zero. This situation is encoded using the EOB (End Of Block) special symbol.
Encode the runs of zero coefficients, together with the non-zero coefficients that end them, using a variable-length code.¹
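
The zig-zag scan and the EOB truncation can be sketched as follows. The function names are hypothetical, and the scan order is generated by walking the anti-diagonals of the block in alternating directions.

```python
def zigzag_order(n=8):
    """List of (row, column) positions in zig-zag order."""
    order = []
    for s in range(2 * n - 1):               # s = row + column of the diagonal
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:                       # even diagonals run bottom-left to top-right
            diag.reverse()
        order.extend(diag)
    return order


def zigzag_scan(block):
    """Coefficients of a square block in zig-zag order."""
    return [block[i][j] for i, j in zigzag_order(len(block))]


def cut_at_eob(seq):
    """Drop the trailing zeros and mark the end of the block."""
    last = max((k for k, c in enumerate(seq) if c != 0), default=-1)
    out = list(seq[:last + 1])
    if last < len(seq) - 1:                  # no EOB if the last coefficient is non-zero
        out.append("EOB")
    return out
```

The first positions produced by `zigzag_order()` are \((0,0)\), \((0,1)\), \((1,0)\), \((2,0)\), \((1,1)\), \((0,2)\), and so on.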

13 Interlacing of the color components

Typically, 4:2:0 is the most used subsampling pattern in JPEG.
The color interlacing enables the pipelined (IO and CPU) reconstruction of the images row-by-row.
1. With interlacing: [figure]
2. Without interlacing: [figure]

Example

Let’s encode a block of a grayscale image (luminance of “lena”).

\(8\times 8\)-DCT


79	75	79	82	82	86	94	94
76	78	76	82	83	86	85	94
72	75	67	78	80	78	74	82
74	76	75	75	86	80	81	79
73	70	75	67	78	78	79	85
69	63	68	69	75	78	82	80
76	76	71	71	67	79	80	83
72	77	78	69	75	75	78	78

\(\Leftrightarrow \)


619	-29	8	2	1	-3	0	1
22	-6	-4	0	7	0	-2	-3
11	0	5	-4	-3	4	0	-3
2	-10	5	0	0	7	3	2
6	2	-1	-1	-2	0	0	8
1	2	1	2	0	2	-2	-2
-8	-2	-4	1	2	1	-1	1
-3	1	5	-2	1	-1	1	-3

Notice that most of the energy is concentrated in the DC coefficient and in the surrounding (low-frequency) AC coefficients.

Quantization


619	-29	8	2	1	-3	0	1
22	-6	-4	0	7	0	-2	-3
11	0	5	-4	-3	4	0	-3
2	-10	5	0	0	7	3	2
6	2	-1	-1	-2	0	0	8
1	2	1	2	0	2	-2	-2
-8	-2	-4	1	2	1	-1	1
-3	1	5	-2	1	-1	1	-3

divided element-wise (and rounded) by the luminance quantization matrix


16	11	10	16	24	40	51	61
12	12	14	19	26	58	60	55
14	13	16	24	40	57	69	56
14	17	22	29	51	87	80	62
18	22	37	56	68	109	103	77
24	35	55	64	81	104	113	92
49	64	78	87	103	121	120	101
72	92	95	98	112	100	103	99

results in


39	-3	1	0	0	0	0	0
2	-1	0	0	0	0	0	0
1	0	0	0	0	0	0	0
0	-1	0	0	0	0	0	0
0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0

Notice that the high frequencies have been removed (quantized to zero).
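
The quantization of this example can be checked with the sketch below, which uses the (already rounded) DCT coefficients and the luminance matrix shown above. One assumption is needed: halves are rounded away from zero, so that \(-6/12=-0.5\) becomes \(-1\) as in the example. The function names are hypothetical.

```python
import math

Z_LUMA = [[16, 11, 10, 16, 24, 40, 51, 61],
          [12, 12, 14, 19, 26, 58, 60, 55],
          [14, 13, 16, 24, 40, 57, 69, 56],
          [14, 17, 22, 29, 51, 87, 80, 62],
          [18, 22, 37, 56, 68, 109, 103, 77],
          [24, 35, 55, 64, 81, 104, 113, 92],
          [49, 64, 78, 87, 103, 121, 120, 101],
          [72, 92, 95, 98, 112, 100, 103, 99]]

COEFFS = [[619, -29, 8, 2, 1, -3, 0, 1],
          [22, -6, -4, 0, 7, 0, -2, -3],
          [11, 0, 5, -4, -3, 4, 0, -3],
          [2, -10, 5, 0, 0, 7, 3, 2],
          [6, 2, -1, -1, -2, 0, 0, 8],
          [1, 2, 1, 2, 0, 2, -2, -2],
          [-8, -2, -4, 1, 2, 1, -1, 1],
          [-3, 1, 5, -2, 1, -1, 1, -3]]


def round_half_away(x):
    """Round to the nearest integer, halves away from zero (assumed rule)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize(coeffs, z):
    """Element-wise division by the quantization matrix, then rounding."""
    return [[round_half_away(c / float(q)) for c, q in zip(crow, zrow)]
            for crow, zrow in zip(coeffs, z)]


def dequantize(indexes, z):
    """What the decompressor recovers: index times quantization step."""
    return [[k * q for k, q in zip(krow, zrow)]
            for krow, zrow in zip(indexes, z)]
```

`quantize(COEFFS, Z_LUMA)` reproduces the matrix of the example, and `dequantize` recovers \(39\times 16=624\) for the DC coefficient instead of the original \(619\), which shows where the loss of information happens.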

EOB generation

The coefficients of the matrix


39	-3	1	0	0	0	0	0
2	-1	0	0	0	0	0	0
1	0	0	0	0	0	0	0
0	-1	0	0	0	0	0	0
0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0

are visited using the zig-zag scan to find the EOB. The result is

39, -3, 2, 1, -1, 1, 0, 0, 0, 0, 0, -1, EOB

14 Entropy encoding of the runs

Encoding of the DC coefficient. Subtract the DC coefficient of the previous block from the current one. In the last example, taking \(34\) as the DC coefficient of the previous block, we have \(39-34=5\). This value is encoded as a residue using the LS-JPEG encoding machinery (see Section 4). The result is \(100101_2\).

Encoding of the AC coefficients. We encode the pairs <number-of-previous-zeros,non-zero-value> in two steps.

Find the \(SSSS\) category corresponding to the AC coefficient. For example, if the coefficient is \(-3\), we determine that \(SSSS\leftarrow 2\) (see Section 4).

Find the entry <number-of-previous-zeros,\(SSSS\)> in the table of AC codes proposed by the JPEG:

JPEG AC codes (Luminance)

Run/category	Length (base code + magnitude bits)	Base code
0/0	4	1010 (=EOB)
0/1	3	00
0/2	4	01
0/3	6	100
:	:	:
15/10	26	1111 1111 1111 1110

and output the base code bits. In our example, we output \(01_2\).

Execute Step 3 of the LS-JPEG encoding algorithm taking \(e=\) the AC coefficient. In our example, we output \(-3-1=-4\) using a binary number of \(SSSS=2\) bits, that is, \(00_2\).

The whole bit-stream for our example is:

100101

0100

0110

001

000

001

11110100

1010

Finally, the block is encoded using only \(35\) bits. Therefore, the compression ratio is 15:1 approximately (\(0.55\) bits/pixel).
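
The whole bit-stream of the example can be regenerated with the following sketch. The code tables are deliberately partial: they only contain entries that are taken (as an assumption) from the typical luminance tables of Annex K of T.81 [1] and that are enough for this block. Runs longer than 15 zeros, which need a special symbol, are not handled. The function names are hypothetical.

```python
# Partial, assumed code tables (typical luminance DC and AC codes).
DC_CODE = {0: "00", 1: "010", 2: "011", 3: "100", 4: "101", 5: "110"}
AC_CODE = {(0, 1): "00", (0, 2): "01", (0, 3): "100", (5, 1): "1111010"}
EOB = "1010"


def category(e):
    return abs(e).bit_length()


def magnitude_bits(e):
    ssss = category(e)
    if ssss == 0:
        return ""
    value = e if e > 0 else e - 1
    return format(value & ((1 << ssss) - 1), "0{}b".format(ssss))


def encode_block(zz, previous_dc):
    """Encode 64 quantized coefficients given in zig-zag order.

    Returns the list of code-words (one string per symbol).
    """
    diff = zz[0] - previous_dc
    codes = [DC_CODE[category(diff)] + magnitude_bits(diff)]
    # Position of the last non-zero AC coefficient (0 if there is none).
    last = max((k for k, c in enumerate(zz) if k > 0 and c != 0), default=0)
    run = 0
    for c in zz[1:last + 1]:
        if c == 0:
            run += 1                          # count the zeros of the run
        else:
            codes.append(AC_CODE[(run, category(c))] + magnitude_bits(c))
            run = 0
    if last < len(zz) - 1:
        codes.append(EOB)
    return codes
```

Called with the zig-zag sequence of the example and a previous DC coefficient of \(34\), `encode_block` returns the eight code-words listed above, whose lengths add up to \(35\) bits.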

15 Progressive transmission

Very useful when the images are transmitted over slow links.
There are three possibilities:
1. Progressive transmission based on spectral selection:
  - All the low-frequency coefficients are transmitted before the rest.
  - Provides up to 64 scans.
2. Progressive transmission based on bit-plane selection:
  - The most significant bit-planes of all the coefficients are transmitted before the rest.
  - Provides up to 11 scans.
3. Progressive transmission based on a mixture of the previous two progressions:
  - 11 or even 64 scans may not be enough if the transmission time is long and the computational power of the receiver is high. In this case, the coefficients can be transmitted by bit-planes, but also selecting the coefficients according to their frequency.
  - Up to \(64\times 11=704\) scans.

16 The hierarchical algorithm

It is based on building a differential pyramid; after that, every level of the pyramid (residue image) is compressed using the lossy encoder or the lossless encoder.
To create the pyramid we can use the following algorithm:
1. Subsample (after low-pass filtering) the image by a factor of 2 in each dimension.
2. Interpolate the subsampled image by a factor of 2 in each dimension.
3. Subtract this image from the original one, obtaining a residue image that is the base of the pyramid (the high frequencies). Notice that if we add this residue image to the image obtained in Step 2, we recover the original image.
4. Repeat this process considering the subsampled image as the original one.
This multi-resolution representation is useful in those cases where the original resolution of the image is too large for the actual display/printing system.
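
The pyramid construction can be sketched as follows, without the compression of each level. The filter (average of \(2\times 2\) neighborhoods) and the interpolator (sample repetition) are assumptions chosen for brevity, the image dimensions are assumed to be divisible by \(2^{\text {levels}}\), and all the function names are hypothetical.

```python
def subsample(img):
    """Step 1: halve each dimension, averaging 2x2 neighborhoods."""
    return [[(img[i][j] + img[i][j + 1] + img[i + 1][j] + img[i + 1][j + 1]) // 4
             for j in range(0, len(img[0]), 2)]
            for i in range(0, len(img), 2)]


def interpolate(img):
    """Step 2: double each dimension by repeating every sample."""
    out = []
    for row in img:
        wide = [s for s in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def build_pyramid(img, levels):
    """Return the smallest image and the residues (base of the pyramid first)."""
    residues = []
    for _ in range(levels):
        small = subsample(img)
        up = interpolate(small)
        # Step 3: residue = original - interpolated version.
        residues.append([[s - p for s, p in zip(r1, r2)] for r1, r2 in zip(img, up)])
        img = small                           # Step 4: repeat with the subsampled image
    return img, residues


def rebuild(top, residues):
    """Invert the pyramid: interpolate and add the residue, level by level."""
    img = top
    for residue in reversed(residues):
        up = interpolate(img)
        img = [[p + e for p, e in zip(r1, r2)] for r1, r2 in zip(up, residue)]
    return img
```

Since every residue is computed against exactly the image that the decoder can interpolate, `rebuild(*build_pyramid(img, levels))` returns the original image when the levels are stored without loss.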

17 Motion JPEG (M-JPEG)

When each image of a sequence is compressed independently using JPEG, we are using a Motion JPEG compressor.
Notice that, if the quality of the compression is constant, the size of each compressed image could be different:

[Figure: Lena \(512\times 512\) RGB original]

[Figure: Lena at \(1.0\) bpp (\(\text {PSNR}=32.85\) dB)]

[Figure: Lena at \(0.5\) bpp (\(\text {PSNR}=30.91\) dB)]

[Figure: Lena at \(0.4\) bpp (\(\text {PSNR}=30.09\) dB)]

[Figure: Lena at \(0.3\) bpp (\(\text {PSNR}=28.97\) dB)]

[Figure: Lena at \(0.2\) bpp (\(\text {PSNR}=26.64\) dB)]

[Figure: Lena at \(0.1\) bpp (\(\text {PSNR}=21.29\) dB)]

18 Lab

Download and compile the snr.c command line tool.
For each image of the Image Compression Corpus, build a table with the structure:
```
     # bpp MSE
```
where bpp (bits per pixel) is the bit-rate obtained after compressing and decompressing the image with the command line tools cjpeg and djpeg, and MSE is the mean squared error between the original image and the reconstructed one.
Use gnuplot to draw the rate-distortion curve of JPEG for each of the test images.
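
The two columns of the table can be computed as in the sketch below. It is independent of snr.c (whose interface is not described here), it assumes that the images have already been loaded as flat lists of samples, and its function names are hypothetical. The PSNR is included only because it is the measure used in the previous section.

```python
import math


def bits_per_pixel(compressed_size_in_bytes, width, height):
    """Bit-rate of a compressed image."""
    return 8.0 * compressed_size_in_bytes / (width * height)


def mse(original, reconstructed):
    """Mean squared error between two equally long lists of samples."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / float(len(original))


def psnr(mse_value, peak=255):
    """Peak signal-to-noise ratio in dB (undefined for mse_value = 0)."""
    return 10.0 * math.log10(peak ** 2 / float(mse_value))
```

For instance, a \(512\times 512\) image compressed to \(32768\) bytes has a bit-rate of \(1.0\) bpp.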

19 Resources

[1] The Joint Photographic Experts Group (JPEG). Recommendation T.81: Digital Compression and Coding of Continuous-tone Still Images. International Telecommunication Union (ITU), September 1992.

[2] The Joint Photographic Experts Group (JPEG). FCD 14495, Lossless and Near-Lossless Coding of Continuous Tone Still Images (JPEG-LS). The International Standards Organization (ISO)/The International Telegraph and Telephone Consultative Committee (CCITT), July 1997.

[3] G. K. Wallace. The JPEG Still Picture Compression Standard. Communications of the ACM, 34(4):30 – 44, April 1991. Available at ftp://ftp.uu.net/graphics/jpeg/wallace.ps.Z.

¹Both Huffman and arithmetic coding can be used. However, the marginal gain of arithmetic coding (about 10%) and the patents behind it have made the Huffman version the most used one.

