The JPEG2000 standard (ISO/IEC 15444-1)

Vicente González Ruiz

January 21, 2023

1 JPEG 2000 features [3]
2 Intro [?]
3 The JPEG2000 algorithm
4 DC level offset (step 1/6)
5 Component decorrelation (step 2/6)
6 The 2D DWT (step 3/6)
7 Quantization (step 4/6)
8 ROI definition (step 5/6)
9 Entropy encoding (step 6/6)
10 Motion JPEG 2000
References

1 JPEG 2000 features [3]

Improved compression efficiency (compared to JPEG).
Lossy to lossless compression.
Multiple resolution representation.
Progressive (quality) decoding.
Tiling.
Region-of-interest (ROI) coding.
Error resilience.
Random code-stream access and processing.

2 Intro [?]

Lossy & lossless: The lossy path has a better RD curve than the lossless path.
Quality scalability: More decoded data, more quality.
Spatial scalability: More decoded data, more resolution.
ROI (Regions Of Interest) scalability:

3 The JPEG2000 algorithm

Color component DC level offset (optional).
Intercomponent decorrelation (optional).
Spatial decorrelation (2D-DWT).
Quantization (only in the lossy path).
ROI definition.
Entropy coding (tier-1 coding).
Bit-stream organization (tier-2 coding).

4 DC level offset (step 1/6)

Depends on the selected path. For each component:
1. Irreversible: normalize the samples $s[{\mbox {\boldmath $n$}}]$ so that \begin {equation} -\frac {1}{2}\le s[{\mbox {\boldmath $n$}}] \le \frac {1}{2}, \end {equation} where $s[{\mbox {\boldmath $n$}}]=s[x,y]$ is the sample at position $(x,y)$.
2. Reversible: subtract an offset from the samples $s[{\mbox {\boldmath $n$}}]$ if they do not satisfy \begin {equation} -2^{B-1}\le s[{\mbox {\boldmath $n$}}] < 2^{B-1}, \end {equation} where $B$ is the number of bits per sample of the component.
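A minimal sketch of both cases, assuming unsigned input samples of $B$ bits (the function names and the sample values are illustrative, not part of the standard text):

```python
def dc_level_shift(samples, bits):
    """Reversible path: map unsigned B-bit samples to [-2^(B-1), 2^(B-1))."""
    offset = 1 << (bits - 1)
    return [s - offset for s in samples]


def normalize(samples, bits):
    """Irreversible path: shift and scale unsigned B-bit samples to [-1/2, 1/2)."""
    offset = 1 << (bits - 1)
    return [(s - offset) / (1 << bits) for s in samples]


row = [0, 64, 128, 255]            # hypothetical 8-bit samples
shifted = dc_level_shift(row, 8)   # [-128, -64, 0, 127]
scaled = normalize(row, 8)         # [-0.5, -0.25, 0.0, 0.49609375]
```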

5 Component decorrelation (step 2/6)

Applied only to color (RGB) images.
1. Irreversible path: \begin {equation} \begin {array}{l} \text {Y} = 0.299\text {R}+0.587\text {G}+0.114\text {B}\\ ~\\ \text {Cb} = \frac {\displaystyle 0.5}{\displaystyle 1-0.114}(\text {B}-\text {Y})\\ ~\\ \text {Cr} = \frac {\displaystyle 0.5}{\displaystyle 1-0.299}(\text {R}-\text {Y}) \end {array} \end {equation}
2. Reversible path: \begin {equation} \begin {array}{l} \text {Y\'{}} = \Big \lfloor \frac {\displaystyle \text {R}+2\text {G}+\text {B}}{\displaystyle 4}\Big \rfloor \\ ~\\ \text {Db} = \text {B}-\text {G}\\ ~\\ \text {Dr} = \text {R}-\text {G} \end {array} \end {equation}
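A minimal sketch of both transforms. The forward functions follow the equations above, with the luma weights 0.299, 0.587 and 0.114. The inverse of the reversible transform is not given in these notes; it is derived here from $\text {R}+2\text {G}+\text {B}=4\text {G}+\text {Db}+\text {Dr}$, and shows that the floor operation loses no information. Function names are illustrative.

```python
def ict_forward(r, g, b):
    """Irreversible (floating-point) color transform."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 0.5 / (1 - 0.114) * (b - y)
    cr = 0.5 / (1 - 0.299) * (r - y)
    return y, cb, cr


def rct_forward(r, g, b):
    """Reversible (integer) color transform."""
    y = (r + 2 * g + b) // 4   # floor division, also correct for negative sums
    db = b - g
    dr = r - g
    return y, db, dr


def rct_inverse(y, db, dr):
    """Exact inverse: y = g + floor((db + dr) / 4), so g can be recovered."""
    g = y - (db + dr) // 4
    return dr + g, g, db + g


pixel = (200, 30, 117)                       # hypothetical RGB sample
restored = rct_inverse(*rct_forward(*pixel)) # equal to pixel
```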

6 The 2D DWT (step 3/6)

In JPEG 2000, the 2D DWT is applied independently to each component.
1. Irreversible path: \begin {equation} \begin {array}{c} \begin {array}{ll} \multicolumn {2}{l}{L(z) =} \\ & 0.602949018236\\ + & 0.266864118443(z^1+z^{-1})\\ - & 0.078223266529(z^2+z^{-2})\\ - & 0.016864118443(z^3+z^{-3})\\ + & 0.026748757411(z^4+z^{-4}) \end {array} \\ \\ \begin {array}{ll} \multicolumn {2}{l}{H(z) =} \\ & 0.557543526229\\ - & 0.295635881557(z^1+z^{-1})\\ - & 0.028771763114(z^2+z^{-2})\\ + & 0.045635881557(z^3+z^{-3}) \end {array} \end {array} \tag {CDF 9/7} \end {equation}
2. Reversible path: \begin {equation} \begin {array}{l} L(z) = -\frac {1}{8}(z^2+z^{-2}) + \frac {1}{4}(z^1+z^{-1}) + \frac {3}{4}\\~\\ H(z) = -\frac {1}{2}(z^1+z^{-1}) + 1 \end {array} \tag {Spline 5/3} \end {equation}
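A minimal sketch of one level of the reversible path in 1D, assuming the usual lifting implementation of the 5/3 filters (a predict step followed by an update step, both with integer rounding), whole-sample symmetric extension at the borders and an even-length signal. The function names are illustrative. In 2D, the same steps would be applied to the rows and then to the columns.

```python
def dwt53_forward(x):
    """One level of the reversible 5/3 lifting on an even-length list of integers."""
    n = len(x)
    half = n // 2
    # Predict step: high-pass samples, computed at the odd positions.
    d = []
    for i in range(half):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]  # symmetric extension
        d.append(x[2 * i + 1] - (left + right) // 2)
    # Update step: low-pass samples, computed at the even positions.
    s = []
    for i in range(half):
        prev = d[i - 1] if i > 0 else d[0]                   # symmetric extension
        s.append(x[2 * i] + (prev + d[i] + 2) // 4)
    return s, d


def dwt53_inverse(s, d):
    """Undo the lifting steps in reverse order (exact reconstruction)."""
    half = len(s)
    n = 2 * half
    x = [0] * n
    for i in range(half):
        prev = d[i - 1] if i > 0 else d[0]
        x[2 * i] = s[i] - (prev + d[i] + 2) // 4
    for i in range(half):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]
        x[2 * i + 1] = d[i] + (left + right) // 2
    return x


signal = [10, 12, 14, 13, 9, 7, 8, 20]   # hypothetical samples
low, high = dwt53_forward(signal)
```

Because every rounding in the forward steps is repeated identically in the inverse steps, `dwt53_inverse(low, high)` returns the original samples.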

7 Quantization (step 4/6)

Irreversible path

\begin {equation} q_b = \text {sign}(y_b)\Big \lfloor \frac {\displaystyle |y_b|}{\displaystyle \Delta _b}\Big \rfloor \tag {J2KQuant} \end {equation} where $q_b$ is the quantized coefficient, $y_b\in [-0.5,0.5]$ is a wavelet coefficient of the subband $b$, and $\Delta _b$ is the quantizer step size of the subband $b$. This is a deadzone scalar quantizer: the bin that maps to $q_b=0$, $(-\Delta _b,\Delta _b)$, is twice as wide as the other bins.

Reversible path

There is no quantization: \begin {equation} q_b = y_b \tag {J2KRanging} \end {equation}
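A minimal sketch of the quantizer of the irreversible path. The dequantizer is an assumption: it reconstructs at the midpoint of each bin (`r=0.5`), which is a common choice but only one of the possible ones. The step size 0.125 is a hypothetical value.

```python
import math


def quantize(y, delta):
    """Deadzone scalar quantizer: q = sign(y) * floor(|y| / delta)."""
    sign = -1 if y < 0 else 1
    return sign * math.floor(abs(y) / delta)


def dequantize(q, delta, r=0.5):
    """Reconstruct inside the bin; r=0.5 (midpoint) is an assumed decoder choice."""
    if q == 0:
        return 0.0
    sign = -1 if q < 0 else 1
    return sign * (abs(q) + r) * delta


delta = 0.125                                       # hypothetical step size
indexes = [quantize(y, delta) for y in (0.3, -0.3, 0.1, -0.1)]
# indexes == [2, -2, 0, 0]: both 0.1 and -0.1 fall into the deadzone
```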

8 ROI definition (step 5/6)

A ROI is defined by prioritizing (multiplying by a number greater than one) the coefficients $q_b$ that belong to it, so that their bit-planes are encoded before those of the background.
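A minimal sketch of this prioritization, assuming a Maxshift-like scaling: the ROI coefficients are multiplied by $2^s$, with $s$ chosen so that every nonzero ROI coefficient ends up above every background coefficient. With this choice the decoder can identify the ROI coefficients by their magnitude alone. The function names, the mask and the coefficient values are illustrative.

```python
def roi_upshift(q, roi_mask):
    """Multiply the ROI coefficients by 2**s, where 2**s exceeds every background magnitude."""
    background_max = max(
        (abs(v) for v, in_roi in zip(q, roi_mask) if not in_roi), default=0
    )
    s = background_max.bit_length()      # smallest s with 2**s > background_max
    shifted = [v << s if in_roi else v for v, in_roi in zip(q, roi_mask)]
    return shifted, s


def roi_downshift(shifted, s):
    """Decoder side: magnitudes of at least 2**s can only belong to the ROI."""
    threshold = 1 << s
    out = []
    for v in shifted:
        if abs(v) >= threshold:
            magnitude = abs(v) >> s
            out.append(magnitude if v > 0 else -magnitude)
        else:
            out.append(v)
    return out


q = [3, -5, 2, 0, -1, 4]                         # hypothetical quantized coefficients
mask = [False, False, True, True, True, False]   # True marks the ROI
shifted, s = roi_upshift(q, mask)                # s == 3, shifted == [3, -5, 16, 0, -8, 4]
```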

9 Entropy encoding (step 6/6)

EBCOT (Embedded Block Coding with Optimal Truncation).
PCRD-opt (Post Compression Rate Distortion optimization).

EBCOT [?]

The coefficients are grouped into code-blocks (that have a typical size of $32\times 32$ or $64\times 64$) and encoded bit-plane by bit-plane, using a context-based adaptive binary arithmetic encoder (called MQ-coder).
Each bit-plane of each code-block is encoded in 3 passes:
1. Significance propagation pass: encodes whether the coefficients that are expected to become significant (those with at least one significant neighbor) are in fact significant, that is, whether their absolute value is greater than or equal to $2^p$, where $p$ is the index of the processed bit-plane. When a coefficient becomes significant, its sign is also encoded.
2. Magnitude refinement pass: encodes the bit of the processed bit-plane for those coefficients that became significant in a previous bit-plane.
3. Cleanup pass: the significance propagation pass only visits a subset of the coefficients that can become significant. The cleanup pass encodes the remaining ones.
The code-stream produced after each individual pass is an optimal code-stream from the R/D point of view. In other words, if the code-stream is truncated at the end of any of these passes, the result is as close to the R/D curve as possible.

Notice that there are a total of \begin {equation} 3P-2 \end {equation} optimal truncation points in the code-stream of a code-block, where $P$ is the number of bit-planes in the DWT domain.
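The count $3P-2$ can be checked with a short sketch, assuming (as in EBCOT) that the most significant bit-plane is encoded with the cleanup pass only, because no coefficient is significant yet, and that each of the remaining $P-1$ bit-planes uses the three passes. The function name is illustrative.

```python
def coding_passes(num_bitplanes):
    """List the (bit-plane index, pass name) pairs of a code-block, in coding order."""
    passes = []
    top = num_bitplanes - 1
    for p in range(top, -1, -1):          # from the most to the least significant
        if p < top:
            passes.append((p, "significance propagation"))
            passes.append((p, "magnitude refinement"))
        passes.append((p, "cleanup"))
    return passes


P = 8                                      # hypothetical number of bit-planes
truncation_points = len(coding_passes(P))  # 3 * 8 - 2 == 22
```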

PCRD-opt

In order to provide quality scalability, the code-streams of the code-blocks should be reordered according to the contribution of each coding pass to the quality of the reconstruction of the whole image.
A JPEG 2000 encoder typically inputs a set of $Q$ bit-rates or a number $Q$ of quality layers.
The PCRD-opt determines which segments of each code-block code-stream are going to be part of each quality layer.
Notice that PCRD-opt does not improve the RD curve in the sense of moving it closer to the origin of coordinates (when the distortion is measured with the RMSE, for example). What PCRD-opt does is to increase the number of operational RD points of the codec.
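A simplified sketch of the layer formation, under these assumptions: each code-block provides the (rate, distortion) pair reached at each of its truncation points, each quality layer is associated with a slope $\lambda$, and for a given $\lambda$ each code-block is truncated at the point that minimizes $D+\lambda R$. The function names and the R/D values are hypothetical, and a real encoder would search the slopes that meet the $Q$ target bit-rates.

```python
def truncate(points, lam):
    """Index of the truncation point of one code-block that minimizes D + lam * R."""
    return min(range(len(points)), key=lambda i: points[i][1] + lam * points[i][0])


def build_layers(blocks, lambdas):
    """For each slope (from the highest to the lowest), the truncation point of every code-block."""
    return [
        [truncate(points, lam) for points in blocks]
        for lam in sorted(lambdas, reverse=True)
    ]


# Hypothetical (rate, distortion) pairs; index 0 means "nothing decoded".
blocks = [
    [(0, 100), (10, 40), (20, 25), (30, 22)],
    [(0, 50), (10, 45), (20, 20), (30, 5)],
]
layers = build_layers(blocks, [0.1, 1, 5])
```

Each element of `layers` is one quality layer. The segment of a code-block that belongs to layer $q$ is the part of its code-stream between the truncation points selected for layers $q-1$ and $q$.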

The precinct partition

Unfortunately, there is no single code-stream ordering that provides both scalabilities, spatial and quality.
Therefore, when the data ordering in the code-stream does not match the target scalability, the only solution is to access the code-stream in a non-sequential order. For this reason, some extra data (overhead) must be included in the code-stream (remember that the contribution, in bits of code, of each code-block to the total quality can be different).
Finally, if $Q$ is high, the amount of overhead could be counterproductive.
To mitigate this drawback, the code-blocks (and their code-streams) are grouped into so-called precincts.
So, each “quality layer” of each precinct is stored in a packet, and there is an index (or a length) for each packet in a JPEG 2000 code-stream.

Reduction of the distortion of a coding pass

The contribution of a coding pass to the quality of the reconstruction (the decrease of the total distortion that it produces) is determined by:
1. The weight of the bits of the coefficients that are encoded in the coding pass.
2. The energy gain factor of the subband where the code-block is located.

Progressions of JPEG2000

In fact, a JPEG 2000 codec produces a packet for each precinct, component, resolution level and quality layer.
Depending on the final packet ordering, we have one of the following progressions:

LRCP or quality progression

Packets are ordered first by quality layer, then by resolution level, then by component and finally by precinct.

RLCP or spatial progression

Packets are ordered first by resolution level, then by quality layer, then by component and finally by precinct.

PCRL or sequential progression

Packets are ordered first by precinct, then by component, then by resolution level and finally by quality layer.
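The three orderings can be sketched as nested loops whose nesting order is given by the letters of the progression (L: quality layer, R: resolution level, C: component, P: precinct). This is a simplification that assumes the same number of precincts in every resolution level and component. The function name and the sizes are illustrative.

```python
from itertools import product


def packet_order(progression, sizes):
    """Enumerate the packets; the first letter of the progression is the outermost loop."""
    ranges = [range(sizes[letter]) for letter in progression]
    return [dict(zip(progression, index)) for index in product(*ranges)]


# Hypothetical code-stream: 2 layers, 2 resolution levels, 1 component, 1 precinct.
sizes = {"L": 2, "R": 2, "C": 1, "P": 1}

lrcp = [(p["L"], p["R"]) for p in packet_order("LRCP", sizes)]
rlcp = [(p["L"], p["R"]) for p in packet_order("RLCP", sizes)]
# lrcp == [(0, 0), (0, 1), (1, 0), (1, 1)]: all the packets of layer 0 come first
# rlcp == [(0, 0), (1, 0), (0, 1), (1, 1)]: all the packets of resolution 0 come first
```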

Rudimentary ROI definition at decoding time

The random access to the packet-stream allows the decoder to define a ROI.
The accuracy of the shape of the ROI depends on the precinct size(s). The smaller the precincts, the better the precision.
This feature is typically exploited in client/server architectures through the JPIP (JPeg 2000 Interactive Protocol) [2].

JPEG vs. JPEG2000 reconstructions of “Lena” and “Cat”:

Image	Bit-rate (bpp)	JPEG (dB)	JPEG2000 (dB)
“Lena”	$0.1$	$21.29$	$27.03$
“Lena”	$0.2$	$26.64$	$29.22$
“Lena”	$0.3$	$28.97$	$30.71$
“Lena”	$0.4$	$30.09$	$31.58$
“Lena”	$0.5$	$30.91$	$32.24$
“Cat”	$0.1$	$17.79$	$23.33$
“Cat”	$0.2$	$23.59$	$25.97$
“Cat”	$0.3$	$25.63$	$27.60$
“Cat”	$0.4$	$27.10$	$28.97$
“Cat”	$0.5$	$28.23$	$30.08$

10 Motion JPEG 2000

As in JPEG, JPEG 2000 has an extension [1] to compress sequences of images.
Each image is encoded independently.
But, unlike JPEG, the code-streams can be variable bit-rate (compression ratio selected by slope(s)) or constant bit-rate (compression ratio selected by bit-rate(s)).
Finally, scalability can be used to recover a reduced quality, lower ROI/resolution or gray-scale version of the original image.

In III... (or intra video) coding, the 2D block-DWT, the 2D DWT, or any other spatial transform is applied to each frame (image) of the sequence to exploit the spatial correlation. This is achieved by simply iterating the spatial decorrelation as described in Algorithm 1 [4], where $V$ is the input sequence and $S$ controls the number of SRLs (Spatial Resolution Levels)¹. The synthesis transform is computed using Algorithm 2. Fig. 1 shows an example of the decomposition generated for three frames $V_0$, $V_1$ and $V_2$.

Algorithm 1: III-coding($\mathbf {V}$ /* original video sequence */, $S$ /* Number of extra levels */) $\rightarrow $ ($\mathbf {O}$ /* transformed video sequence */)

${\mathbf O}=\{\}$ /* empty sequence */.
for ${\mathbf V}_i\in {\mathbf V}$:
1. ${\mathbf O}_i\leftarrow \text {2D-T}^{S}({\mathbf V}_i)$ /* 2D analysis spatial transform */.

Algorithm 2: III-decoding($\mathbf {O}$ /* transformed video sequence */, $S$ /* Number of extra levels */) $\rightarrow $ ($\mathbf {V}$ /* original video sequence */)

${\mathbf V}=\{\}$ /* empty sequence */.
for ${\mathbf O}_i\in {\mathbf O}$:
1. ${\mathbf V}_i\leftarrow \text {2D-T}^{-S}({\mathbf O}_i)$ /* 2D synthesis spatial transform */.
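Both algorithms can be sketched as follows. The spatial transform used here is an assumption made only to keep the example short and exactly invertible: an integer Haar (S-transform) applied to the rows and the columns, and iterated $S$ times on the low-frequency band, instead of the 5/3 or 9/7 filters. Frames are assumed to be square, with a side divisible by $2^S$. All the function names are illustrative.

```python
def s_forward(v):
    """1D integer Haar: floor averages followed by differences."""
    half = len(v) // 2
    low = [(v[2 * i] + v[2 * i + 1]) // 2 for i in range(half)]
    high = [v[2 * i] - v[2 * i + 1] for i in range(half)]
    return low + high


def s_inverse(c):
    """Exact inverse of s_forward."""
    half = len(c) // 2
    v = []
    for low, high in zip(c[:half], c[half:]):
        a = low + (high + 1) // 2
        v += [a, a - high]
    return v


def analysis_2d(frame, levels):
    """2D-T^S: rows, then columns, repeated on the low-frequency (top-left) band."""
    out = [row[:] for row in frame]
    size = len(out)
    for _ in range(levels):
        for y in range(size):
            out[y][:size] = s_forward(out[y][:size])
        for x in range(size):
            column = s_forward([out[y][x] for y in range(size)])
            for y in range(size):
                out[y][x] = column[y]
        size //= 2
    return out


def synthesis_2d(coeffs, levels):
    """2D-T^-S: undo the levels from the coarsest one, columns before rows."""
    out = [row[:] for row in coeffs]
    size = 2 * len(out) // 2 ** levels
    for _ in range(levels):
        for x in range(size):
            column = s_inverse([out[y][x] for y in range(size)])
            for y in range(size):
                out[y][x] = column[y]
        for y in range(size):
            out[y][:size] = s_inverse(out[y][:size])
        size *= 2
    return out


def iii_encode(video, levels):
    """Algorithm 1: each frame is transformed independently."""
    return [analysis_2d(frame, levels) for frame in video]


def iii_decode(coded, levels):
    """Algorithm 2: each frame is reconstructed independently."""
    return [synthesis_2d(frame, levels) for frame in coded]
```

Since no frame depends on any other, any frame can be decoded without decoding its neighbors.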

Figure 1: Decomposition generated by the 1-level ($S=1$) 2D-DWT and the 2x2-DCT (block size is equal to $S+1$).

Lab

Download and compile the Kakadu software implementation for the JPEG 2000 standard.
Find the R/D curve for the image lena using one quality layer (repeat the experiment done for JPEG: compress and expand the image at several bit-rates and compute the RMSE). Use both the reversible and the irreversible paths.
Now, let’s take advantage of the scalability of JPEG 2000! Compress lena without loss (using the reversible path) and one quality layer, and find the R/D curve by truncating the code-stream. Repeat the experiment for different numbers of quality layers. Which alternative is best?
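A minimal sketch of the measurements needed for the R/D curves, assuming that the original and the decoded images are already available as flat lists of samples and that the size of the code-stream in bytes is known. The function names are illustrative and nothing here depends on Kakadu.

```python
import math


def rmse(original, decoded):
    """Root mean square error between two images given as flat lists of samples."""
    squared = sum((a - b) ** 2 for a, b in zip(original, decoded))
    return math.sqrt(squared / len(original))


def psnr(original, decoded, peak=255):
    """PSNR in dB; peak=255 assumes 8-bit samples."""
    error = rmse(original, decoded)
    return math.inf if error == 0 else 20 * math.log10(peak / error)


def bits_per_pixel(num_bytes, width, height):
    """Bit-rate of a code-stream of num_bytes bytes for a width x height image."""
    return 8 * num_bytes / (width * height)


# One R/D point: hypothetical 32768-byte code-stream of a 512x512 image.
rate = bits_per_pixel(32768, 512, 512)   # 1.0 bpp
```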

References

[1] ISO. Information Technology - JPEG 2000 Image Coding System: Motion JPEG 2000. ISO/IEC 15444-3:2007, May 2007.

[2] ITU. Information Technology - JPEG 2000 Image Coding System: Interactivity Tools, APIs and Protocols. http://www.itu.int/rec/T-REC-T.808-200501-I, 2005.

[3] Majid Rabbani, Rajan L. Joshi, and Paul W. Jones. JPEG 2000 core coding system (part 1). The JPEG 2000 Suite, pages 1–69, 2009.

[4] D.S. Taubman and M.W. Marcellin. JPEG2000: Image Compression Fundamentals, Standards and Practice. Kluwer Academic Publishers, 2002.

¹Notice that at least one SRL is always available for each image or video sequence.