Image transformations for compression

Vicente González Ruiz

November 18, 2021

Contents

 1 Insights
 2 Basic coding steps
  2.1 Encoder
  2.2 Decoder
 3 Splitting
 4 Transform of a block $x = \{x_n\}_{n=0}^{N-1}$
 5 A color transform
  5.1 Luminance and chrominance
  5.2 Spectral (color) redundancy
 6 Chrominance subsampling
 7 Orthogonal transform
  7.1 Orthonormal transform
 8 Signal energy
  8.1 Energy conservation
  8.2 Proof
 9 Expected value
 10 Variance
  10.1 Covariance
 11 Covariance matrix
 12 Covariance matrix of a block-based transform
 13 Correlation
 14 Autocorrelation
 15 Autocorrelation matrix
 16 Eigenvalue and eigenvector
 17 Coding gain
 18 Karhunen-Loève transform (KLT)
 19 Discrete cosine transform (DCT)
  19.1 Properties of the DCT
 20 Dyadic discrete wavelet transform (DWT)
 21 Filter bank implementation
 22 Lifting implementation [6]
 23 T-levels 1D-DWT
 24 Subband independence [7]
 25 Statistics of the subbands [7]
 26 Subband quantization [7]
 27 2D-DWT
 28 Haar filters [3]
 29 Linear (5/3) filters [6]
 30 Orthogonal, orthonormal, and biorthogonal transforms
 31 Quantization in the transform domain
 32 Bit-planes progression
  32.1 Bit allocation (bit-rate control)
 33 Bit allocation based on minimizing the quantization error
 34 Bit allocation based on minimizing the variance of the quantization error
 35 Encoding
 36 Code-stream orderings and scalabilities
 References

1 Insights

2 Basic coding steps

2.1 Encoder

  1. Split s into blocks of B samples, if required.
  2. Transform each block.
  3. (Optional) Quantize the coefficients.
  4. Lossless encode the quantized coefficients, performing a minimal bit-allocation.

2.2 Decoder

  1. Decode the coefficients of each block.
  2. (Optional) “Dequantize” the coefficients of each block.
  3. Inverse-transform each block.
  4. Join the blocks, if required.
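
A minimal sketch of these steps for a 1D signal, assuming (only for illustration) blocks of B = 8 samples, the orthonormal DCT-II as the transform, a uniform quantization step Q, and omitting the entropy coding stage:

    import numpy as np
    from scipy.fft import dct, idct

    def encode(s, B=8, Q=16):
        """Split into blocks, transform (DCT-II) and uniformly quantize a 1D signal."""
        blocks = s.reshape(-1, B)                     # 1. split into blocks of B samples
        coefs = dct(blocks, norm='ortho', axis=1)     # 2. transform each block
        return np.round(coefs / Q).astype(np.int32)   # 3. quantize (4. entropy coding omitted)

    def decode(k, B=8, Q=16):
        """Dequantize, inverse-transform and join the blocks."""
        coefs = k * Q                                 # 2. "dequantize"
        blocks = idct(coefs, norm='ortho', axis=1)    # 3. inverse transform
        return blocks.reshape(-1)                     # 4. join the blocks

    s = np.random.randint(0, 256, size=64).astype(float)   # an 8-bit "signal"
    print(np.max(np.abs(s - decode(encode(s)))))           # small compared with the [0, 255] range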

3 Splitting

  1. Divide $x = \{x_n\}_{n=0}^{N-1}$ into $N_B$ blocks $\{x_1, \ldots, x_{N_B}\}$ of $B$ samples.

4 Transform of a block $x = \{x_n\}_{n=0}^{N-1}$

5 A color transform

5.1 Luminance and chrominance

$$\begin{pmatrix} Y \\ U \\ V \end{pmatrix} = \begin{pmatrix} 0.299 & 0.587 & 0.114 \\ -0.14713 & -0.28886 & 0.436 \\ 0.615 & -0.51499 & -0.10001 \end{pmatrix} \begin{pmatrix} R \\ G \\ B \end{pmatrix} \qquad (3)$$

$$\begin{pmatrix} R \\ G \\ B \end{pmatrix} = \begin{pmatrix} 1 & 0 & 1.13983 \\ 1 & -0.39465 & -0.58060 \\ 1 & 2.03211 & 0 \end{pmatrix} \begin{pmatrix} Y \\ U \\ V \end{pmatrix} \qquad (4)$$
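
As an illustration, both color transforms reduce to a matrix product per pixel. A sketch using NumPy (since the matrix entries are rounded, the round trip is only approximately lossless):

    import numpy as np

    # ITU-R BT.601 RGB <-> YUV matrices (Eqs. 3 and 4)
    RGB_to_YUV = np.array([[ 0.299,    0.587,    0.114  ],
                           [-0.14713, -0.28886,  0.436  ],
                           [ 0.615,   -0.51499, -0.10001]])
    YUV_to_RGB = np.array([[1.0,  0.0,      1.13983],
                           [1.0, -0.39465, -0.58060],
                           [1.0,  2.03211,  0.0    ]])

    def rgb_to_yuv(img):            # img has shape (rows, columns, 3)
        return img @ RGB_to_YUV.T

    def yuv_to_rgb(img):
        return img @ YUV_to_RGB.T

    rgb = np.random.rand(4, 4, 3)
    print(np.max(np.abs(rgb - yuv_to_rgb(rgb_to_yuv(rgb)))))   # ~1e-5: residual due to the rounded coefficients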

5.2 Spectral (color) redundancy

Color redundancy [1, 2, 4].

6 Chrominance subsampling

The human visual system is more sensitive to the luma (Y) than to the chroma (UV). This means that the chroma can be subsampled without a significant loss of quality in the images.

Chroma subsampling.
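
A minimal sketch of 4:2:0 chroma subsampling (one chroma sample per 2x2 block), using nearest-neighbor interpolation for the reconstruction:

    import numpy as np

    def subsample_420(C):
        """Keep one chroma sample per 2x2 block."""
        return C[::2, ::2]

    def upsample_420(C_sub):
        """Nearest-neighbor reconstruction: each kept sample covers a 2x2 block."""
        return np.repeat(np.repeat(C_sub, 2, axis=0), 2, axis=1)

    U = np.random.rand(8, 8)                       # a chroma channel
    print(upsample_420(subsample_420(U)).shape)    # (8, 8), using only 1/4 of the original samples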

7 Orthogonal transform

7.1 Orthonormal transform

8 Signal energy

$$||s||^2 = \sum_{n=0}^{B-1} s_n^2. \qquad (7)$$

8.1 Energy conservation

Orthonormal transforms are energy preserving.

8.2 Proof

$$||C||^2 = C^T C = (Ac)^T (Ac) = c^T A^T A c = c^T I c = c^T c = ||c||^2. \qquad (9)$$

Therefore, the sum of the squares of the transformed sequence is the same as the sum of the squares of the original sequence.
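
This can be checked numerically with any orthonormal transform, for example, the orthonormal DCT-II provided by SciPy:

    import numpy as np
    from scipy.fft import dct

    s = np.random.randn(1024)
    S = dct(s, norm='ortho')               # orthonormal (type-II) DCT
    print(np.sum(s ** 2), np.sum(S ** 2))  # both sums of squares coincide (up to floating-point error)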

9 Expected value

The expected value $E[X]$ of a random variable $X$ is, intuitively, the long-run average value of repetitions of the experiment it represents. Let $X$ be a random variable with a finite number of finite outcomes $X_1, X_2, \ldots, X_k$ occurring with probabilities $p_1, p_2, \ldots, p_k$, respectively. The expected value (or expectation) of $X$ is defined as

$$E[X] = \sum_{i=1}^{k} X_i p_i. \qquad (10)$$

10 Variance

The variance of a random variable X is the expected value of the squared deviation from the mean of X:

$$\sigma_X^2 = \mathrm{Var}(X) = E[(X - E[X])^2] = E[(X - E[X])(X - E[X])] = E[X^2] - E[X]^2. \qquad (11)$$

10.1 Covariance

The covariance cov (X,Y ) is a measure of the joint variability of two random variables X, Y , defined as:

$$\mathrm{cov}(X,Y) = E[(X - E[X])(Y - E[Y])]. \qquad (12)$$

11 Covariance matrix

A covariance matrix ΣZ is a matrix whose element in the i, j position is the covariance between the i-th and j-th elements of a random vector Z (a collection of random variables Zi):

$$\Sigma_Z = \begin{bmatrix} E[(Z_1-\mu_1)(Z_1-\mu_1)] & E[(Z_1-\mu_1)(Z_2-\mu_2)] & \cdots & E[(Z_1-\mu_1)(Z_n-\mu_n)] \\ E[(Z_2-\mu_2)(Z_1-\mu_1)] & E[(Z_2-\mu_2)(Z_2-\mu_2)] & \cdots & E[(Z_2-\mu_2)(Z_n-\mu_n)] \\ \vdots & \vdots & \ddots & \vdots \\ E[(Z_n-\mu_n)(Z_1-\mu_1)] & E[(Z_n-\mu_n)(Z_2-\mu_2)] & \cdots & E[(Z_n-\mu_n)(Z_n-\mu_n)] \end{bmatrix} = E\left[(Z - E[Z])(Z - E[Z])^T\right], \qquad (13)$$

where

$$\mu_i = E(Z_i) \qquad (14)$$

is the expected value of the i-th entry in the vector Z.

12 Covariance matrix of a block-based transform

$$\Sigma_S = E\left[(S - E[S])(S - E[S])^T\right] = E\left[A(s - E[s])(s - E[s])^T A^T\right] = A\,E\left[(s - E[s])(s - E[s])^T\right] A^T = A \Sigma_s A^T. \qquad (15)$$
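
Equation (15) can be checked numerically. A sketch in which the transform matrix A is (arbitrarily) the DCT-II matrix and the covariances are the empirical ones of a set of random blocks:

    import numpy as np
    from scipy.fft import dct

    B = 8
    A = dct(np.eye(B), norm='ortho', axis=0)   # transform matrix such that S = A s
    s = np.random.randn(10000, B)              # 10000 blocks of B samples each (one block per row)
    S = s @ A.T                                # transform every block
    Sigma_s = np.cov(s, rowvar=False)
    Sigma_S = np.cov(S, rowvar=False)
    print(np.max(np.abs(Sigma_S - A @ Sigma_s @ A.T)))   # ~0 (up to floating-point error)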

13 Correlation

The most familiar measure of dependence between two random variables X and Y is the Pearson product-moment correlation coefficient, or “Pearson’s correlation coefficient”, commonly called simply “the correlation coefficient”. It is obtained by dividing the covariance of the two variables by the product of their standard deviations.

$$\rho_{X,Y} = \mathrm{corr}(X,Y) = \frac{\mathrm{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y} \qquad (16)$$

14 Autocorrelation

Autocorrelation, also known as serial correlation, is the correlation of a signal $s[t]$ with a delayed copy of itself, $s[t+\tau]$, as a function of the delay $\tau$.

$$\rho_s[\tau] = \rho_{s[t],s[t+\tau]} = \frac{E[(s[t] - \mu)(s[t + \tau] - \mu)]}{\sigma^2} \qquad (17)$$

where $\mu = E(s[t]) = E(s[t+\tau])$ and $\sigma = \sigma_{s[t]} = \sigma_{s[t+\tau]}$.
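
A sketch that estimates $\rho_s[\tau]$ from the samples of a synthetic first-order autoregressive (AR(1)) signal, whose neighboring samples are highly correlated (as happens with the pixels of natural images):

    import numpy as np

    def autocorrelation(s, tau):
        """Estimate of rho_s[tau] (Eq. 17) from the samples of s."""
        s = s - np.mean(s)
        return np.sum(s[:len(s) - tau] * s[tau:]) / np.sum(s ** 2)

    # AR(1) signal: s[n] = rho * s[n-1] + noise
    rho, N = 0.95, 100000
    s = np.zeros(N)
    for n in range(1, N):
        s[n] = rho * s[n - 1] + np.random.randn()

    print([round(autocorrelation(s, tau), 2) for tau in range(4)])   # approximately [1.0, 0.95, 0.9, 0.86]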

15 Autocorrelation matrix

The autocorrelation matrix of a random process X is the matrix [R] defined by

$$[R]_{i,j} = \rho_X[|i - j|]. \qquad (18)$$
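
For example, for the AR(1) model $\rho_s[\tau] = \rho^{|\tau|}$ (often used to model the rows and columns of natural images), the autocorrelation matrix is a Toeplitz matrix:

    import numpy as np
    from scipy.linalg import toeplitz

    B, rho = 4, 0.95
    R = toeplitz(rho ** np.arange(B))   # [R]_{i,j} = rho^{|i-j|}
    print(R)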

16 Eigenvalue and eigenvector

In linear algebra, an eigenvector or characteristic vector $v$ of a linear transformation $T(\cdot)$ is a non-zero vector that changes by only a scalar factor $\lambda$ (known as the eigenvalue, characteristic value, or characteristic root associated to $v$) when that linear transformation is applied to it:

$$T(v) = \lambda v. \qquad (19)$$

When $T(\cdot)$ can be expressed by a matrix $A$, we get

$$Av = \lambda v. \qquad (20)$$
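
A quick numerical check of Eq. (20) with NumPy:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])
    eigenvalues, eigenvectors = np.linalg.eig(A)
    lam, v = eigenvalues[0], eigenvectors[:, 0]    # one (eigenvalue, eigenvector) pair
    print(np.allclose(A @ v, lam * v))             # True: A v = lambda v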

17 Coding gain

18 Karhunen-Loève transform (KLT)

19 Discrete cosine transform (DCT)

19.0.1 Definition

19.1 Properties of the DCT

  1. Separable: the D-dimensional DCT can be computed by applying the 1D DCT along each dimension.
  2. In general, high energy compaction: a small number of DCT coefficients can reconstruct the original signal with reasonable accuracy.
  3. Unitary: the energy of the DCT coefficients is proportional to the energy of the samples.
  4. Orthonormality: the DCT basis functions are orthonormal (orthogonal + unitary) and, therefore, the DCT coefficients are uncorrelated.

DCT.
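
For example, the separability property can be checked by comparing the 2D DCT of a block with the composition of 1D DCTs along each dimension (a sketch using SciPy):

    import numpy as np
    from scipy.fft import dct, dctn

    x = np.random.randn(8, 8)
    separable = dct(dct(x, norm='ortho', axis=0), norm='ortho', axis=1)   # 1D DCT per column, then per row
    print(np.allclose(separable, dctn(x, norm='ortho')))                  # True: the 2D DCT is separable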

20 Dyadic discrete wavelet transform (DWT)

Key features:

  1. High spectral compaction, especially when transient signals are present.
  2. Multiresolution representation: it is easy to recover a reduced version of the original image if only a subset of the coefficients is processed.

21 Filter bank implementation

Where:

$$s = (\uparrow^2(L) * s^L) + (\uparrow^2(H) * s^H) \qquad (27)$$

and

$$L = \downarrow^2(s * a^L) \qquad H = \downarrow^2(s * a^H). \qquad (28)$$

Comments:

  1. $a^L$ and $a^H$ are the impulse responses (the impulse response of a filter is the output of that filter when its input is the unit impulse, Dirac's delta) of a low-pass filter and a high-pass filter, respectively, that have been designed to be complementary (ideally, in $L$ we only find the frequencies of $s$ that are not in $H$, and vice versa). When this is true, it is said that we are using a perfect-reconstruction quadrature-mirror filter bank and the DWT is biorthogonal.
  2. In wavelet theory, $s^L$ is named the scale function and $s^H$ the mother function or wavelet basis function. The coefficients of $L$ are also known as the scale coefficients, and the coefficients of $H$ as the wavelet coefficients [5].
  3. $\downarrow^2(\cdot)$ and $\uparrow^2(\cdot)$ denote the subsampling and oversampling operations:
$$(\downarrow^2(s))_i = s_{2i} \qquad (29)$$

and

$$(\uparrow^2(s))_i = \begin{cases} s_{i/2} & \text{if } i \text{ is even} \\ 0 & \text{otherwise,} \end{cases} \qquad (30)$$

where $s_i$ is the $i$-th sample of $s$.

  4. $*$ is the convolution operator.
  5. Notice that half of the filtered samples are wasted.
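
The analysis (Eq. 28) and synthesis (Eq. 27) steps are implemented, for example, by the PyWavelets package. A minimal sketch, where the choice of the Daubechies-4 ('db2') filter bank is only for illustration:

    import numpy as np
    import pywt  # PyWavelets

    s = np.random.randn(16)
    L, H = pywt.dwt(s, 'db2')     # analysis: filter with aL and aH, then subsample (Eq. 28)
    r = pywt.idwt(L, H, 'db2')    # synthesis: upsample, filter and add (Eq. 27)
    print(np.allclose(s, r))      # True: perfect reconstruction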

22 Lifting implementation [6]

Comments:

  1. $H_i = s_{2i+1} - \mathcal{P}(\{s_{2i}\})_i$ (prediction step)
     $L_i = s_{2i} + \{\mathcal{U}(H)\}_i$ (update step)

  2. The subsampled signals $\{s_{2i}\}$ and $\{s_{2i+1}\}$ can be computed by using
$$\{s_{2i+1}\} = \downarrow^2(Z^{-1}(s))$$

and

$$\{s_{2i}\} = \downarrow^2(s),$$

where $Z^{-1}$ represents the one-sample delay operator.

  3. $H$ typically has less energy, variance, and entropy than $\{s_{2i+1}\}$.
  4. $L$ has less aliasing than $\{s_{2i}\}$ (notice that $s$ has not been low-pass filtered before being subsampled into $\{s_{2i}\}$).
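
A sketch of the generic lifting structure described above. Here P and U are placeholders for the predict and update operators of a particular wavelet; the Haar pair is used only to make the example runnable (see Section 28):

    import numpy as np

    def lifting_analysis(s, P, U):
        """One level of the lifting scheme: split into even/odd phases, predict, update."""
        even, odd = s[0::2], s[1::2]      # {s_2i} and {s_2i+1}
        H = odd - P(even)                 # prediction step
        L = even + U(H)                   # update step
        return L, H

    def lifting_synthesis(L, H, P, U):
        """The inverse simply undoes the two steps in reverse order."""
        even = L - U(H)
        odd = H + P(even)
        s = np.empty(len(L) + len(H))
        s[0::2], s[1::2] = even, odd
        return s

    # Haar predict/update pair: each odd sample is predicted by its left (even) neighbor
    P = lambda even: even
    U = lambda H: H / 2

    s = np.arange(8.0)
    L, H = lifting_analysis(s, P, U)
    print(np.allclose(s, lifting_synthesis(L, H, P, U)))   # True: lifting is reversible by construction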

23 T-levels 1D-DWT

24 Subband independence [7]

While the subbands are only independent if the input is a Gaussian random variable and the filters decorrelate the subbands (i.e., the filters are ideal), the independence assumption is often made because it makes the system simpler.

25 Statistics of the subbands [7]

The PDF of the coefficients of the high-frequency subbands peaks at zero and falls off very rapidly. While it is often modeled as a Laplacian distribution, it actually falls off faster, and is more adequately fitted with a generalized Gaussian PDF with a faster decay than the Laplacian PDF.

26 Subband quantization [7]

Besides the low band compression, which uses known image coding methods, the bulk of the compression is obtained by appropriate quantization of the high bands. The following quantizers are typically used:

  1. Lloyd quantizers fitted to the PDF of the particular subband.
  2. Deadzone uniform quantizers, since they tend to eliminate what is essentially noise.

Because entropy coding is used after quantization, uniform quantizers are nearly optimal. Note that VQ could be used in the subbands, but its complexity is usually not worthwhile since there is little dependence between coefficients anyway.
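
A sketch of a deadzone uniform quantizer (and its mid-point dequantizer): the zero bin is twice as wide as the rest, so small coefficients, which are essentially noise, are mapped to zero:

    import numpy as np

    def deadzone_quantize(x, Q):
        """Deadzone uniform quantizer: the zero bin is (-Q, Q), twice as wide as the others."""
        return np.sign(x) * np.floor(np.abs(x) / Q)

    def deadzone_dequantize(k, Q):
        """Mid-point reconstruction of the non-zero bins."""
        return np.sign(k) * (np.abs(k) + 0.5) * Q * (k != 0)

    x = np.array([-3.7, 0.2, 1.1, 7.9])
    k = deadzone_quantize(x, Q=1.0)
    print(k)                                # [-3.  0.  1.  7.]
    print(deadzone_dequantize(k, Q=1.0))    # [-3.5  0.   1.5  7.5]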

27 2D-DWT

$$E(s^{H^b}) = \sum_i \left|s_i^{H^b}\right|^2. \qquad (31)$$

In the case of the 5/3-DWT, the L2-norms of the DWT subbands are:

28 Haar filters [3]

The i-th sample of the low-frequency subband is computed (using a filter plus subsampling) as

$$L_i = \frac{s_{2i} + s_{2i+1}}{2}, \qquad \text{(Haar-L)}$$

and the i-th sample of the high-frequency subband as

$$H_i = s_{2i+1} - s_{2i}. \qquad \text{(Haar-H)}$$

If Lifting is used,

$$L_i = s_{2i} + \frac{H_i}{2}. \qquad \text{(Haar-L-lifted)}$$

Notice that $H_i = 0$ if $s_{2i+1} = s_{2i}$; therefore, the Haar transform is well suited to encode piecewise-constant signals. The notation X/Y indicates the lengths (taps, or number of coefficients) of the low-pass and the high-pass (convolution) filters of the filter bank implementation (not of the lifting one), respectively.

Haar basis.
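
A minimal numerical example of the lifted Haar equations, showing that $H_i$ vanishes wherever the signal is constant:

    import numpy as np

    s = np.array([3.0, 3.0, 3.0, 3.0, 2.0, 4.0, 6.0, 8.0])
    even, odd = s[0::2], s[1::2]     # {s_2i} and {s_2i+1}
    H = odd - even                   # Eq. (Haar-H)
    L = even + H / 2                 # Eq. (Haar-L-lifted): the pairwise averages
    print(H)                         # [0. 0. 2. 2.] -> zero where the signal is constant
    print(L)                         # [3. 3. 3. 7.]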

29 Linear (5/3) filters [6]

The i-th sample of the low-frequency subband (using a filter-bank implementation) is

$$L_i = -\frac{1}{8}s_{2i-2} + \frac{1}{4}s_{2i-1} + \frac{3}{4}s_{2i} + \frac{1}{4}s_{2i+1} - \frac{1}{8}s_{2i+2} \qquad \text{(5/3-L)}$$

and the i-th sample of the high-frequency signal is computed by

$$H_i = s_{2i+1} - \frac{s_{2i} + s_{2i+2}}{2}. \qquad \text{(5/3-H)}$$

If we use lifting, the low-frequency subband can also be computed, with fewer operations, as

$$L_i = s_{2i} + \frac{H_{i-1} + H_i}{4}. \qquad \text{(5/3-L-lifted)}$$

Notice that $H_i = 0$ if $s_{2i+1} = (s_{2i} + s_{2i+2})/2$. Therefore, the 5/3 transform is well suited to encode piecewise-linear signals.

Linear (5/3) basis.
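
A sketch of one level of the 5/3 transform computed with lifting (the symmetric border extension is an implementation choice), showing that the high-frequency coefficients vanish in the interior of a linear ramp:

    import numpy as np

    def dwt53(s):
        """One level of the 5/3 DWT via lifting (Eqs. 5/3-H and 5/3-L-lifted)."""
        even, odd = s[0::2].astype(float), s[1::2].astype(float)
        even_next = np.append(even[1:], even[-1])   # s_{2i+2}, symmetrically extended at the border
        H = odd - (even + even_next) / 2            # prediction step
        H_prev = np.insert(H[:-1], 0, H[0])         # H_{i-1}, symmetrically extended at the border
        L = even + (H_prev + H) / 4                 # update step
        return L, H

    L, H = dwt53(np.arange(10.0))   # a linear ramp
    print(H)                        # [0. 0. 0. 0. 1.]: only the border coefficient is non-zero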

30 Orthogonal, orthonormal, and biorthogonal transforms

In signal processing, a transform (such as the Discrete Cosine Transform, the Walsh-Hadamard Transform or the Karhunen-Loève Transform) is orthogonal when the coefficients generated by the transform are uncorrelated (there is no way to infer one coefficient from another).

If the norm of all the basis vectors of an orthogonal transform is one, then the transform is said to be orthonormal. Orthonormal transforms are interesting because of their:

  1. Energy preservation: the energy of the output is the same as the energy of the input. This means that, for example, a quantization error produced in a coefficient of the transform will generate the same quantization error at the output (the complete signal) of the inverse transform. The same holds for the forward transform.
  2. Implementation: The transform matrix of the inverse transform is the transpose of the forward transform. In orthogonal transforms, the transform matrix of the inverse transform is the inverse of the transform matrix of the forward transform.

Biorthogonal transforms (and in particular, biorthogonal wavelets) do not satisfy any of these features: they are not energy preserving (this can also be observed in the fact that the frequency-domain responses of the analysis and synthesis filters are not symmetric), and there is no algebraic way (matrix transposition/inversion) to compute the backward transform from the forward one, and vice versa. This, which can be considered a drawback, gives an extra degree of freedom to design the analysis and the synthesis filters (whose only requirement is that the transform pair be reversible), providing in general the possibility of using more sophisticated filters, such as those based on non-linear filtering, for example, those that use motion estimation algorithms.

In general, each subband $b$ of a decomposition generated by a biorthogonal 2D-DWT has a different subband gain $\alpha_b$. Usually, the lower the frequency of the subband, the higher the gain. Notice also that these gains are different for each transform.

Subband gains are important in lossy signal compression because they quantify the relative importance of the wavelet coefficients of the different subbands when we introduce distortion in the wavelet domain. Thus, for example, if we decide to quantize a wavelet coefficient, the amount of distortion that we generate in the signal domain will depend on the subband where that coefficient is located. In general, low-frequency coefficients are more "important" than high-frequency ones.

To compute the subband gains we have two options:

  1. The algebraic way. We will need the expressions of the four filters (two analysis filters and two synthesis filters) and deduce the gains from them.
  2. The algorithmic way. We will need to compute the energy of the impulse response of the inverse transform when we apply such an impulse to each one of the subbands of the decomposition, as sketched below. Supposing that, after applying the DWT to an image, the coefficients of the subband HH are the least energetic, with an energy $x$, the subband gain for subband $b$ is computed as
     $$\alpha_b = E_b/x, \qquad (32)$$

     where $E_b$ is the energy of the reconstruction when the impulse signal is localized at $b$. Notice that all the gains should be larger than one.
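
A possible implementation of the algorithmic option, assuming the PyWavelets package. The 'bior2.2' wavelet (essentially the 5/3 filters, up to normalization), the 64x64 shape and the two decomposition levels are arbitrary choices for illustration:

    import numpy as np
    import pywt  # PyWavelets

    def reconstruction_energy(coeffs, wavelet):
        """Energy, in the signal domain, of the inverse 2D-DWT of the given coefficients."""
        return np.sum(pywt.waverec2(coeffs, wavelet) ** 2)

    wavelet, levels = 'bior2.2', 2
    coeffs = pywt.wavedec2(np.zeros((64, 64)), wavelet, level=levels)   # zero-filled decomposition

    energies = {}
    coeffs[0][coeffs[0].shape[0] // 2, coeffs[0].shape[1] // 2] = 1.0   # impulse in the LL subband
    energies['LL'] = reconstruction_energy(coeffs, wavelet)
    coeffs[0][:] = 0.0
    for scale, details in enumerate(coeffs[1:]):                        # scale 0 is the coarsest one
        for name, band in zip(('H', 'V', 'D'), details):                # horizontal, vertical, diagonal
            band[band.shape[0] // 2, band.shape[1] // 2] = 1.0          # impulse in this subband
            energies[f'{name}{scale}'] = reconstruction_energy(coeffs, wavelet)
            band[:] = 0.0                                               # restore the zeros

    x = min(energies.values())                         # energy of the least energetic subband
    gains = {b: E / x for b, E in energies.items()}    # Eq. (32): alpha_b = E_b / x
    print(gains)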

This computation can also be applied to MCDWT, by computing the subband gains as a function of the number $T$ of temporal decompositions.

31 Quantization in the transform domain

32 Bit-planes progression

32.1 Bit allocation (bit-rate control)

33 Bit allocation based on minimizing the quantization error

34 Bit allocation based on minimizing the variance of the quantization error

DR_model notebook.

which, taking

$$\frac{\partial J}{\partial R(u)} = 0, \qquad (38)$$

produces

$$R(u) = \frac{1}{2}\log_2\left(2\alpha\ln 2\;\sigma^2_{S_u}\right) - \frac{1}{2}\log_2\lambda. \qquad (R(u))$$

35 Encoding

36 Code-stream orderings and scalabilities

References

[1]   Iain Barr. Image processing with numpy. http://www.degeneratestate.org/posts/2016/Oct/23/image-processing-with-numpy/.

[2]   Emmanuelle Gouillart. Scikit-image: image processing. http://www.scipy-lectures.org/packages/scikit-image/.

[3]   Alfred Haar. Zur Theorie der orthogonalen Funktionensysteme. Mathematische Annalen, 69(3):331–371, 1910.

[4]   Jan Erik Solem. Programming Computer Vision with Python. https://www.oreilly.com/library/view/programming-computer-vision/9781449341916/ch01.html.

[5]   A. Sovic and D. Sersic. Signal Decomposition Methods for Reducing Drawbacks of the DWT. Engineering Review, 32(2):70–77, 2012.

[6]   W. Sweldens and P. Schröder. Building Your Own Wavelets at Home. Wavelets in Computer Graphics, 1997.

[7]   M. Vetterli and J. Kovačević. Wavelets and Subband Coding. Prentice-Hall, 1995.