Image transformations for compression

Vicente González Ruiz

November 18, 2021

Contents

 1 Insights
 2 Basic coding steps
  2.1 Encoder
  2.2 Decoder
 3 Splitting
 4 Transform of a block $x = \{x_n\}_{n=0}^{N-1}$
 5 A color transform
  5.1 Luminance and chrominance
  5.2 Spectral (color) redundancy
 6 Chrominance subsampling
 7 Orthogonal transform
  7.1 Orthonormal transform
 8 Signal energy
  8.1 Energy conservation
  8.2 Proof
 9 Expected value
 10 Variance
  10.1 Covariance
 11 Covariance matrix
 12 Covariance matrix of a block-based transform
 13 Correlation
 14 Autocorrelation
 15 Autocorrelation matrix
 16 Eigenvalue and eigenvector
 17 Coding gain
 18 Karhunen-Loève transform (KLT)
 19 Discrete cosine transform (DCT)
  19.1 Properties of the DCT
 20 Dyadic discrete wavelet transform (DWT)
 21 Filter bank implementation
 22 Lifting implementation [6]
 23 T-levels 1D-DWT
 24 Subband independence [7]
 25 Statistics of the subbands [7]
 26 Subband quantization [7]
 27 2D-DWT
 28 Haar filters [3]
 29 Linear (5/3) filters [6]
 30 Orthogonal, orthonormal, and biorthogonal transforms
 31 Quantization in the transform domain
 32 Bit-planes progression
  32.1 Bit allocation (bit-rate control)
 33 Bit allocation based on minimizing the quantization error
 34 Bit allocation based on minimizing the variance of the quantization error
 35 Encoding
 36 Code-stream orderings and scalabilities
 References

1 Insights

2 Basic coding steps

2.1 Encoder

  1. Split s into blocks of B samples, if required.
  2. Transform each block.
  3. (Optional) Quantize the coefficients.
  4. Lossless encode the quantized coefficients, performing a minimal bit-allocation.

2.2 Decoder

  1. Decode the coefficients of each block.
  2. (Optional) “Dequantize” the coefficients of each block.
  3. Inverse-transform each block.
  4. Join the blocks, if required.
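
A minimal sketch of these steps for a 1D signal, assuming (only for illustration) blocks of B = 8 samples, the orthonormal DCT-II as the transform, a uniform quantization step Q, and omitting the entropy coding stage:

    import numpy as np
    from scipy.fft import dct, idct

    def encode(s, B=8, Q=16):
        """Split into blocks, transform (DCT-II) and uniformly quantize a 1D signal."""
        blocks = s.reshape(-1, B)                     # 1. split into blocks of B samples
        coefs = dct(blocks, norm='ortho', axis=1)     # 2. transform each block
        return np.round(coefs / Q).astype(np.int32)   # 3. quantize (4. entropy coding omitted)

    def decode(k, B=8, Q=16):
        """Dequantize, inverse-transform and join the blocks."""
        coefs = k * Q                                 # 2. "dequantize"
        blocks = idct(coefs, norm='ortho', axis=1)    # 3. inverse transform
        return blocks.reshape(-1)                     # 4. join the blocks

    s = np.random.randint(0, 256, size=64).astype(float)   # an 8-bit "signal"
    print(np.max(np.abs(s - decode(encode(s)))))           # small compared with the [0, 255] range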

3 Splitting

  1. Divide $x = \{x_n\}_{n=0}^{N-1}$ into $N_B$ blocks $\{x_1, \ldots, x_{N_B}\}$ of $B$ samples.

4 Transform of a block $x = \{x_n\}_{n=0}^{N-1}$

5 A color transform

5.1 Luminance and chrominance

$$\begin{pmatrix} Y \\ U \\ V \end{pmatrix} = \begin{pmatrix} 0.299 & 0.587 & 0.114 \\ -0.14713 & -0.28886 & 0.436 \\ 0.615 & -0.51499 & -0.10001 \end{pmatrix} \begin{pmatrix} R \\ G \\ B \end{pmatrix} \qquad (3)$$

$$\begin{pmatrix} R \\ G \\ B \end{pmatrix} = \begin{pmatrix} 1 & 0 & 1.13983 \\ 1 & -0.39465 & -0.58060 \\ 1 & 2.03211 & 0 \end{pmatrix} \begin{pmatrix} Y \\ U \\ V \end{pmatrix} \qquad (4)$$
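
As an illustration, both color transforms reduce to a matrix product per pixel. A sketch using NumPy (since the matrix entries are rounded, the round trip is only approximately lossless):

    import numpy as np

    # ITU-R BT.601 RGB <-> YUV matrices (Eqs. 3 and 4)
    RGB_to_YUV = np.array([[ 0.299,    0.587,    0.114  ],
                           [-0.14713, -0.28886,  0.436  ],
                           [ 0.615,   -0.51499, -0.10001]])
    YUV_to_RGB = np.array([[1.0,  0.0,      1.13983],
                           [1.0, -0.39465, -0.58060],
                           [1.0,  2.03211,  0.0    ]])

    def rgb_to_yuv(img):            # img has shape (rows, columns, 3)
        return img @ RGB_to_YUV.T

    def yuv_to_rgb(img):
        return img @ YUV_to_RGB.T

    rgb = np.random.rand(4, 4, 3)
    print(np.max(np.abs(rgb - yuv_to_rgb(rgb_to_yuv(rgb)))))   # ~1e-5: residual due to the rounded coefficients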

5.2 Spectral (color) redundancy

Color redundancy [1, 2, 4].

6 Chrominance subsampling

The human visual system is more sensitive to the luma (Y) than to the chroma (UV). This means that the chroma can be subsampled without a significant loss of quality in the images.

Chroma subsampling.
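
A minimal sketch of 4:2:0 chroma subsampling (one chroma sample per 2x2 block), using nearest-neighbor interpolation for the reconstruction:

    import numpy as np

    def subsample_420(C):
        """Keep one chroma sample per 2x2 block."""
        return C[::2, ::2]

    def upsample_420(C_sub):
        """Nearest-neighbor reconstruction: each kept sample covers a 2x2 block."""
        return np.repeat(np.repeat(C_sub, 2, axis=0), 2, axis=1)

    U = np.random.rand(8, 8)                       # a chroma channel
    print(upsample_420(subsample_420(U)).shape)    # (8, 8), using only 1/4 of the original samples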

7 Orthogonal transform

7.1 Orthonormal transform

8 Signal energy

$$||s||^2 = \sum_{n=0}^{B-1} s_n^2. \qquad (7)$$

8.1 Energy conservation

Orthonormal transforms are energy preserving.

8.2 Proof

$$||C||^2 = C^T C = (Ac)^T (Ac) = c^T A^T A c = c^T I c = c^T c = ||c||^2. \qquad (9)$$

Therefore, the sum of the squares of the transformed sequence is the same as the sum of the squares of the original sequence.
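
This can be checked numerically with any orthonormal transform, for example, the orthonormal DCT-II provided by SciPy:

    import numpy as np
    from scipy.fft import dct

    s = np.random.randn(1024)
    S = dct(s, norm='ortho')               # orthonormal (type-II) DCT
    print(np.sum(s ** 2), np.sum(S ** 2))  # both sums of squares coincide (up to floating-point error)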

9 Expected value

The expected value $E[X]$ of a random variable $X$ is, intuitively, the long-run average value of repetitions of the experiment it represents. Let $X$ be a random variable with a finite number of finite outcomes $X_1, X_2, \ldots, X_k$ occurring with probabilities $p_1, p_2, \ldots, p_k$, respectively. The expected value (or expectation) of $X$ is defined as

$$E[X] = \sum_{i=1}^{k} X_i p_i. \qquad (10)$$

10 Variance

The variance of a random variable X is the expected value of the squared deviation from the mean of X:

$$\sigma_X^2 = \mathrm{Var}(X) = E[(X - E[X])^2] = E[(X - E[X])(X - E[X])] = E[X^2] - E[X]^2. \qquad (11)$$

10.1 Covariance

The covariance cov (X,Y ) is a measure of the joint variability of two random variables X, Y , defined as:

$$\mathrm{cov}(X,Y) = E[(X - E[X])(Y - E[Y])]. \qquad (12)$$

11 Covariance matrix

A covariance matrix ΣZ is a matrix whose element in the i, j position is the covariance between the i-th and j-th elements of a random vector Z (a collection of random variables Zi):

$$\Sigma_Z = \begin{bmatrix} E[(Z_1-\mu_1)(Z_1-\mu_1)] & E[(Z_1-\mu_1)(Z_2-\mu_2)] & \cdots & E[(Z_1-\mu_1)(Z_n-\mu_n)] \\ E[(Z_2-\mu_2)(Z_1-\mu_1)] & E[(Z_2-\mu_2)(Z_2-\mu_2)] & \cdots & E[(Z_2-\mu_2)(Z_n-\mu_n)] \\ \vdots & \vdots & \ddots & \vdots \\ E[(Z_n-\mu_n)(Z_1-\mu_1)] & E[(Z_n-\mu_n)(Z_2-\mu_2)] & \cdots & E[(Z_n-\mu_n)(Z_n-\mu_n)] \end{bmatrix} = E\left[(Z - E[Z])(Z - E[Z])^T\right], \qquad (13)$$

where

$$\mu_i = E(Z_i) \qquad (14)$$

is the expected value of the i-th entry in the vector Z.

12 Covariance matrix of a block-based transform

$$\Sigma_S = E\left[(S - E[S])(S - E[S])^T\right] = E\left[A(s - E[s])(s - E[s])^T A^T\right] = A\,E\left[(s - E[s])(s - E[s])^T\right] A^T = A \Sigma_s A^T. \qquad (15)$$
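
Equation (15) can be checked numerically. A sketch in which the transform matrix A is (arbitrarily) the DCT-II matrix and the covariances are the empirical ones of a set of random blocks:

    import numpy as np
    from scipy.fft import dct

    B = 8
    A = dct(np.eye(B), norm='ortho', axis=0)   # transform matrix such that S = A s
    s = np.random.randn(10000, B)              # 10000 blocks of B samples each (one block per row)
    S = s @ A.T                                # transform every block
    Sigma_s = np.cov(s, rowvar=False)
    Sigma_S = np.cov(S, rowvar=False)
    print(np.max(np.abs(Sigma_S - A @ Sigma_s @ A.T)))   # ~0 (up to floating-point error)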

13 Correlation

The most familiar measure of dependence between two random variables X and Y is the Pearson product-moment correlation coefficient, or “Pearson’s correlation coefficient”, commonly called simply “the correlation coefficient”. It is obtained by dividing the covariance of the two variables by the product of their standard deviations.

$$\rho_{X,Y} = \mathrm{corr}(X,Y) = \frac{\mathrm{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y} \qquad (16)$$

14 Autocorrelation

Autocorrelation, also known as serial correlation, is the correlation of a signal $s[t]$ with a delayed copy of itself, $s[t+\tau]$, as a function of the delay $\tau$.

$$\rho_s[\tau] = \rho_{s[t],s[t+\tau]} = \frac{E[(s[t] - \mu)(s[t + \tau] - \mu)]}{\sigma^2} \qquad (17)$$

where $\mu = E(s[t]) = E(s[t+\tau])$ and $\sigma = \sigma_{s[t]} = \sigma_{s[t+\tau]}$.
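
A sketch that estimates $\rho_s[\tau]$ from the samples of a synthetic first-order autoregressive (AR(1)) signal, whose neighboring samples are highly correlated (as happens with the pixels of natural images):

    import numpy as np

    def autocorrelation(s, tau):
        """Estimate of rho_s[tau] (Eq. 17) from the samples of s."""
        s = s - np.mean(s)
        return np.sum(s[:len(s) - tau] * s[tau:]) / np.sum(s ** 2)

    # AR(1) signal: s[n] = rho * s[n-1] + noise
    rho, N = 0.95, 100000
    s = np.zeros(N)
    for n in range(1, N):
        s[n] = rho * s[n - 1] + np.random.randn()

    print([round(autocorrelation(s, tau), 2) for tau in range(4)])   # approximately [1.0, 0.95, 0.9, 0.86]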

15 Autocorrelation matrix

The autocorrelation matrix of a random process X is the matrix [R] defined by

$$[R]_{i,j} = \rho_X[|i - j|]. \qquad (18)$$
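
For example, for the AR(1) model $\rho_s[\tau] = \rho^{|\tau|}$ (often used to model the rows and columns of natural images), the autocorrelation matrix is a Toeplitz matrix:

    import numpy as np
    from scipy.linalg import toeplitz

    B, rho = 4, 0.95
    R = toeplitz(rho ** np.arange(B))   # [R]_{i,j} = rho^{|i-j|}
    print(R)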

16 Eigenvalue and eigenvector

In linear algebra, an eigenvector or characteristic vector $v$ of a linear transformation $T(\cdot)$ is a non-zero vector that changes by only a scalar factor $\lambda$ (known as the eigenvalue, characteristic value, or characteristic root associated to $v$) when that linear transformation is applied to it:

$$T(v) = \lambda v. \qquad (19)$$

When $T(\cdot)$ can be expressed by a matrix $A$, we get

$$Av = \lambda v. \qquad (20)$$
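
A quick numerical check of Eq. (20) with NumPy:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])
    eigenvalues, eigenvectors = np.linalg.eig(A)
    lam, v = eigenvalues[0], eigenvectors[:, 0]    # one (eigenvalue, eigenvector) pair
    print(np.allclose(A @ v, lam * v))             # True: A v = lambda v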

17 Coding gain

18 Karhunen-Loève transform (KLT)

19 Discrete cosine transform (DCT)

19.0.1 Definition

19.1 Properties of the DCT

  1. Separable: the D-dimensional DCT can be computed by applying the 1D DCT along each dimension.
  2. In general, high energy compaction: a small number of DCT coefficients can reconstruct the original signal with reasonable accuracy.
  3. Unitary: the energy of the DCT coefficients is proportional to the energy of the samples.
  4. Orthonormality: the DCT basis functions are orthonormal (orthogonal + unitary) and, therefore, the DCT coefficients are uncorrelated.

DCT.
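
For example, the separability property can be checked by comparing the 2D DCT of a block with the composition of 1D DCTs along each dimension (a sketch using SciPy):

    import numpy as np
    from scipy.fft import dct, dctn

    x = np.random.randn(8, 8)
    separable = dct(dct(x, norm='ortho', axis=0), norm='ortho', axis=1)   # 1D DCT per column, then per row
    print(np.allclose(separable, dctn(x, norm='ortho')))                  # True: the 2D DCT is separable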

20 Dyadic discrete wavelet transform (DWT)

Key features:

  1. High spectral compaction, especially when transient signals are present.
  2. Multiresolution representation: it is easy to recover a reduced version of the original image if only a subset of the coefficients is processed.

21 Filter bank implementation

Where:

$$s = (\uparrow^2(L) * s^L) + (\uparrow^2(H) * s^H) \qquad (27)$$

and

$$L = \downarrow^2(s * a^L) \qquad H = \downarrow^2(s * a^H). \qquad (28)$$

Comments:

  1. $a^L$ and $a^H$ are the impulse responses (the impulse response of a filter is the output of that filter when its input is the unit impulse, Dirac's delta) of a low-pass filter and a high-pass filter, respectively, that have been designed to be complementary (ideally, in $L$ we only find the frequencies of $s$ that are not in $H$, and vice versa). When this is true, it is said that we are using a perfect-reconstruction quadrature-mirror filter bank and the DWT is biorthogonal.
  2. In wavelet theory, $s^L$ is named the scale function and $s^H$ the mother function or wavelet basis function. The coefficients of $L$ are also known as the scale coefficients, and the coefficients of $H$ as the wavelet coefficients [5].
  3. $\downarrow^2(\cdot)$ and $\uparrow^2(\cdot)$ denote the subsampling and oversampling operations:
$$(\downarrow^2(s))_i = s_{2i} \qquad (29)$$

and

$$(\uparrow^2(s))_i = \begin{cases} s_{i/2} & \text{if } i \text{ is even} \\ 0 & \text{otherwise,} \end{cases} \qquad (30)$$

where $s_i$ is the $i$-th sample of $s$.

  4. $*$ is the convolution operator.
  5. Notice that half of the filtered samples are wasted.
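
The analysis (Eq. 28) and synthesis (Eq. 27) steps are implemented, for example, by the PyWavelets package. A minimal sketch, where the choice of the Daubechies-4 ('db2') filter bank is only for illustration:

    import numpy as np
    import pywt  # PyWavelets

    s = np.random.randn(16)
    L, H = pywt.dwt(s, 'db2')     # analysis: filter with aL and aH, then subsample (Eq. 28)
    r = pywt.idwt(L, H, 'db2')    # synthesis: upsample, filter and add (Eq. 27)
    print(np.allclose(s, r))      # True: perfect reconstruction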

22 Lifting implementation [6]

Comments:

  1. $H_i = s_{2i+1} - \mathcal{P}(\{s_{2i}\})_i$ (prediction step)
     $L_i = s_{2i} + \{\mathcal{U}(H)\}_i$ (update step)

  2. The subsampled signals $\{s_{2i}\}$ and $\{s_{2i+1}\}$ can be computed by using
$$\{s_{2i+1}\} = \downarrow^2(Z^{-1}(s))$$

and

$$\{s_{2i}\} = \downarrow^2(s),$$

where $Z^{-1}$ represents the one-sample delay operator.

  3. $H$ typically has less energy, variance, and entropy than $\{s_{2i+1}\}$.
  4. $L$ has less aliasing than $\{s_{2i}\}$ (notice that $s$ has not been low-pass filtered before being subsampled into $\{s_{2i}\}$).
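
A sketch of the generic lifting structure described above. Here P and U are placeholders for the predict and update operators of a particular wavelet; the Haar pair is used only to make the example runnable (see Section 28):

    import numpy as np

    def lifting_analysis(s, P, U):
        """One level of the lifting scheme: split into even/odd phases, predict, update."""
        even, odd = s[0::2], s[1::2]      # {s_2i} and {s_2i+1}
        H = odd - P(even)                 # prediction step
        L = even + U(H)                   # update step
        return L, H

    def lifting_synthesis(L, H, P, U):
        """The inverse simply undoes the two steps in reverse order."""
        even = L - U(H)
        odd = H + P(even)
        s = np.empty(len(L) + len(H))
        s[0::2], s[1::2] = even, odd
        return s

    # Haar predict/update pair: each odd sample is predicted by its left (even) neighbor
    P = lambda even: even
    U = lambda H: H / 2

    s = np.arange(8.0)
    L, H = lifting_analysis(s, P, U)
    print(np.allclose(s, lifting_synthesis(L, H, P, U)))   # True: lifting is reversible by construction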

23 T-levels 1D-DWT

24 Subband independence [7]

While the subbands are only independent if the input is a Gaussian random variable and the filters decorrelate the subbands (i.e., the filters are ideal), the independence assumption is often made because it makes the system simpler.

25 Statistics of the subbands [7]

The PDF of the coefficients of the high-frequency subbands peaks at zero and falls off very rapidly. While it is often modeled as a Laplacian distribution, it actually falls off faster, and is more adequately fitted with a generalized Gaussian PDF with a faster decay than the Laplacian PDF.

26 Subband quantization [7]

Besides the low band compression, which uses known image coding methods, the bulk of the compression is obtained by appropriate quantization of the high bands. The following quantizers are typically used:

  1. Lloyd quantizers fitted to the PDF of the particular subband.
  2. Deadzone uniform quantizers, since they tend to eliminate what is essentially noise.

Because entropy coding is used after quantization, uniform quantizers are nearly optimal. Note that VQ could be used in the subbands, but its complexity is usually not worthwhile since there is little dependence between coefficients anyway.
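
A sketch of a deadzone uniform quantizer (and its mid-point dequantizer): the zero bin is twice as wide as the rest, so small coefficients, which are essentially noise, are mapped to zero:

    import numpy as np

    def deadzone_quantize(x, Q):
        """Deadzone uniform quantizer: the zero bin is (-Q, Q), twice as wide as the others."""
        return np.sign(x) * np.floor(np.abs(x) / Q)

    def deadzone_dequantize(k, Q):
        """Mid-point reconstruction of the non-zero bins."""
        return np.sign(k) * (np.abs(k) + 0.5) * Q * (k != 0)

    x = np.array([-3.7, 0.2, 1.1, 7.9])
    k = deadzone_quantize(x, Q=1.0)
    print(k)                                # [-3.  0.  1.  7.]
    print(deadzone_dequantize(k, Q=1.0))    # [-3.5  0.   1.5  7.5]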

27 2D-DWT

$$E(s^{H^b}) = \sum_i \left|s_i^{H^b}\right|^2. \qquad (31)$$

In the case of the 5/3-DWT, the L2-norms of the DWT subbands are:

28 Haar filters [3]

The i-th sample of the low-frequency subband is computed (using a filter plus subsampling) as

$$L_i = \frac{s_{2i} + s_{2i+1}}{2}, \qquad \text{(Haar-L)}$$

and the i-th sample of the high-frequency subband as

$$H_i = s_{2i+1} - s_{2i}. \qquad \text{(Haar-H)}$$

If Lifting is used,

$$L_i = s_{2i} + \frac{H_i}{2}. \qquad \text{(Haar-L-lifted)}$$

Notice that $H_i = 0$ if $s_{2i+1} = s_{2i}$; therefore, the Haar transform is well suited to encode piecewise-constant signals. The notation X/Y indicates the lengths (taps, or number of coefficients) of the low-pass and the high-pass (convolution) filters of the filter bank implementation (not of the lifting one), respectively.

Haar basis.
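
A minimal numerical example of the lifted Haar equations, showing that $H_i$ vanishes wherever the signal is constant:

    import numpy as np

    s = np.array([3.0, 3.0, 3.0, 3.0, 2.0, 4.0, 6.0, 8.0])
    even, odd = s[0::2], s[1::2]     # {s_2i} and {s_2i+1}
    H = odd - even                   # Eq. (Haar-H)
    L = even + H / 2                 # Eq. (Haar-L-lifted): the pairwise averages
    print(H)                         # [0. 0. 2. 2.] -> zero where the signal is constant
    print(L)                         # [3. 3. 3. 7.]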

29 Linear (5/3) filters [6]

The i-th sample of the low-frequency subband (using a filter-bank implementation) is

$$L_i = -\frac{1}{8}s_{2i-2} + \frac{1}{4}s_{2i-1} + \frac{3}{4}s_{2i} + \frac{1}{4}s_{2i+1} - \frac{1}{8}s_{2i+2} \qquad \text{(5/3-L)}$$

and the i-th sample of the high-frequency signal is computed by

$$H_i = s_{2i+1} - \frac{s_{2i} + s_{2i+2}}{2}. \qquad \text{(5/3-H)}$$

If we use lifting, the low-frequency subband can also be computed, with fewer operations, as

$$L_i = s_{2i} + \frac{H_{i-1} + H_i}{4}. \qquad \text{(5/3-L-lifted)}$$

Notice that $H_i = 0$ if $s_{2i+1} = (s_{2i} + s_{2i+2})/2$. Therefore, the 5/3 transform is well suited to encode piecewise-linear signals.

Linear (5/3) basis.
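
A sketch of one level of the 5/3 transform computed with lifting (the symmetric border extension is an implementation choice), showing that the high-frequency coefficients vanish in the interior of a linear ramp:

    import numpy as np

    def dwt53(s):
        """One level of the 5/3 DWT via lifting (Eqs. 5/3-H and 5/3-L-lifted)."""
        even, odd = s[0::2].astype(float), s[1::2].astype(float)
        even_next = np.append(even[1:], even[-1])   # s_{2i+2}, symmetrically extended at the border
        H = odd - (even + even_next) / 2            # prediction step
        H_prev = np.insert(H[:-1], 0, H[0])         # H_{i-1}, symmetrically extended at the border
        L = even + (H_prev + H) / 4                 # update step
        return L, H

    L, H = dwt53(np.arange(10.0))   # a linear ramp
    print(H)                        # [0. 0. 0. 0. 1.]: only the border coefficient is non-zero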

30 Orthogonal, orthonormal, and biorthogonal transforms

In signal processing, a transform (such as the Discrete Cosine Transform, the Walsh-Hadamard Transform or the Karhunen-Loève Transform) is orthogonal when the coefficients generated by the transform are uncorrelated (there is no way to infer one coefficient from another).

If the norm of all the basis vectors of an orthogonal transform is one, then the transform is said to be orthonormal. Orthonormal transforms are interesting because of their:

  1. Energy preservation: the energy of the output is the same as the energy of the input. This means that, for example, a quantization error produced in a coefficient of the transform will generate the same quantization error at the output (the complete signal) of the inverse transform. The same holds for the forward transform.
  2. Implementation: The transform matrix of the inverse transform is the transpose of the forward transform. In orthogonal transforms, the transform matrix of the inverse transform is the inverse of the transform matrix of the forward transform.

Biorthogonal transforms (and in particular, biorthogonal wavelets) do not satisfy any of these features: they are not energy preserving (this can also be observed in the fact that the frequency-domain responses of the analysis and synthesis filters are not symmetric), and there is no algebraic way (matrix transposition/inversion) to compute the backward transform from the forward one, and vice versa. This, which can be considered a drawback, gives an extra degree of freedom to design the analysis and the synthesis filters (whose only requirement is that the transform pair be reversible), providing in general the possibility of using more sophisticated filters, such as those based on non-linear filtering, for example, those that use motion estimation algorithms.

In general, each subband $b$ of a decomposition generated by a biorthogonal 2D-DWT has a different subband gain $\alpha_b$. Usually, the lower the frequency of the subband, the higher the gain. Notice also that these gains are different for each transform.

Subband gains are important in lossy signal compression because they quantify the relative importance of the wavelet coefficients of the different subbands when we introduce distortion in the wavelet domain. Thus, for example, if we decide to quantize a wavelet coefficient, the amount of distortion that we generate in the signal domain will depend on the subband where that coefficient is located. In general, low-frequency coefficients are more "important" than high-frequency ones.

To compute the subband gains we have two options:

  1. The algebraic way. We will need the expressions of the four filters (two analysis filters and two synthesis filters) and deduce the gains from them.
  2. The algorithmic way. We will need to compute the energy of the impulse response of the inverse transform when we apply such an impulse to each one of the subbands of the decomposition, as sketched below. Supposing that, after applying the DWT to an image, the coefficients of the subband HH are the least energetic, with an energy $x$, the subband gain for subband $b$ is computed as
     $$\alpha_b = E_b/x, \qquad (32)$$

     where $E_b$ is the energy of the reconstruction when the impulse signal is localized at $b$. Notice that all the gains should be larger than one.
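
A possible implementation of the algorithmic option, assuming the PyWavelets package. The 'bior2.2' wavelet (essentially the 5/3 filters, up to normalization), the 64x64 shape and the two decomposition levels are arbitrary choices for illustration:

    import numpy as np
    import pywt  # PyWavelets

    def reconstruction_energy(coeffs, wavelet):
        """Energy, in the signal domain, of the inverse 2D-DWT of the given coefficients."""
        return np.sum(pywt.waverec2(coeffs, wavelet) ** 2)

    wavelet, levels = 'bior2.2', 2
    coeffs = pywt.wavedec2(np.zeros((64, 64)), wavelet, level=levels)   # zero-filled decomposition

    energies = {}
    coeffs[0][coeffs[0].shape[0] // 2, coeffs[0].shape[1] // 2] = 1.0   # impulse in the LL subband
    energies['LL'] = reconstruction_energy(coeffs, wavelet)
    coeffs[0][:] = 0.0
    for scale, details in enumerate(coeffs[1:]):                        # scale 0 is the coarsest one
        for name, band in zip(('H', 'V', 'D'), details):                # horizontal, vertical, diagonal
            band[band.shape[0] // 2, band.shape[1] // 2] = 1.0          # impulse in this subband
            energies[f'{name}{scale}'] = reconstruction_energy(coeffs, wavelet)
            band[:] = 0.0                                               # restore the zeros

    x = min(energies.values())                         # energy of the least energetic subband
    gains = {b: E / x for b, E in energies.items()}    # Eq. (32): alpha_b = E_b / x
    print(gains)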

This computation can also be applied to MCDWT, by computing the subband gains as a function of the number $T$ of temporal decompositions.

31 Quantization in the transform domain

32 Bit-planes progression

32.1 Bit allocation (bit-rate control)

33 Bit allocation based on minimizing the quantization error

34 Bit allocation based on minimizing the variance of the quantization error

DR_model notebook.

which, taking

$$\frac{\partial J}{\partial R(u)} = 0, \qquad (38)$$

produces

$$R(u) = \frac{1}{2}\log_2\left(2\alpha\ln 2\;\sigma^2_{S_u}\right) - \frac{1}{2}\log_2\lambda. \qquad (R(u))$$

35 Encoding

36 Code-stream orderings and scalabilities

References

[1]   Iain Barr. Image processing with numpy. http://www.degeneratestate.org/posts/2016/Oct/23/image-processing-with-numpy/.

[2]   Emmanuelle Gouillart. Scikit-image: image processing. http://www.scipy-lectures.org/packages/scikit-image/.

[3]   Alfred Haar. Zur Theorie der orthogonalen Funktionensysteme. Mathematische Annalen, 69(3):331–371, 1910.

[4]   Jan Erik Solem. Programming Computer Vision with Python. https://www.oreilly.com/library/view/programming-computer-vision/9781449341916/ch01.html.

[5]   A. Sovic and D. Sersic. Signal Decomposition Methods for Reducing Drawbacks of the DWT. Engineering Review, 32(2):70–77, 2012.

[6]   W. Sweldens and P. Schröder. Building Your Own Wavelets at Home. Wavelets in Computer Graphics, 1997.

[7]   M. Vetterli and J. Kovačević. Wavelets and Subband Coding. Prentice-Hall, 1995.