Image transformations for compression
Vicente González Ruiz
November 18, 2021
1 Insights
- Signals can be represented in at least two different domains: the signal domain (for example, time in the case of sound, or space in the case of images) and the frequency domain (in general, a transform domain).
- Transform coding is based on the idea that, in the transform domain, most of the energy of the signal can be compacted into a small number of (transform) coefficients. This property can be exploited to increase the coding efficiency.
- Scalar quantization of decorrelated transform coefficients is more efficient
than direct scalar quantization of the samples [7].
2 Basic coding steps
2.1 Encoder
- Split the signal $\mathbf{x}$ into blocks of $N$ samples, if required.
- Transform each block.
- (Optional) Quantize the coefficients.
- Losslessly encode the quantized coefficients, performing a minimal bit-allocation.
2.2 Decoder
- Decode the coefficients of each block.
- (Optional) “Dequantize” the coefficients of each block.
- Inverse-transform each block.
- Join the blocks, if required.
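A minimal sketch of both stages (a toy round trip, assuming an orthonormal DCT as the block transform, a uniform quantization step, and illustrative function names):

import numpy as np
from scipy.fft import dct, idct

N = 8        # samples per block
DELTA = 16   # quantization step (assumed uniform for all coefficients)

def encode_block(block):
    y = dct(block, norm='ortho')             # transform the block
    return np.round(y / DELTA).astype(int)   # quantize the coefficients

def decode_block(q):
    y = q * DELTA                            # "dequantize" the coefficients
    return idct(y, norm='ortho')             # inverse-transform the block

x = np.arange(32, dtype=float)               # toy signal
blocks = x.reshape(-1, N)                    # split into blocks of N samples
x_hat = np.concatenate([decode_block(encode_block(b)) for b in blocks])
print(np.abs(x - x_hat).max())               # distortion caused by quantization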
3 Splitting
- Divide the signal $\mathbf{x}$ into blocks of $N$ samples.
4 Transform of a block
- In the forward transform, the samples of the block $\mathbf{x}$ are projected (notice that this operation is a dot product) on a set of basis functions
$y_u = \langle\mathbf{a}_u, \mathbf{x}\rangle = \sum_{n=0}^{N-1} a_{u,n}\, x_n$    (forward_transform)
where $\mathbf{y}$ is the transformed sequence and $\mathbf{x}$ is the original sequence.
- The backward (inverse) transform restores the original samples:
$x_n = \sum_{u=0}^{N-1} b_{n,u}\, y_u$    (inverse_transform)
- These equations can be written in matrix form as
$\mathbf{y} = \mathbf{A}\mathbf{x}$    (forward_transform_matrix_form)
and
$\mathbf{x} = \mathbf{B}\mathbf{y}$    (inverse_transform_matrix_form)
where $\mathbf{A}$ and $\mathbf{B}$ are $N\times N$ matrices, being
$\mathbf{a}_u = (a_{u,0}, a_{u,1}, \ldots, a_{u,N-1})$    (1)
(the $u$-th row of $\mathbf{A}$) called a basis vector of the transformation matrix $\mathbf{A}$. If the transform is orthonormal, the basis vectors form an orthonormal basis set.
- In transform coding, $\mathbf{A}$ and $\mathbf{B}$ must be inverses of each other ($\mathbf{B} = \mathbf{A}^{-1}$), i.e.
$\mathbf{A}\mathbf{B} = \mathbf{B}\mathbf{A} = \mathbf{I}$
where $\mathbf{I}$ is the identity matrix.
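A small numerical check of these relations, using a random orthonormal matrix $\mathbf{A}$ (obtained here from a QR decomposition, an arbitrary choice) so that $\mathbf{B} = \mathbf{A}^{-1} = \mathbf{A}^{\mathsf{T}}$:

import numpy as np

N = 4
rng = np.random.default_rng(0)
A, _ = np.linalg.qr(rng.standard_normal((N, N)))  # rows form an orthonormal basis

x = rng.standard_normal(N)            # original sequence
y = A @ x                             # forward transform: y = A x
B = np.linalg.inv(A)                  # inverse transform matrix
print(np.allclose(B, A.T))            # for an orthonormal A, B = A^T
print(np.allclose(A @ B, np.eye(N)))  # A B = I
print(np.allclose(B @ y, x))          # x = B y restores the samples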
5 A color transform
5.1 Luminance and chrominance
- Chrominance (or chroma) is the signal used in video systems to convey the color information of a picture or a video. It was defined to add a color signal to the black-and-white one in analog TV. Thus, the same signal, composed of two subsignals (Y and UV) that can be isolated by filtering, was compatible both with black-and-white receivers (which only used Y) and with color ones (which used YUV):
$Y = 0.299R + 0.587G + 0.114B,\quad U = B - Y,\quad V = R - Y$    (3)
$R = V + Y,\quad G = (Y - 0.299R - 0.114B)/0.587,\quad B = U + Y$    (4)
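A sketch of this color transform, assuming the simple luma/chroma definition of Eqs. (3) and (4) reconstructed above:

import numpy as np

def rgb_to_yuv(rgb):
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    Y = 0.299 * R + 0.587 * G + 0.114 * B   # luma, Eq. (3)
    U = B - Y                               # chroma differences
    V = R - Y
    return np.stack([Y, U, V], axis=-1)

def yuv_to_rgb(yuv):
    Y, U, V = yuv[..., 0], yuv[..., 1], yuv[..., 2]
    B = U + Y                               # inverse, Eq. (4)
    R = V + Y
    G = (Y - 0.299 * R - 0.114 * B) / 0.587
    return np.stack([R, G, B], axis=-1)

rgb = np.random.default_rng(0).uniform(0, 255, (4, 4, 3))
print(np.allclose(yuv_to_rgb(rgb_to_yuv(rgb)), rgb))  # lossless in floating point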
5.2 Spectral (color) redundancy
Color redundancy [1, 2, 4].
6 Chrominance subsampling
The human visual system is more sensitive to the luma (Y) than to the chroma (UV). This means that the chroma can be subsampled without a significant loss of quality in the images.
Chroma subsampling.
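A possible 4:2:0 subsampling sketch (one chroma sample per 2x2 block of luma samples; the block-averaging filter and the nearest-neighbor upsampling are assumptions, since simple decimation is also common):

import numpy as np

def subsample_420(c):
    # Average each 2x2 block of a chroma channel.
    h, w = c.shape
    return c[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample_420(c):
    # Nearest-neighbor interpolation back to the luma resolution.
    return np.repeat(np.repeat(c, 2, axis=0), 2, axis=1)

U = np.random.default_rng(0).uniform(-128, 128, (8, 8))
print(upsample_420(subsample_420(U)).shape)  # back to (8, 8)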
7 Orthogonal transform
- The rows of $\mathbf{A}$ (the vectors $\mathbf{a}_u$) are referred to as the basis vectors of the transform, and should form an orthogonal basis set in order to provide maximum energy compaction. The rows can also be seen as the coefficients of $N$ filters, being the first one ($\mathbf{a}_0$) the "low-pass" one, which will produce the DC coefficient, and the rest ($\mathbf{a}_1, \ldots, \mathbf{a}_{N-1}$) the "high-pass" filters, which will generate the AC (Alternating Current) coefficients. These $N$ filters form a filter bank where the overlapping between the frequency responses of the filters should be as small as possible if we want maximum energy compaction.
7.1 Orthonormal transform
8 Signal energy
The energy of a signal $\mathbf{x}$ is defined as
$E(\mathbf{x}) = \langle\mathbf{x},\mathbf{x}\rangle = \sum_{n} x_n^2$    (7)
8.1 Energy conservation
Orthonormal transforms are energy preserving.
- If $\mathbf{A}$ is orthonormal (also called unitary), the energy of the transformed signal is equal to that of the original one: $E(\mathbf{y}) = E(\mathbf{x})$.
8.2 Proof
$E(\mathbf{y}) = \mathbf{y}^{\mathsf{T}}\mathbf{y} = (\mathbf{A}\mathbf{x})^{\mathsf{T}}(\mathbf{A}\mathbf{x}) = \mathbf{x}^{\mathsf{T}}\mathbf{A}^{\mathsf{T}}\mathbf{A}\mathbf{x} = \mathbf{x}^{\mathsf{T}}\mathbf{x} = E(\mathbf{x})$    (9)
Therefore, the sum of the squares of the transformed sequence is the same as the sum
of the squares of the original sequence.
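A quick numerical check of this property, using the orthonormal DCT as an example of a unitary transform:

import numpy as np
from scipy.fft import dct

x = np.random.default_rng(0).standard_normal(64)
y = dct(x, norm='ortho')           # orthonormal transform
print(np.sum(x**2), np.sum(y**2))  # both sums of squares coincide, Eq. (9)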
9 Expected value
The expected value $E[X]$ of a random variable $X$ is, intuitively, the long-run average value of repetitions of the experiment it represents. Let $X$ be a random variable with a finite number of finite outcomes $x_1, x_2, \ldots, x_k$ occurring with probabilities $p_1, p_2, \ldots, p_k$, respectively. The expected value (or expectation) of $X$ is defined as
$E[X] = \sum_{i=1}^{k} x_i p_i$    (10)
10 Variance
The variance of a random variable $X$ is the expected value of the squared deviation from the mean of $X$:
$\mathrm{Var}(X) = E\big[(X - E[X])^2\big]$    (11)
11 Covariance
The covariance is a measure of the joint variability of two random variables $X$ and $Y$, defined as
$\mathrm{Cov}(X, Y) = E\big[(X - E[X])(Y - E[Y])\big]$    (12)
A covariance matrix is a $N\times N$ matrix whose element in the $(i,j)$ position is the covariance between the $i$-th and $j$-th elements of a random vector $\mathbf{X}$ (a collection of $N$ random variables $X_1, \ldots, X_N$):
$\Sigma_{i,j} = \mathrm{Cov}(X_i, X_j) = E\big[(X_i - \mu_i)(X_j - \mu_j)\big]$    (13)
where
$\mu_i = E[X_i]$    (14)
is the expected value of the $i$-th entry in the vector $\mathbf{X}$.
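These statistics can be estimated from samples; a sketch using NumPy (the bivariate Gaussian source is only illustrative):

import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[2.0, 1.5], [1.5, 2.0]], size=10000)

mu = X.mean(axis=0)              # Eq. (14): component-wise E[X_i]
var = X.var(axis=0)              # Eq. (11): E[(X_i - mu_i)^2]
Sigma = np.cov(X, rowvar=False)  # Eq. (13): sample covariance matrix
print(mu, var, Sigma, sep='\n')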
12 Covariance matrix of a block-based transform
For $\mathbf{y} = \mathbf{A}\mathbf{x}$, the covariance matrices of the original and the transformed blocks are related by
$\Sigma_{\mathbf{y}} = \mathbf{A}\,\Sigma_{\mathbf{x}}\,\mathbf{A}^{\mathsf{T}}$    (15)
13 Correlation
The most familiar measure of dependence between two random variables $X$ and $Y$ is the Pearson product-moment correlation coefficient, or "Pearson's correlation coefficient", commonly called simply "the correlation coefficient". It is obtained by dividing the covariance of the two variables by the product of their standard deviations:
$\rho_{X,Y} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y}$    (16)
14 Autocorrelation
Autocorrelation, also known as serial correlation, is the correlation of a signal with a delayed copy of itself, as a function of the delay $\tau$:
$R(\tau) = \frac{E\big[(X_t - \mu)(X_{t+\tau} - \mu)\big]}{\sigma^2}$    (17)
where $\mu$ is the mean of the process and $\sigma^2$ its variance.
15 Autocorrelation matrix
The autocorrelation matrix of a random process $\mathbf{X} = (X_1, \ldots, X_N)^{\mathsf{T}}$ is the $N\times N$ matrix $\mathbf{R}$ defined by
$\mathbf{R} = E[\mathbf{X}\mathbf{X}^{\mathsf{T}}], \quad R_{i,j} = E[X_i X_j]$    (18)
16 Eigenvalues and eigenvectors
In linear algebra, an eigenvector or characteristic vector $\mathbf{v}$ of a linear transformation $T$ is a non-zero vector that changes by only a scalar factor $\lambda$ (known as the eigenvalue, characteristic value, or characteristic root of $T$) when that linear transformation is applied to it:
$T(\mathbf{v}) = \lambda\mathbf{v}$    (19)
When $T$ can be expressed by a matrix $\mathbf{A}$, we get
$\mathbf{A}\mathbf{v} = \lambda\mathbf{v}$    (20)
17 Coding gain
18 Karhunen-Loève transform (KLT)
- For the KLT, the rows of $\mathbf{K}$ (the basis of the forward transform) are the eigenvectors of the (unnormalized) autocorrelation matrix $\mathbf{R}$ of the signal $\mathbf{x}$, where
$R_{i,j} = E[x_i x_j]$    (22)
- It can be proven that the KLT minimizes the geometric mean of the variances of the transform coefficients, and therefore, it provides the maximum coding gain.
Unfortunately, the basis functions of the KLT depend on $\mathbf{x}$. If $\mathbf{x}$ is non-stationary, the autocorrelation matrix (or the basis) must be sent to the decoder (to run the inverse transform) as side information. However, if $\mathbf{x}$ is a stationary first-order Markov process with correlation coefficient $\rho \to 1$ (a good model for natural images), the KLT basis converges to
$a_{u,n} = \alpha_u \cos\frac{(2n+1)u\pi}{2N}$    (23)
(the DCT basis) for all signals.
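A sketch of the KLT for an AR(1) (first-order Markov) source, estimating $\mathbf{R}$ from sample blocks; the correlation coefficient 0.95 and the block size are arbitrary choices:

import numpy as np

rng = np.random.default_rng(0)
N, M = 8, 10000
noise = rng.standard_normal((M, N))
x = np.empty((M, N))
x[:, 0] = noise[:, 0]
for n in range(1, N):                 # first-order Markov (AR(1)) signal
    x[:, n] = 0.95 * x[:, n - 1] + noise[:, n]

R = (x.T @ x) / M                     # Eq. (22): estimated autocorrelation matrix
eigval, eigvec = np.linalg.eigh(R)    # R is symmetric
K = eigvec[:, ::-1].T                 # rows = eigenvectors (the KLT basis)

y = x @ K.T                           # transform every block
Ry = (y.T @ y) / M                    # the coefficients are decorrelated:
print(np.round(Ry, 2))                # (near-)diagonal matrix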
19 Discrete cosine transform (DCT)
19.1 Definition
- The forward (direct) transform is
$y_u = \alpha_u \sum_{n=0}^{N-1} x_n \cos\frac{(2n+1)u\pi}{2N}$    (24)
and the backward (inverse) transform is
$x_n = \sum_{u=0}^{N-1} \alpha_u\, y_u \cos\frac{(2n+1)u\pi}{2N}$    (25)
where $N$ is the number of pixels, $x_n$ denotes the $n$-th pixel of the image $\mathbf{x}$, and
$\alpha_u = \begin{cases} \sqrt{1/N} & \text{if } u = 0 \\ \sqrt{2/N} & \text{otherwise} \end{cases}$    (26)
19.2 Properties of the DCT
- Separable: the $n$-dimensional DCT can be computed using the 1D DCT along each dimension.
- In general, high energy compaction: a small number of DCT coefficients can reconstruct the original signal with reasonable accuracy.
- Unitary: the energy of the DCT coefficients is proportional to the energy of the samples.
- Orthonormality: the DCT basis vectors are orthonormal (orthogonal + unitary) and therefore, the DCT coefficients are uncorrelated.
DCT.
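Eqs. (24)-(26) define the orthonormal DCT-II, which is what scipy.fft.dct computes with norm='ortho'; this snippet checks invertibility, energy preservation, and separability:

import numpy as np
from scipy.fft import dct, idct, dctn

x = np.random.default_rng(0).standard_normal(8)
y = dct(x, type=2, norm='ortho')                      # Eq. (24)
print(np.allclose(idct(y, type=2, norm='ortho'), x))  # Eq. (25): invertible
print(np.allclose(np.sum(x**2), np.sum(y**2)))        # unitary: energy preserved

img = np.random.default_rng(1).standard_normal((8, 8))
rows_then_cols = dct(dct(img, norm='ortho', axis=0), norm='ortho', axis=1)
print(np.allclose(dctn(img, norm='ortho'), rows_then_cols))  # separable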
20 Dyadic discrete wavelet transform (DWT)
Key features:
- High spectral compaction, especially when transient signals are present.
- Multiresolution representation: it is easy to recover a reduced version of the original image if only a subset of the coefficients is processed.
21 Filter bank implementation
One level of the (analysis) filter bank computes the subbands (see the sketch after the comments below):
$\mathbf{l} = (\mathbf{x} * \mathbf{h}_0)\downarrow 2$    (27)
and
$\mathbf{h} = (\mathbf{x} * \mathbf{h}_1)\downarrow 2$    (28)
Comments:
- $H_0$ and $H_1$ are the transfer functions (the transfer function of a filter is the $z$-transform of its response to the unit impulse function, Dirac's delta) of a low-pass filter and a high-pass filter, respectively, that have been designed to be complementary (ideally, in $\mathbf{h}$ we only find the frequencies of $\mathbf{x}$ that are not in $\mathbf{l}$, and vice versa). When this is true, it is said that we are using a perfect-reconstruction quadrature-mirror filter bank, and the DWT is biorthogonal.
- In wavelet theory, $\phi$ is named the scale function and $\psi$ the mother function or wavelet basis function. The coefficients of $\mathbf{l}$ are also known as the scale coefficients, and the coefficients of $\mathbf{h}$ the wavelet coefficients [5].
- $\downarrow 2$ and $\uparrow 2$ denote the subsampling and oversampling operations
$(\mathbf{x}\downarrow 2)_n = x_{2n}$    (29)
and
$(\mathbf{x}\uparrow 2)_n = \begin{cases} x_{n/2} & \text{if } n \text{ is even} \\ 0 & \text{otherwise} \end{cases}$    (30)
where $x_n$ is the $n$-th sample of $\mathbf{x}$.
- $*$ is the convolution operator.
- Notice that half of the filtered samples are wasted.
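A sketch of one analysis level of Eqs. (27)-(29), assuming the normalized Haar pair for the filters $\mathbf{h}_0$ and $\mathbf{h}_1$:

import numpy as np

h0 = np.array([1.0, 1.0]) / np.sqrt(2)   # low-pass analysis filter
h1 = np.array([1.0, -1.0]) / np.sqrt(2)  # high-pass analysis filter

x = np.random.default_rng(0).standard_normal(16)
l = np.convolve(x, h0)[1::2]             # subsampling discards half the samples
h = np.convolve(x, h1)[1::2]
print(np.sum(x**2), np.sum(l**2) + np.sum(h**2))  # orthonormal pair: energy preserved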
22 Lifting implementation [6]
The same subbands can be computed with two lifting steps:
$\mathbf{h} = \mathbf{x}_{\text{odd}} - \mathcal{P}(\mathbf{x}_{\text{even}})$    (PredictionStep)
$\mathbf{l} = \mathbf{x}_{\text{even}} + \mathcal{U}(\mathbf{h})$    (UpdateStep)
Comments:
- The subsampled signals $\mathbf{x}_{\text{even}}$ and $\mathbf{x}_{\text{odd}}$ can be computed by using $\mathbf{x}_{\text{even}} = \mathbf{x}\downarrow 2$ and $\mathbf{x}_{\text{odd}} = (z^{-1}\mathbf{x})\downarrow 2$, where $z^{-1}$ represents the one-sample delay function.
- $\mathbf{h}$ has typically less energy, variance, and entropy than $\mathbf{x}$.
- $\mathbf{l}$ has less aliasing than $\mathbf{x}_{\text{even}}$ (notice that $\mathbf{x}$ has not been low-pass filtered before subsampling it).
23 $K$-levels 1D-DWT
24 Subband independence [7]
While the subbands are only independent if the input is a Gaussian random variable and the filters decorrelate the subbands (i.e., the filters are ideal), the independence assumption is often made because it makes the system simpler.
25 Statistics of the subbands [7]
The PDF of the coefficients of the high-frequency subbands peaks at zero and falls off very rapidly. While it is often modeled as a Laplacian distribution, it actually falls off faster. It is more adequately fitted with a generalized Gaussian PDF with faster decay than the Laplacian PDF.
26 Subband quantization [7]
Besides the low band compression, which uses known image coding methods, the bulk of the compression is obtained by appropriate quantization of the high bands. The following quantizers are typically used:
- Lloyd quantizers fitted to the PDF of the particular subband.
- Deadzone uniform quantizers, since they tend to eliminate what is
essentially noise.
Because entropy coding is used after quantization, uniform quantizers are nearly
optimal. Note that VQ could be used in the subbands, but its complexity is
usually not worthwhile since there is little dependence between coefficients
anyway.
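A sketch of a deadzone uniform quantizer (the mid-point reconstruction rule used here is one common choice among several):

import numpy as np

DELTA = 8  # quantization step; the zero bin is twice as wide as the others

def deadzone_quantize(y):
    # Coefficients with |y| < DELTA fall into the zero bin (the "deadzone").
    return np.sign(y) * np.floor(np.abs(y) / DELTA)

def deadzone_dequantize(q):
    # Reconstruct at the mid-point of each non-zero bin.
    return np.sign(q) * (np.abs(q) + 0.5) * DELTA

y = np.array([-20.0, -7.9, -0.1, 0.0, 3.5, 9.0, 25.0])
q = deadzone_quantize(y)
print(q)                       # small coefficients (essentially noise) map to 0
print(deadzone_dequantize(q))  # large coefficients keep a coarse approximation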
27 2D-DWT
- The one-dimensional (1D) DWT is a separable transform. Therefore, the 2D DWT can be computed by applying the 1D DWT to all the rows of an image and then to all the columns, or vice versa.
- The contribution of a coefficient of a subband $s$ is determined by the DWT basis function $\phi_s$ associated to that coefficient, which can be empirically determined by applying the inverse DWT to a Dirac impulse localized in that coefficient (notice that $\phi_s$ does not depend on the particular coefficient because we are supposing that all the coefficients of a subband have the same contribution, the same basis function). Therefore, the (squared) $L^2$-norm for the subband $s$ is computed as the energy of a basis function of that subband:
$\|\phi_s\|_2^2 = \sum_n \phi_{s,n}^2$    (31)
In the case of the 5/3-DWT, the $L^2$-norms of the DWT subbands can be computed in this way.
28 Haar filters [3]
The $n$-th sample of the low-frequency subband is computed (using a filter plus subsampling) as
$l_n = \frac{x_{2n} + x_{2n+1}}{2}$    (HaarL)
and the $n$-th sample of the high-frequency subband as
$h_n = x_{2n+1} - x_{2n}$    (HaarH)
If lifting is used,
$l_n = x_{2n} + \frac{h_n}{2}$    (HaarLLifted)
Notice that $h_n = 0$ if $x_{2n} = x_{2n+1}$; therefore, the Haar transform is good to encode constant signals. The notation X/Y indicates the length (taps, or number of coefficients) of the low-pass and the high-pass (convolution) filters of the filter bank implementation (not lifting), respectively.
Haar basis.
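A sketch of this transform using the lifting steps (HaarH) and (HaarLLifted), in floating point (an integer-to-integer variant would add rounding):

import numpy as np

def haar_lift(x):
    even, odd = x[0::2].astype(float), x[1::2].astype(float)
    h = odd - even          # (HaarH): prediction; zero for constant pairs
    l = even + h / 2        # (HaarLLifted): update; running averages
    return l, h

def haar_unlift(l, h):
    even = l - h / 2        # undo the update step
    odd = h + even          # undo the prediction step
    x = np.empty(l.size + h.size)
    x[0::2], x[1::2] = even, odd
    return x

x = np.array([3, 3, 3, 3, 10, 12, 14, 16])
l, h = haar_lift(x)
print(l, h)                               # h is zero where x is constant
print(np.allclose(haar_unlift(l, h), x))  # perfectly reversible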
29 Linear (5/3) filters [6]
The $n$-th sample of the low-frequency subband (using a filter bank implementation) is
$l_n = -\frac{1}{8}x_{2n-2} + \frac{1}{4}x_{2n-1} + \frac{3}{4}x_{2n} + \frac{1}{4}x_{2n+1} - \frac{1}{8}x_{2n+2}$    (5/3L)
and the $n$-th sample of the high-frequency subband is computed by
$h_n = x_{2n+1} - \frac{x_{2n} + x_{2n+2}}{2}$    (5/3H)
If we use lifting, it can also be computed, using fewer operations, by
$l_n = x_{2n} + \frac{h_{n-1} + h_n}{4}$    (5/3LLifted)
Notice that $h_n = 0$ if $x_{2n+1} = \frac{x_{2n} + x_{2n+2}}{2}$. Therefore, the 5/3 transform is suitable to encode piecewise-linear signals.
Linear (5/3) basis.
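A lifting sketch of the 5/3 transform following (5/3H) and (5/3LLifted); the symmetric border extension is an assumption (other extensions are possible):

import numpy as np

def dwt53_lift(x):
    x = x.astype(float)
    even, odd = x[0::2], x[1::2]
    h = odd - (even + np.roll(even, -1)) / 2  # (5/3H): predict the odd samples
    h[-1] = odd[-1] - even[-1]                # symmetric extension at the border
    l = even + (np.roll(h, 1) + h) / 4        # (5/3LLifted): update the even ones
    l[0] = even[0] + h[0] / 2                 # symmetric extension at the border
    return l, h

def dwt53_unlift(l, h):
    even = l - (np.roll(h, 1) + h) / 4        # undo the update step
    even[0] = l[0] - h[0] / 2
    odd = h + (even + np.roll(even, -1)) / 2  # undo the prediction step
    odd[-1] = h[-1] + even[-1]
    x = np.empty(l.size + h.size)
    x[0::2], x[1::2] = even, odd
    return x

x = np.arange(16)                             # a linear ramp
l, h = dwt53_lift(x)
print(h)                                      # ~0: the ramp is fully predicted
print(np.allclose(dwt53_unlift(l, h), x))     # perfect reconstruction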
30 Orthogonal, orthonormal, and biorthogonal transforms
In signal processing, a transform (such as the Discrete Cosine Transform, the Walsh-Hadamard Transform or the Karhunen-Loève Transform) is orthogonal when the coefficients generated by the transform are uncorrelated (there is no way to infer one coefficient from another).
If the norm of all the basis vectors of an orthogonal transform is one, then the transform is said to be orthonormal. Orthonormal transforms are interesting because of their:
- Energy preservation: the energy of the output is the same as the energy of the input. This means that, for example, a quantization error produced in a coefficient of the transform will generate the same quantization error at the output (the complete signal) of the inverse transform. The same holds for the forward transform.
- Implementation: the transform matrix of the inverse transform is simply the transpose of that of the forward transform. In merely orthogonal transforms, the transform matrix of the inverse transform is the inverse of the transform matrix of the forward transform.
Biorthogonal transforms (and in particular, biorthogonal wavelets) do not satisfy any of these features: they are not energy preserving (this can also be observed because the frequency-domain responses of the analysis and synthesis filters are not symmetric), and there is no algebraic shortcut (matrix transposition) to compute the backward transform from the forward one, and vice versa. This, which can be considered a drawback, gives an extra degree of freedom to design the analysis and the synthesis filters (whose only requirement is that the transform pair be reversible), providing in general the possibility of using more sophisticated filters, such as those based on non-linear filtering, for example, those that use motion estimation algorithms.
In general, each subband $s$ of a decomposition generated by a biorthogonal 2D-DWT has a different subband gain $G_s$. Usually, the lower the frequency of the subband, the higher the gain. Notice also that these gains are different for each transform.
Subband gains are important in lossy signal compression because they quantify the relative importance of the wavelet coefficients of the different subbands when we introduce distortion in the wavelet domain. Thus, for example, if we decide to quantize a wavelet coefficient, the amount of distortion that we generate in the signal domain will depend on the subband where that coefficient is localized. In general, low-frequency coefficients are more "important" than high-frequency ones.
To compute the subband gains we have two options:
- The algebraic way. We need the expressions of the four filters (two analysis filters and two synthesis filters) to deduce the gains.
- The algorithmic way (see the sketch after this section). We need to compute the energy of the impulse response of the inverse transform, when we apply such an impulse to each one of the subbands of the decomposition. Supposing that, after applying the DWT to an image, the coefficients of the subband HH are the least energetic, with an energy $E_{\text{HH}}$, the subband gain for a subband $s$ is computed as
$G_s = \frac{E_s}{E_{\text{HH}}}$
where $E_s$ is the energy of the reconstruction when the impulse signal is localized at $s$. Notice that all the gains should be larger than (or equal to) one.
This computation can also be applied to MCDWT, by computing the subband gains as a function of the number of temporal decompositions.
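A sketch of the algorithmic way, assuming the PyWavelets package is available ('bior2.2' is its name for the 5/3 biorthogonal transform; the LH/HL labeling of the pywt detail bands is an assumed convention):

import numpy as np
import pywt

shape, levels, wavelet = (64, 64), 2, 'bior2.2'
coeffs = pywt.wavedec2(np.zeros(shape), wavelet, level=levels)

# Collect every subband array with a readable name.
arrays = [('LL', coeffs[0])] + [
    (name + str(levels - i), band)
    for i, detail in enumerate(coeffs[1:])
    for name, band in zip(('LH', 'HL', 'HH'), detail)
]

energies = {}
for name, band in arrays:
    band[band.shape[0] // 2, band.shape[1] // 2] = 1.0  # unit impulse in subband
    rec = pywt.waverec2(coeffs, wavelet)                # inverse transform
    energies[name] = np.sum(rec ** 2)                   # energy of the basis function
    band[band.shape[0] // 2, band.shape[1] // 2] = 0.0  # restore the zeros

E_min = min(energies.values())
gains = {name: E / E_min for name, E in energies.items()}
print(gains)  # all gains >= 1; LL should be the largest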
31 Quantization in the transform domain
- If the transform is orthogonal, by definition the coefficients are uncorrelated. Therefore, a scalar quantizer can perform an optimal quantization.
32 Bit-planes progression
32.1 Bit allocation (bit-rate control)
- In lossless coding, the coefficients are directly encoded using some lossless (text) compression algorithm, or a combination of them.
- However, in most situations a lossy compression is needed, and in this case a transform coder must determine, given a maximum number of bits (which is defined by the compression ratio selected by the user), the number of bits $b_u$ used by the quantizer for each coefficient $y_u$.
33 Bit allocation based on minimizing the quantization error
- In unitary transforms, as a consequence of the energy preservation property, a uniform quantization (i.e., dividing each coefficient by the same quantization step) should provide an optimal bit allocation if we want to minimize the quantization error (the distortion) in the reconstructed signal.
34 Bit allocation based on minimizing the variance of the quantization error
- Let us assume that the variance of the coefficients, defined as
$\sigma_{y_u}^2 = E\big[(y_u - \overline{y_u})^2\big]$    (33)
(where
$\overline{y_u} = E[y_u]$)    (34)
corresponds to the amount of information provided by each coefficient. Therefore, coefficients with high variance should be assigned more bits, and vice versa.
- Let us define
$\overline{b} = \frac{1}{N}\sum_{u=0}^{N-1} b_u$
as the (target) average number of bits/coefficient, where $b_u$ is the number of bits assigned to the coefficient $y_u$.
- If the mean square error is used as a measure of distortion, the variance of the distortion generated by the quantization of a coefficient $y_u$ can be modeled by
$\sigma_{q_u}^2 = \alpha_u 2^{-2b_u} \sigma_{y_u}^2$    (35)
where $\alpha_u$ depends on the frequency $u$ and the quantizer.
(See the DR_model notebook.) Taking $\alpha_u$ constant for all $u$ and minimizing the total distortion for a given $\overline{b}$ produces
$b_u = \overline{b} + \frac{1}{2}\log_2\frac{\sigma_{y_u}^2}{\left(\prod_{k=0}^{N-1}\sigma_{y_k}^2\right)^{1/N}}$
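A sketch of this allocation rule (the variances are made up for the example; in practice, negative allocations are clipped to zero and the remaining bit budget is redistributed):

import numpy as np

var = np.array([900.0, 110.0, 30.0, 8.0, 4.0, 2.0, 1.0, 0.5])  # coefficient variances
b_avg = 2.0                                                    # target bits/coefficient

geo_mean = np.exp(np.mean(np.log(var)))     # geometric mean of the variances
b = b_avg + 0.5 * np.log2(var / geo_mean)   # the allocation rule above
print(np.round(b, 2), b.mean())             # high-variance coefficients get more
                                            # bits; the mean equals b_avg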
35 Encoding
- For the DCT, usually zig-zag scanning plus run-length encoding (RLE), followed by zero-order entropy coding.
- For the DWT, tree-based coding or block-based coding.
36 Code-stream orderings and scalabilities
- The order in which the DWT coefficients are decoded determines the type of scalability (for example, with 2 qualities and 3 resolutions).