Transform coding can exploit the correlation in signals to concentrate their information in a subset of transformed elements, called coefficients, by decorrelating the input samples [4]. Normally, after the transformation, quantization [2] of the signal is more effective when the energy of the signal is accumulated in a small number of coefficients, because we can dedicate more bits to encoding the most energetic ones.
In general, transform domains require larger dynamic ranges than the original ones.
Both TC and VQ [3] work by exploiting the correlation between samples, whereas SQ (Scalar Quantization) does not. Therefore, we can expect the RD performance [1] of a (TC+SQ)-based codec to be similar to that of VQ.
All linear transforms can be described as a matrix-vector product [5] \begin {equation} \mathbf {y} = \mathbf {K}\mathbf {x}, \label {eq:forward_transform_matrix_form} \end {equation} where \(\mathbf {x}\) is the input signal, \(\mathbf {K}\) is the analysis transform matrix, and \(\mathbf {y}\) is the output decomposition. The coefficients are found by \begin {equation} {\mathbf {y}}_i = \langle {\mathbf {K}}_i, {\mathbf {x}}\rangle , \end {equation} where \({\mathbf {K}}_i\) is the \(i\)-th row of \(\mathbf {K}\), and \(\langle \cdot ,\cdot \rangle \) denotes the inner product. This basically means that \({\mathbf {y}}_i\) is proportional to the similarity between the input signal \(\mathbf {x}\) and the taps of the filter \({\mathbf {K}}_i\). The inverse (synthesis) transform is computed by \begin {equation} \mathbf {x} = {\mathbf {K}}^{-1}\mathbf {y}, \label {eq:backward_transform_matrix_form} \end {equation} where \({\mathbf {K}}^{-1}\) denotes the inverse matrix of \(\mathbf {K}\). When \(\mathbf K\) is orthonormal, it holds that \begin {equation} \mathbf {K}={\mathbf {K}}^{-1}={\mathbf {K}}^{\text T}, \label {eq:orthogonal_matrix} \end {equation} where \({\mathbf {K}}^{\text T}\) represents the transpose matrix of \(\mathbf {K}\). Without considering scale factors, Eq.~\eqref {eq:orthogonal_matrix} is also true for all orthogonal transforms. Orthogonal and orthonormal transforms satisfy that \begin {equation} \langle {\mathbf {K}}_i, {\mathbf {K}}_j\rangle = 0, \forall i\neq j. \end {equation}
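The analysis/synthesis relations above can be sketched numerically. As a minimal example (the \(2\times 2\) normalized Haar matrix is an assumption chosen for illustration, not a transform used in the text), the inverse of an orthonormal transform reduces to its transpose:

```python
import numpy as np

# Analysis matrix K: a 2x2 orthonormal (normalized Haar) transform,
# chosen only as a small illustrative example.
K = np.array([[1.0,  1.0],
              [1.0, -1.0]]) / np.sqrt(2)

x = np.array([3.0, 1.0])   # input signal
y = K @ x                  # forward transform: y = Kx

# Since K is orthonormal, K^{-1} = K^T, so synthesis is just K^T y.
x_rec = K.T @ y

# Rows of K are mutually orthogonal: <K_i, K_j> = 0 for i != j.
print(x_rec, K[0] @ K[1])
```

Here `y[0]` measures the similarity of `x` with the low-pass row of `K`, and `y[1]` with the high-pass row, matching the inner-product interpretation of the coefficients.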
Transforms are used in signal coding to provide relative (between subbands) energy compaction. The capability of a transform to achieve this effect can be estimated by the so-called transform coding gain [6, 4], defined by \begin {equation} G = \frac {\frac {1}{N}\sum _{n=1}^N{\sigma _n^2}}{(\prod _{n=1}^N\sigma _n^2)^{\frac {1}{N}}}, \end {equation} where \(N\) is the number of coefficients in a block (in our case, the number of coefficients in a transformed pixel, i.e., \(N=3\)), and \(\sigma _n^2\) is the variance of the \(n\)-th coefficient in the block. As can be seen, \(G\) is the ratio of the arithmetic mean of the variances of the transform coefficients to their geometric mean. Notice that \(G\) is computed within a block (a pixel in the case of a color transform), not among blocks (pixels).
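The coding gain can be computed directly from the definition. The sketch below uses synthetic coefficient variances (the Gaussian data and its per-coefficient standard deviations are assumed values for illustration), with \(N=3\) coefficients per block as in the text:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy decomposition: 1000 blocks of N = 3 coefficients each, with most
# of the energy concentrated in the first coefficient (assumed data).
coeffs = rng.normal(0.0, [10.0, 2.0, 0.5], size=(1000, 3))

var = coeffs.var(axis=0)                       # sigma_n^2 for each coefficient
G = var.mean() / var.prod() ** (1 / len(var))  # arithmetic mean / geometric mean

print(G)  # G >= 1; equality holds only when all variances are equal
```

A large \(G\) indicates strong energy compaction: the arithmetic mean is dominated by the most energetic coefficient, while the geometric mean is pulled down by the weak ones.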
Some transforms, such as the DCT, are applied to 2D blocks (for example, of \(8\times 8\) pixels). This is a direct consequence of the fact that, usually, the transform loses compaction efficiency when the block size is increased (although this depends on the signal characteristics). When the coefficients of several blocks are considered together, they form a subband, and the collection of subbands, a decomposition [7]; the index of the subband is related to the frequency of the signal. For example, in the case of images, the position of the coefficients in the subbands is related to the spatial area where the corresponding pixels are found.
Rate-control is mainly performed through the configuration of the quantization step sizes. Notice that, in general, if the transform is orthogonal and therefore the subbands are independent, the quantization step size of a subband should be inversely proportional to the subband gain.
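This rate-control rule can be sketched as follows. The subband gains and the base step size below are assumed values for illustration (not taken from the text); the point is only that a higher-gain subband receives a proportionally smaller quantization step:

```python
import numpy as np

# Hypothetical synthesis gains of three subbands (assumed values).
subband_gain = np.array([2.0, 1.0, 0.5])

base_step = 8.0
# Step size inversely proportional to the subband gain: high-gain
# subbands are quantized more finely, low-gain subbands more coarsely.
steps = base_step / subband_gain

y = np.array([100.3, -7.8, 3.1])  # one coefficient per subband (assumed)
y_quant = np.round(y / steps)     # quantization indices (uniform SQ)
y_dequant = y_quant * steps       # dequantized coefficients

print(steps, y_quant, y_dequant)
```

With these assumed gains, the steps become 4.0, 8.0, and 16.0, so the distortion each subband contributes after synthesis is balanced across the decomposition.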
[1] V. González-Ruiz. Information Theory.
[2] V. González-Ruiz. Scalar Quantization.
[3] V. González-Ruiz. Vector Quantization.
[4] K. Sayood. Introduction to Data Compression. Morgan Kaufmann, 2017.
[5] G. Strang. Linear Algebra and Its Applications. Belmont, CA: Thomson, Brooks/Cole, 2006.
[6] M. Vetterli and J. Kovačević. Wavelets and Subband Coding. Prentice Hall, 1995.
[7] M. Vetterli, J. Kovačević, and V.K. Goyal. Foundations of Signal Processing. Cambridge University Press, 2014.