Transform coding can exploit the correlation in signals to concentrate their information in a subset of transformed elements, called coefficients, by decorrelating the input samples [4]. Normally, after the transformation, quantization [2] of the signal is more effective when the energy of the signal is accumulated in a small number of coefficients, because we can dedicate more bits to encoding the most energetic ones.
In general, transform domains require larger dynamic ranges than the original ones.
Both TC and VQ [3] work by exploiting the correlation between samples, whereas SQ (Scalar Quantization) does not. Therefore, we can expect the RD performance [1] of a (TC+SQ)-based codec to be similar to that of VQ.
All linear transforms can be described as a matrix-vector product [5] \begin {equation} \mathbf {y} = \mathbf {K}\mathbf {x}, \label {eq:forward_transform_matrix_form} \end {equation} where \(\mathbf {x}\) is the input signal, \(\mathbf {K}\) is the analysis transform matrix, and \(\mathbf {y}\) is the output decomposition. The coefficients are found by \begin {equation} {\mathbf {y}}_i = \langle {\mathbf {K}}_i, {\mathbf {x}}\rangle , \end {equation} where \({\mathbf {K}}_i\) is the \(i\)-th row of \(\mathbf {K}\), and \(\langle \cdot ,\cdot \rangle \) denotes the inner product. This basically means that \({\mathbf {y}}_i\) is proportional to the similarity between the input signal \(\mathbf {x}\) and the taps of the filter \({\mathbf {K}}_i\). The inverse (synthesis) transform is computed by \begin {equation} \mathbf {x} = {\mathbf {K}}^{-1}\mathbf {y}, \label {eq:backward_transform_matrix_form} \end {equation} where \({\mathbf {K}}^{-1}\) denotes the inverse matrix of \(\mathbf {K}\). When \(\mathbf K\) is orthonormal, it holds that \begin {equation} \mathbf {K}={\mathbf {K}}^{-1}={\mathbf {K}}^{\text T}, \label {eq:orthogonal_matrix} \end {equation} where \({\mathbf {K}}^{\text T}\) represents the transpose matrix of \(\mathbf {K}\). Without considering scale factors, Eq.~\eqref {eq:orthogonal_matrix} is also true for all orthogonal transforms. Orthogonal and orthonormal transforms satisfy that \begin {equation} \langle {\mathbf {K}}_i, {\mathbf {K}}_j\rangle = 0, \forall i\neq j. \end {equation}
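The analysis/synthesis relations above can be sketched numerically. As a minimal example (the \(2\times 2\) normalized Haar matrix is an assumption chosen for illustration, not a transform used in the text), the inverse of an orthonormal transform reduces to its transpose:

```python
import numpy as np

# Analysis matrix K: a 2x2 orthonormal (normalized Haar) transform,
# chosen only as a small illustrative example.
K = np.array([[1.0,  1.0],
              [1.0, -1.0]]) / np.sqrt(2)

x = np.array([3.0, 1.0])   # input signal
y = K @ x                  # forward transform: y = Kx

# Since K is orthonormal, K^{-1} = K^T, so synthesis is just K^T y.
x_rec = K.T @ y

# Rows of K are mutually orthogonal: <K_i, K_j> = 0 for i != j.
print(x_rec, K[0] @ K[1])
```

Here `y[0]` measures the similarity of `x` with the low-pass row of `K`, and `y[1]` with the high-pass row, matching the inner-product interpretation of the coefficients.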
Transforms are used in signal coding to provide relative (between subbands) energy compaction. The capability of a transform to achieve this effect can be estimated by the so-called transform coding gain [6, 4], defined by \begin {equation} G = \frac {\frac {1}{N}\sum _{n=1}^N{\sigma _n^2}}{(\prod _{n=1}^N\sigma _n^2)^{\frac {1}{N}}}, \end {equation} where \(N\) is the number of coefficients in a block (in our case, the number of coefficients in a transformed pixel, i.e., \(N=3\)), and \(\sigma _n^2\) is the variance of the \(n\)-th coefficient in the block. As can be seen, \(G\) is the ratio of the arithmetic mean of the variances of the transform coefficients to their geometric mean. Notice that \(G\) is computed within a block (a pixel in the case of a color transform), not among blocks (pixels).
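The coding gain can be computed directly from the definition. The sketch below uses synthetic coefficient variances (the Gaussian data and its per-coefficient standard deviations are assumed values for illustration), with \(N=3\) coefficients per block as in the text:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy decomposition: 1000 blocks of N = 3 coefficients each, with most
# of the energy concentrated in the first coefficient (assumed data).
coeffs = rng.normal(0.0, [10.0, 2.0, 0.5], size=(1000, 3))

var = coeffs.var(axis=0)                       # sigma_n^2 for each coefficient
G = var.mean() / var.prod() ** (1 / len(var))  # arithmetic mean / geometric mean

print(G)  # G >= 1; equality holds only when all variances are equal
```

A large \(G\) indicates strong energy compaction: the arithmetic mean is dominated by the most energetic coefficient, while the geometric mean is pulled down by the weak ones.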
Some transforms, such as the DCT, are applied to 2D blocks (for example, of \(8\times 8\) pixels). This is a direct consequence of the fact that, usually, the transform loses compaction efficiency when the block size is increased (although this depends on the signal characteristics). When the coefficients of several blocks are considered together, they form a subband, and the collection of subbands, a decomposition [7]; the index of the subband is related to the frequency of the signal. For example, in the case of images, the position of the coefficients in the subbands is related to the spatial area where the corresponding pixels are found.
Rate-control is mainly performed through the configuration of the quantization step sizes. Notice that, in general, if the transform is orthogonal and therefore the subbands are independent, the quantization step size of a subband should be inversely proportional to the subband gain.
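This rate-control rule can be sketched as follows. The subband gains and the base step size below are assumed values for illustration (not taken from the text); the point is only that a higher-gain subband receives a proportionally smaller quantization step:

```python
import numpy as np

# Hypothetical synthesis gains of three subbands (assumed values).
subband_gain = np.array([2.0, 1.0, 0.5])

base_step = 8.0
# Step size inversely proportional to the subband gain: high-gain
# subbands are quantized more finely, low-gain subbands more coarsely.
steps = base_step / subband_gain

y = np.array([100.3, -7.8, 3.1])  # one coefficient per subband (assumed)
y_quant = np.round(y / steps)     # quantization indices (uniform SQ)
y_dequant = y_quant * steps       # dequantized coefficients

print(steps, y_quant, y_dequant)
```

With these assumed gains, the steps become 4.0, 8.0, and 16.0, so the distortion each subband contributes after synthesis is balanced across the decomposition.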
[1] V. González-Ruiz. Information Theory.
[2] V. González-Ruiz. Scalar Quantization.
[3] V. González-Ruiz. Vector Quantization.
[4] K. Sayood. Introduction to Data Compression. Morgan Kaufmann, 2017.
[5] G. Strang. Linear Algebra and Its Applications. Belmont, CA: Thomson, Brooks/Cole, 2006.
[6] M. Vetterli and J. Kovačević. Wavelets and Subband Coding. Prentice Hall, 1995.
[7] M. Vetterli, J. Kovačević, and V.K. Goyal. Foundations of Signal Processing. Cambridge University Press, 2014.