Block-Based DCT (Discrete Cosine Transform)

In the 1D case, the forward DCT for a digital signal $\mathbf {g}_n$ of length $N$ is defined as [1] \begin {equation} {\mathbf h}_u = \sqrt {\frac {2}{N}}\sum _{n=0}^{N-1}{\mathbf g}_n{\mathbf c}_u\cos \Big (\pi \frac {u(2x+1)}{2N}\Big ), \end {equation} for $0<u<N$, and the inverse transform is \begin {equation} {\mathbf g}_n = \sqrt {\frac {2}{N}}\sum _{u=0}^{N-1}{\mathbf h}_u{\mathbf c}_u\cos \Big (\pi \frac {u(2x+1)}{2N}\Big ), \end {equation} for $0<n<N$, with \begin {equation} {\mathbf c}_u = \left \{ \begin {array}{ll} \frac {1}{\sqrt {2}} & \quad \text {for}~u=0, \\ 1 & \quad \text {otherwise}. \end {array} \right . \end {equation}

The transformed signal is a sequence of coefficients ${\mathbf h}_u$ with the same length than the original signal ${\mathbf g}_n$, and the position of the coefficients in the transform domain indicate the contribution of such coefficient to the corresponding (in this case, spatial) frequency. For example, the coefficient at the position 0 (that is commonly refered as DC (Direct Current)) is equal to the average of the signal. The rest of high-frequency coefficients are called AC (Alternating Current) coefficients.

The DCT can be also expressed in matrix [3] form as \begin {equation} {\mathbf h} = {\mathbf K}{\mathbf g}, \end {equation} where $\mathbf K$ is the forward transform matrix. The rows of the transform matrix are often referred to as the basis vectors for the transform because they form an orthonormal basis set (see these slides), and the elements of the transformed sequence are often called the transform coefficients. Obviously, the inverse transform is computed by \begin {equation} {\mathbf g} = {\mathbf K}^{-1}{\mathbf h}, \end {equation} where ${\mathbf K}^{-1}$ denotes the inverse matrix of $\mathbf K$. In the case of the DCT, $\mathbf K$ is orthogonal and therefore, ${\mathbf K}^{-1}={\mathbf K}^{\text T}$, where $\cdot ^{\text T}$ denotes the transpose of $\mathbf K$.

2 3-Components 1D-DCT

The three components DCT (3C-DCT) is a transform applied to a vector of $3$ elements (in our case, a $\text {RGB}$ pixel) defined by \begin {equation} \begin {bmatrix} \text {DCT}^0 \\ \text {DCT}^1 \\ \text {DCT}^2 \end {bmatrix} = \begin {bmatrix} 0.57735027 & 0.70710678 & 0.40824829 \\ 0.57735027 & 0.0 & -0.81649658 \\ 0.57735027 & -0.70710678 & 0.40824829 \end {bmatrix} \begin {bmatrix} \text {R} \\ \text {G} \\ \text {B} \end {bmatrix}, \end {equation} and the inverse transform by \begin {equation} \begin {bmatrix} \text {R} \\ \text {G} \\ \text {B} \end {bmatrix} = \begin {bmatrix} 0.57735027 & 0.57735027 & 0.57735027 \\ 0.70710678 & 0.0 & -0.70710678 \\ 0.40824829 & -0.81649658 & 0.40824829 \end {bmatrix} \begin {bmatrix} \text {DCT}^0 \\ \text {DCT}^1 \\ \text {DCT}^2 \end {bmatrix}. \end {equation} See the notebook 3-Channels DCT to see how to compute the filter’s coefficients.

When applied to the $\text {RGB}$ color domain, we will move from this domain to the 1D-DCT domain that have also 3 components, that we will denote by $\text {DCT}^0$, $\text {DCT}^1$ and $\text {DCT}^2$. Notice that if the decorrelation is effective, most of the energy will be concentrated in $\text {DCT}^0$, which represents the average energy of the image (luminance) (see the notebook Removing RGB redundancy with the DCT).

Notice also that the DCT is orthonormal, and therefore, the matrix of the forward transform is the transpose of the matrix of the backward transform [3]. This also means that the contribution of the synthesis filters (which define the inverse transform) to the reconstructed signal are independent and have exactly the unity gain.¹

3 2D-DCT

The 2D-DCT is separable, which means that it can be computed by appliying the 1D-DCT to the two dimensions of the signal (a digital image, for example). For the inverse 2D-DCT, the procedure is similar, but appliying the inverse 1D-DWT in reverse order. The Fig. 1 shows the first 64 2D-DCT basis.

4 Scalar quantization in the DCT domain

Since the DCT is orthonormal (orthogonal² + unitary³), the gain of each subband is equal to 1, and therefore, the quantization error is amplified with the same gain in all the coefficients.

For example, the notebook Removing RGB redundancy with the DCT explores the performence of the quantization steps pattern \begin {equation} \Delta _{\text {DCT}^0} = \Delta _{\text {DCT}^1} = \Delta _{\text {DCT}^2}. \end {equation}

However, this does not mean that all the subbands should be quantized using the same quantization step size $\Delta $, because some subbands can be compressed more efficienly than others and their distortion (after quantization) can be different. The notebook Image Compression with YCoCg + 2D-DCT shows how to use RD optimization to determine, given a target bit-rate, the best combination of quantization steps per subbands.

5 RDO (Rate/Distortion Optimization) using scalar quantization

If we consider that the RD curve can be affeced by the compresibility of the subbands, the use of scalar quantization open the posibility of using a different quantization step size for each subband, depending on the slopes of the corresponding RD points.

The DCT is orthogonal, and this means that we can optimize directly in the transform domain, using the RD curves of each subband, independently. Therefore, a better quantization steps pattern can be found if we select those steps that produce the same slope in each subband, for a given bit-rate.

6 References

[1] W. Burger and M.J. Burge. Digital Image Processing: An Algorithmic Introduction Using Java. Springer, 2016.

[2] V. González-Ruiz. Transform Coding.

[3] K. Sayood. Introduction to Data Compression (Slides). Morgan Kaufmann, 2017.

¹To find the gains of any 1D transform we can compute the energy of the signal generated by the inverse transform of the impulse discrete 1D signal \begin {equation} \delta _{i}(x) = \left \{ \begin {array}{ll} 1 & \text {if $i=x$}\\ 0 & \text {otherwise}, \end {array} \right . \end {equation} where the energy of a discrete signal $\mathbf s$ is defined as \begin {equation} \langle {\mathbf s}, {\mathbf s} \rangle = \sum _{i}{{\mathbf s}_i^2}. \end {equation}

²The contribution of the synthesis filters to the reconstructed signal are independent.

³Unitary transforms are energy preserving; that is, the sum of the squares of the transformed sequence is the same as the sum of the squares of the original sequence.