Information Theory

Vicente González Ruiz - Depto Informática - UAL

January 10, 2023

Contents

 1 Energy of a digital signal
 2 Information measurement
 3 Statistical redundancy
 4 Measurement of the redundancy
 5 Rate
 6 Distortion
 7 The RD (Rate/Distortion) curve
 8 Slope computation
 9 RDO (Rate/Distortion Optimization)
 10 Comparing RD curves
 11 Resources

1 Energy of a digital signal

The energy of a signal \(\mathbf x\) is found by \begin {equation} \langle {\mathbf x}, {\mathbf x}\rangle = \sum _{i}{{\mathbf x}_i^2}. \end {equation}
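This inner product can be sketched in a few lines of Python (a minimal example, not part of the original notes):

```python
# Energy of a digital signal: the inner product of the signal with itself,
# i.e., the sum of its squared samples.
def energy(x):
    return sum(s * s for s in x)

x = [1, -2, 3]
print(energy(x))  # 1 + 4 + 9 = 14
```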

2 Information measurement

Information is usually estimated through the measurement of the variance or the entropy. The higher the variance (or the entropy), the higher the amount of information represented by the data.
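Both estimators can be computed directly from the samples. A minimal sketch in Python (the function names are illustrative, not from the notes):

```python
import math
from collections import Counter

def variance(x):
    # Sample variance: average squared deviation from the mean.
    mean = sum(x) / len(x)
    return sum((s - mean) ** 2 for s in x) / len(x)

def entropy(x):
    # 0-order entropy in bits/symbol, estimated from symbol frequencies.
    counts = Counter(x)
    n = len(x)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

flat = [5, 5, 5, 5]    # no variability: no information
noisy = [1, 7, 2, 9]   # all samples different: maximal 0-order entropy
print(variance(flat), entropy(flat))    # 0.0 0.0
print(variance(noisy), entropy(noisy))  # 11.1875 2.0
```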

3 Statistical redundancy

Statistical redundancy is present in all those sequences of symbols where we can infer the probability of occurrence of a symbol taking into consideration the rest of the symbols of the sequence. Statistical redundancy can be found in text, audio, image and video, among other sources of information.

Statistical redundancy can be removed by text compressors. Statistical redundancy is also called source-coding redundancy [1].

4 Measurement of the redundancy

To measure the redundancy we have basically two options:

  1. Compute the 0-order (memoryless source) entropy of the signal: the higher the entropy, the lower the redundancy. In fact, if we suppose that the samples of the signal are uncorrelated, the 0-order entropy is an exact measure of the expected bit-rate achieved by an arithmetic encoder (the most efficient entropy compressor). Unfortunately, the 0-order entropy is usually only an estimation of the redundancy, i.e., lower bit-rates can be achieved in practice after using a high-order decorrelation.
  2. A better way is to use a lossless compressor: the closer the length of the compressed file is to the length of the original file, the lower the redundancy.1 Notice, however, that although this estimation is more accurate than the 0-order entropy, in general, it depends on the compressor (different algorithms can provide different compression ratios).
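The second option can be sketched with any off-the-shelf lossless codec; here DEFLATE (via Python's `zlib`) is used as one concrete choice (the function name is illustrative):

```python
import os
import zlib

def redundancy_estimate(data: bytes) -> float:
    # Fraction of the original size removed by a lossless compressor
    # (DEFLATE). Close to 1 -> highly redundant; close to 0 -> little
    # statistical redundancy (for this particular compressor).
    compressed = zlib.compress(data, level=9)
    return max(0.0, 1.0 - len(compressed) / len(data))

repetitive = b"abab" * 1000     # highly redundant
random_ish = os.urandom(4000)   # incompressible with high probability
print(redundancy_estimate(repetitive))  # close to 1
print(redundancy_estimate(random_ish))  # close to 0
```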

5 Rate

In a communication system, the rate describes the amount of data (for example, bits) that is transmitted per unit of time (for example, a second).

6 Distortion

In a lossy encoding system, the distortion (expressed, for example, by the MSE) measures the amount of error between two signals: the original signal and the distorted one. The origin of this error can be quite varied, ranging from transmission errors to quantization processes.
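The MSE mentioned above is straightforward to compute; a minimal Python sketch:

```python
def mse(x, y):
    # Mean squared error between the original and the distorted signal.
    assert len(x) == len(y)
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

original  = [1.0, 2.0, 3.0, 4.0]
distorted = [1.0, 2.5, 2.5, 4.0]  # e.g., after quantization
print(mse(original, distorted))   # (0 + 0.25 + 0.25 + 0) / 4 = 0.125
```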

7 The RD (Rate/Distortion) curve

Figure 1: Two RD (Rate/Distortion) curves.

An RD curve (see Fig. 1) represents the distortion versus the rate (i.e., the RD performance) of any lossy encoding system. Basically, an RD curve expresses the efficiency of such an encoding system for the source of information that is compressed. For example, when the signal is quantized, bit-rate and MSE are “opposed” features of the encoding system in the sense that if the bit-rate is decreased, the MSE is increased, and vice versa. Such variables (Rate (R) and Distortion (D)) can be represented as a curve.

Usually, RD curves are convex, which means that if \(\lambda _i\) is the slope of the curve measured at the \(i\)-th point of the curve (starting at the lowest bit-rate), it holds that \begin {equation} \lambda _i > \lambda _{i+1}, \label {eq:convexity} \end {equation} where \(\lambda \) quantifies the trade-off between decreasing the distortion2 while the bit-rate increases [3, 2]. Eq. \eqref{eq:convexity} is not always true in the real world.

Notice that, the higher the slope, the higher the benefit in terms of RD.

8 Slope computation

Figure 2: Computation of a RD slope.

We define the slope of the \(n\)-th point in an RD curve as the slope of the straight line passing through the points \(n\) and \(n+1\) (see Fig. 2). The slope of the point with the highest rate is defined to be \(0\).
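This definition can be sketched directly in Python, assuming the curve is given as a list of (rate, distortion) points (a minimal illustration, not from the notes):

```python
def rd_slopes(points):
    # points: list of (rate, distortion) pairs, sorted by increasing rate.
    # The slope of the n-th point is the slope of the straight line through
    # points n and n+1; the point with the highest rate gets slope 0.
    # Slopes are negative because distortion decreases as rate increases.
    slopes = []
    for (r0, d0), (r1, d1) in zip(points, points[1:]):
        slopes.append((d1 - d0) / (r1 - r0))
    slopes.append(0.0)
    return slopes

curve = [(1, 100), (2, 40), (4, 10), (8, 2)]  # hypothetical RD points
print(rd_slopes(curve))  # [-60.0, -15.0, -2.0, 0.0]
```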

9 RDO (Rate/Distortion Optimization)

RDO is a technique that can be used when we need to encode, at the same time, two or more different sources of information, \(A\) and \(B\). When this happens, each source has its own characteristic RD curve (see Fig. 1), without considering the rest of the sources.

If we suppose now that the contribution of the quantization of each source to the distortion is independent3 and additive4, that is, \begin {equation} D = D^A + D^B, \label {eq:additive} \end {equation} where \(D\) denotes distortion, then the optimal quantization steps must satisfy [3, 2] \begin {equation} \lambda ^A_i = \lambda ^B_i. \label {eq:optimal_quantization} \end {equation}

To see this, let's suppose that we have used, for example, a set of quantization step sizes so that \(\lambda ^A_i/2 = \lambda ^B_i,\) and that we still have room for more bits to encode the frame. In this situation, the maximum benefit would be obtained if and only if we decrease \(\Delta ^A_i\) (modifying the corresponding \(\mathbf {\Delta }_i\)), because the slope for the source \(A\) doubles the slope of the source \(B\). Therefore, the optimal quantization steps are obtained when Eq. \eqref{eq:optimal_quantization} is true.
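The argument above suggests a greedy allocation: always spend the next bits on the source whose curve is currently steeper, which drives the two slopes toward equality. A minimal sketch under the stated assumptions (discrete operating points per source; the function and variable names are illustrative):

```python
def greedy_rdo(curve_a, curve_b, total_rate):
    # curve_*: lists of (rate, distortion) operating points, sorted by rate.
    # Greedily advance along the curve whose next step saves the most
    # distortion per extra bit, until the rate budget would be exceeded.
    def gain(curve, i):
        if i + 1 >= len(curve):
            return 0.0
        (r0, d0), (r1, d1) = curve[i], curve[i + 1]
        return (d0 - d1) / (r1 - r0)  # distortion saved per bit (positive)

    ia, ib = 0, 0
    rate = curve_a[0][0] + curve_b[0][0]
    while True:
        ga, gb = gain(curve_a, ia), gain(curve_b, ib)
        if ga == gb == 0.0:
            break  # both curves exhausted
        if ga >= gb:  # source A is steeper: advance A
            step = curve_a[ia + 1][0] - curve_a[ia][0]
            if rate + step > total_rate:
                break
            ia, rate = ia + 1, rate + step
        else:         # source B is steeper: advance B
            step = curve_b[ib + 1][0] - curve_b[ib][0]
            if rate + step > total_rate:
                break
            ib, rate = ib + 1, rate + step
    return curve_a[ia], curve_b[ib]

A = [(1, 100), (2, 40), (3, 20)]  # hypothetical RD points for source A
B = [(1, 50), (2, 45), (3, 44)]   # hypothetical RD points for source B
print(greedy_rdo(A, B, total_rate=4))  # ((3, 20), (1, 50))
```

With a budget of 4, both extra steps go to \(A\), whose slope dominates, mirroring the \(\lambda ^A_i/2 = \lambda ^B_i\) discussion above.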

10 Comparing RD curves

If we have two (convex) RD curves \(A\) and \(B\), the best one from an RD perspective is the one that minimizes the sum of the distances from its RD points to the origin of coordinates \((0,0)\).

One of the most common techniques to compare RD curves is based on the Lagrangian [3, 2] \begin {equation} J^{\{A,B\}}=R^{\{A,B\}}+\lambda D^{\{A,B\}}, \label {eq:lagrangian} \end {equation} which, for a given \(\lambda \) (considering the same slope in both curves), indicates which curve is closer to the origin.

11 Resources

[1]   Ahmet Kondoz. Visual Media Coding and Transmission. John Wiley & Sons, 2009.

[2]   K. Sayood. Introduction to Data Compression (Slides). Morgan Kaufmann, 2017.

[3]   M. Vetterli and J. Kovačević. Wavelets and Subband Coding. Prentice-hall, 1995.

1If the length of the compressed file is equal to or larger than the length of the original file, then, for the compressor that we are using, there is no redundancy in the original representation.

2For this reason, the slopes are negative.

3The distortion generated in one of the sources does not influence the distortion generated in the other source.

4Or, in general, a linear combination of the distortions, where each source may contribute differently to the total distortion \(D\) (this occurs, for example, when the inverse filters of a linear (orthogonal, backward) transform have different gains, in which case the quantization step sizes for a subband should be inversely proportional to the subband gain).