Information Theory

Vicente González Ruiz - Depto Informática - UAL

January 10, 2023

Contents

 1 Energy of a digital signal
 2 Information measurement
 3 Statistical redundancy
 4 Measurement of the redundancy
 5 Rate
 6 Distortion
 7 The RD (Rate/Distortion) curve
 8 Slope computation
 9 RDO (Rate/Distortion Optimization)
 10 Comparing RD curves
 11 Resources

1 Energy of a digital signal

The energy of a signal \(\mathbf x\) is found by \begin {equation} \langle {\mathbf x}, {\mathbf x}\rangle = \sum _{i}{{\mathbf x}_i^2}. \end {equation}
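This inner product can be sketched in a few lines of Python (a minimal example, not part of the original notes):

```python
# Energy of a digital signal: the inner product of the signal with itself,
# i.e., the sum of its squared samples.
def energy(x):
    return sum(s * s for s in x)

x = [1, -2, 3]
print(energy(x))  # 1 + 4 + 9 = 14
```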

2 Information measurement

Information is usually estimated through the measurement of the variance or the entropy. The higher the variance (or the entropy), the higher the amount of information represented by the data.
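Both estimators can be computed directly from the samples. A minimal sketch in Python (the function names are illustrative, not from the notes):

```python
import math
from collections import Counter

def variance(x):
    # Sample variance: average squared deviation from the mean.
    mean = sum(x) / len(x)
    return sum((s - mean) ** 2 for s in x) / len(x)

def entropy(x):
    # 0-order entropy in bits/symbol, estimated from symbol frequencies.
    counts = Counter(x)
    n = len(x)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

flat = [5, 5, 5, 5]    # no variability: no information
noisy = [1, 7, 2, 9]   # all samples different: maximal 0-order entropy
print(variance(flat), entropy(flat))    # 0.0 0.0
print(variance(noisy), entropy(noisy))  # 11.1875 2.0
```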

3 Statistical redundancy

Statistical redundancy is present in all those sequences of symbols where we can infer the probability of occurrence of a symbol taking into consideration the rest of the symbols of the sequence. Statistical redundancy can be found in text, audio, image and video, among other sources of information.

Statistical redundancy can be removed by text compressors. Statistical redundancy is also called source-coding redundancy [1].

4 Measurement of the redundancy

To measure the redundancy we have basically two options:

  1. Compute the 0-order (memoryless source) entropy of the signal: the higher the entropy, the lower the redundancy. In fact, if we suppose that the samples of the signal are uncorrelated, the 0-order entropy is an exact measure of the expected bit-rate achieved by an arithmetic encoder (the most efficient entropy compressor). Unfortunately, the 0-order entropy is usually only an estimation of the redundancy, i.e., lower bit-rates can be achieved in practice after using a high-order decorrelation.
  2. A better way is to use a lossless compressor: the closer the length of the compressed file is to the length of the original file, the lower the redundancy.1 Notice, however, that although this estimation is more accurate than the 0-order entropy, in general, it depends on the compressor (different algorithms can provide different compression ratios).
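The second option can be sketched with any off-the-shelf lossless codec; here DEFLATE (via Python's `zlib`) is used as one concrete choice (the function name is illustrative):

```python
import os
import zlib

def redundancy_estimate(data: bytes) -> float:
    # Fraction of the original size removed by a lossless compressor
    # (DEFLATE). Close to 1 -> highly redundant; close to 0 -> little
    # statistical redundancy (for this particular compressor).
    compressed = zlib.compress(data, level=9)
    return max(0.0, 1.0 - len(compressed) / len(data))

repetitive = b"abab" * 1000     # highly redundant
random_ish = os.urandom(4000)   # incompressible with high probability
print(redundancy_estimate(repetitive))  # close to 1
print(redundancy_estimate(random_ish))  # close to 0
```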

5 Rate

In a communication system, the rate describes the amount of data (for example, bits) that is transmitted per unit of time (for example, a second).

6 Distortion

In a lossy encoding system, the distortion (expressed, for example, by the MSE) measures the amount of error between two signals: the original signal and the distorted one. The origin of this error can be quite varied, ranging from transmission errors to quantization processes.
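The MSE mentioned above is straightforward to compute; a minimal Python sketch:

```python
def mse(x, y):
    # Mean squared error between the original and the distorted signal.
    assert len(x) == len(y)
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

original  = [1.0, 2.0, 3.0, 4.0]
distorted = [1.0, 2.5, 2.5, 4.0]  # e.g., after quantization
print(mse(original, distorted))   # (0 + 0.25 + 0.25 + 0) / 4 = 0.125
```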

7 The RD (Rate/Distortion) curve

Figure 1: Two RD (Rate/Distortion) curves.

An RD curve (see Fig. 1) represents the distortion versus the rate (i.e., the RD performance) of any lossy encoding system. Basically, an RD curve expresses the efficiency of such an encoding system for the source of information that is compressed. For example, when the signal is quantized, bit-rate and MSE are “opposed” features of the encoding system in the sense that if the bit-rate is decreased, the MSE is increased, and vice versa. Such variables (Rate (R) and Distortion (D)) can be represented as a curve.

Usually, RD curves are convex, which means that if \(\lambda _i\) is the slope of the curve measured at the \(i\)-th point of the curve (starting at the lowest bit-rate), it holds that \begin {equation} \lambda _i > \lambda _{i+1}, \label {eq:convexity} \end {equation} where \(\lambda \) quantifies the trade-off between decreasing the distortion2 while the bit-rate increases [3, 2]. Eq. \eqref{eq:convexity} is not always true in the real world.

Notice that, the higher the slope, the higher the benefit in terms of RD.

8 Slope computation

Figure 2: Computation of a RD slope.

We define the slope of the \(n\)-th point in an RD curve as the slope of the straight line passing through the points \(n\) and \(n+1\) (see Fig. 2). The slope of the point with the highest rate is defined to be \(0\).
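This definition can be sketched directly in Python, assuming the curve is given as a list of (rate, distortion) points (a minimal illustration, not from the notes):

```python
def rd_slopes(points):
    # points: list of (rate, distortion) pairs, sorted by increasing rate.
    # The slope of the n-th point is the slope of the straight line through
    # points n and n+1; the point with the highest rate gets slope 0.
    # Slopes are negative because distortion decreases as rate increases.
    slopes = []
    for (r0, d0), (r1, d1) in zip(points, points[1:]):
        slopes.append((d1 - d0) / (r1 - r0))
    slopes.append(0.0)
    return slopes

curve = [(1, 100), (2, 40), (4, 10), (8, 2)]  # hypothetical RD points
print(rd_slopes(curve))  # [-60.0, -15.0, -2.0, 0.0]
```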

9 RDO (Rate/Distortion Optimization)

RDO is a technique that can be used when we need to encode, at the same time, two or more different sources of information, \(A\) and \(B\). When this happens, each source has its own characteristic RD curve (see Fig. 1), without considering the rest of the sources.

If we suppose now that the contribution of the quantization of each source to the distortion is independent3 and additive4, that is, \begin {equation} D = D^A + D^B, \label {eq:additive} \end {equation} where \(D\) denotes distortion, then the optimal quantization steps must satisfy [3, 2] \begin {equation} \lambda ^A_i = \lambda ^B_i. \label {eq:optimal_quantization} \end {equation}

To see this, let's suppose that we have used, for example, a set of quantization step sizes so that \(\lambda ^A_i/2 = \lambda ^B_i,\) and that we still have room for more bits to encode the frame. In this situation, the maximum benefit would be obtained if and only if we decrease \(\Delta ^A_i\) (modifying the corresponding \(\mathbf {\Delta }_i\)), because the slope for the source \(A\) doubles the slope of the source \(B\). Therefore, the optimal quantization steps are obtained when Eq. \eqref{eq:optimal_quantization} is true.
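The argument above suggests a greedy allocation: always spend the next bits on the source whose curve is currently steeper, which drives the two slopes toward equality. A minimal sketch under the stated assumptions (discrete operating points per source; the function and variable names are illustrative):

```python
def greedy_rdo(curve_a, curve_b, total_rate):
    # curve_*: lists of (rate, distortion) operating points, sorted by rate.
    # Greedily advance along the curve whose next step saves the most
    # distortion per extra bit, until the rate budget would be exceeded.
    def gain(curve, i):
        if i + 1 >= len(curve):
            return 0.0
        (r0, d0), (r1, d1) = curve[i], curve[i + 1]
        return (d0 - d1) / (r1 - r0)  # distortion saved per bit (positive)

    ia, ib = 0, 0
    rate = curve_a[0][0] + curve_b[0][0]
    while True:
        ga, gb = gain(curve_a, ia), gain(curve_b, ib)
        if ga == gb == 0.0:
            break  # both curves exhausted
        if ga >= gb:  # source A is steeper: advance A
            step = curve_a[ia + 1][0] - curve_a[ia][0]
            if rate + step > total_rate:
                break
            ia, rate = ia + 1, rate + step
        else:         # source B is steeper: advance B
            step = curve_b[ib + 1][0] - curve_b[ib][0]
            if rate + step > total_rate:
                break
            ib, rate = ib + 1, rate + step
    return curve_a[ia], curve_b[ib]

A = [(1, 100), (2, 40), (3, 20)]  # hypothetical RD points for source A
B = [(1, 50), (2, 45), (3, 44)]   # hypothetical RD points for source B
print(greedy_rdo(A, B, total_rate=4))  # ((3, 20), (1, 50))
```

With a budget of 4, both extra steps go to \(A\), whose slope dominates, mirroring the \(\lambda ^A_i/2 = \lambda ^B_i\) discussion above.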

10 Comparing RD curves

If we have two (convex) RD curves \(A\) and \(B\), the best one from an RD perspective is the one that minimizes the sum of the distances from its RD points to the origin of coordinates \((0,0)\).

One of the most common techniques to compare RD curves is based on the Lagrangian [3, 2] \begin {equation} J^{\{A,B\}}=R^{\{A,B\}}+\lambda D^{\{A,B\}}, \label {eq:lagrangian} \end {equation} which, for a given \(\lambda \) (considering the same slope in both curves), indicates which curve is closer to the origin.

11 Resources

[1]   Ahmet Kondoz. Visual Media Coding and Transmission. John Wiley & Sons, 2009.

[2]   K. Sayood. Introduction to Data Compression (Slides). Morgan Kaufmann, 2017.

[3]   M. Vetterli and J. Kovačević. Wavelets and Subband Coding. Prentice-hall, 1995.

1If the length of the compressed file is equal to or larger than the length of the original file, then, for the compressor that we are using, there is no redundancy in the original representation.

2For this reason, the slopes are negative.

3The distortion generated in one of the sources does not influence the distortion generated in the other source.

4Or, in general, a linear combination of the distortions, where each source may contribute differently to the total distortion \(D\) (this occurs, for example, when the inverse filters of a linear (orthogonal, backward) transform have different gains, in which case the quantization step sizes for a subband should be inversely proportional to the subband gain).