Video compression fundamentals

Vicente González Ruiz

March 6, 2020

Contents

1 Sources of redundancy
2 Hybrid video coding
3 Block-based MC (Motion Compensation) [1]
4 Sub-pixel accuracy
5 Matching criteria (similarity between macroblocks)
6 Searching strategies
7 The GOP (Group Of Pictures) concept
8 MCTF (Motion Compensated Temporal Filtering)
9 ±1-spiral-search ME (Motion Estimation)
10 Linear frame interpolation using block-based motion compensation
11 MC/DWT hybrid coding alternatives
12 Deblocking filtering
13 Bit-rate allocation
14 Video scalability
 14.1 Quality scalability
 14.2 Temporal scalability
 14.3 Spatial scalability
References

1 Sources of redundancy

2 Hybrid video coding



Figure 1: Hybrid video coding.

See Fig. 1.

3 Block-based MC (Motion Compensation) [1]

See Fig. 2.



Figure 2: Types of macroblocks.

4 Sub-pixel accuracy



Figure 3: Pixel interpolation.
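
As an illustration, half-pixel positions can be synthesized by interpolating the reference frame. The following NumPy sketch uses plain bilinear averaging (an assumption made here for brevity; real codecs use longer interpolation filters, and the algorithm of Section 10 instead zooms in with the inverse DWT):

    import numpy as np

    def half_pixel_upsample(frame):
        # 2x upsample a grayscale frame so that motion vectors can address
        # half-pixel positions. Even coordinates keep the original pixels;
        # odd coordinates hold bilinear averages (np.roll wraps around at
        # the borders, which is acceptable for a sketch).
        f = frame.astype(np.float64)
        h, w = f.shape
        up = np.empty((2 * h, 2 * w))
        up[::2, ::2] = f                                  # integer positions
        up[::2, 1::2] = (f + np.roll(f, -1, axis=1)) / 2  # horizontal halves
        up[1::2, ::2] = (f + np.roll(f, -1, axis=0)) / 2  # vertical halves
        up[1::2, 1::2] = (f
                          + np.roll(f, -1, axis=0)
                          + np.roll(f, -1, axis=1)
                          + np.roll(np.roll(f, -1, axis=0), -1, axis=1)) / 4
        return up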

5 Matching criteria (similarity between macroblocks)
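
The labs of Section 10 compare three matching criteria: MSE (Mean Squared Error), MAE (Mean Absolute Error), and EE. A minimal NumPy sketch of the first two, assuming blocks are equally-shaped grayscale arrays (the error_entropy() reading of EE below is a guess, since the text does not define it):

    import numpy as np

    def mse(a, b):
        # Mean Squared Error: penalizes large differences quadratically.
        d = a.astype(np.float64) - b.astype(np.float64)
        return np.mean(d * d)

    def mae(a, b):
        # Mean Absolute Error: cheaper and less sensitive to outliers.
        d = a.astype(np.float64) - b.astype(np.float64)
        return np.mean(np.abs(d))

    def error_entropy(a, b):
        # One plausible reading of "EE" (hypothetical, not defined in the
        # text): the 0-order entropy of the prediction error, i.e. how
        # compressible the residual block would be.
        d = (a.astype(np.int32) - b.astype(np.int32)).ravel()
        _, counts = np.unique(d, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))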

6 Searching strategies

7 The GOP (Group Of Pictures) concept



Figure 5: A GOP.
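
By way of example (a common closed-GOP layout, assumed here rather than taken from Fig. 5), the first frame of each GOP is intra-coded and the remaining ones are predicted from their neighbors:

    def frame_types(n_frames, gop_size):
        # 'I' opens each GOP (decodable on its own); the rest are
        # motion-compensated ('B') frames. Real codecs allow many
        # other layouts.
        return ['I' if t % gop_size == 0 else 'B' for t in range(n_frames)]

    print(frame_types(9, 4))  # ['I', 'B', 'B', 'B', 'I', 'B', 'B', 'B', 'I']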

8 MCTF (Motion Compensated Temporal Filtering)

9 ±1-spiral-search ME (Motion Estimation)



Figure 6: Spiral search.
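
A minimal sketch of the ±1 refinement used throughout Section 10, assuming each candidate motion vector is greedily improved by testing its eight ±1 neighbors until no move lowers the matching error (the exact visiting order of the spiral in Fig. 6 is omitted; block_cost() and refine_pm1() are illustrative names, and frames are NumPy 2-D arrays):

    def block_cost(block, ref, y, x, mv, B):
        # Distortion of matching 'block' against the BxB area of 'ref'
        # displaced by mv = (dy, dx), or None if that area falls outside
        # 'ref'. Any criterion of Section 5 can be plugged in; MSE is
        # used here.
        ry, rx = y + mv[0], x + mv[1]
        if ry < 0 or rx < 0 or ry + B > ref.shape[0] or rx + B > ref.shape[1]:
            return None
        d = block.astype(float) - ref[ry:ry + B, rx:rx + B].astype(float)
        return (d * d).mean()

    def refine_pm1(block, ref, y, x, mv):
        # Greedy +-1 refinement of the motion vector mv for the BxB block
        # of the frame to predict located at (y, x).
        B = block.shape[0]
        best, best_err = mv, block_cost(block, ref, y, x, mv, B)
        improved = True
        while improved:
            improved = False
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    cand = (best[0] + dy, best[1] + dx)
                    err = block_cost(block, ref, y, x, cand, B)
                    if err is not None and (best_err is None or err < best_err):
                        best, best_err, improved = cand, err, True
        return best, best_err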

10 Linear frame interpolation using block-based motion compensation



Figure 7: Frame interpolation.

Input

  The frame to predict, $s_j$, and its two reference frames, $s_i$ and $s_k$; the search range $R$; the block size $B$; and the sub-pixel accuracy $A$.

Output

  A bi-directional motion vector field $m$ and the prediction $\hat{s}_j$.

Algorithm

  1. Compute the DWT$^l$, where $l = \log_2(R) - 1$ levels, of the predicted frame $s_j$ and the two reference frames $s_i$ and $s_k$. Example.
  2. $LL^l(m) \leftarrow 0$, or any other precomputed values (for example, from a previous ME in neighboring frames). Example.
  3. Divide the subband $LL^l(s_j)$ into blocks of size $B \times B$ pixels, and ±1-spiral-search them in the subbands $LL^l(s_i)$ and $LL^l(s_k)$, calculating a low-resolution bi-directional motion vector field $LL^l(m) = \{LL^l(\overleftarrow{m}), LL^l(\overrightarrow{m})\}$. Example. Example.
  4. While $l > 0$:

    1. Synthesize $LL^{l-1}(m)$, $LL^{l-1}(s_j)$, $LL^{l-1}(s_i)$ and $LL^{l-1}(s_k)$, by computing the 1-level DWT$^{-1}$. Example. Example.
    2. $LL^{l-1}(m) \leftarrow LL^{l-1}(m) \times 2$. Example.
    3. Refine $LL^{l-1}(m)$ using ±1-spiral-search. Example.
    4. $l \leftarrow l - 1$. (When $l = 0$, the motion vector field $m$ has the structure:)

    Example.

  5. While $l < A$ (in the first iteration, $l = 0$, and $LL^0(m) := m$):

    1. $l \leftarrow l + 1$.
    2. Synthesize $LL^l(s_j)$, $LL^l(s_i)$ and $LL^l(s_k)$, computing the 1-level DWT$^{-1}$ (high-frequency subbands are 0). This performs a zoom-in of these frames with $1/2$ sub-pixel accuracy.

      Example.

    3. $m \leftarrow m \times 2$.

      Example.

    4. $B \leftarrow B \times 2$.
    5. Divide the subband $LL^l(s_j)$ into blocks of $B \times B$ pixels and ±1-spiral-search them in the subbands $LL^l(s_i)$ and $LL^l(s_k)$, calculating a bi-directional motion vector field $m$ with $1/2^l$ sub-pixel accuracy. Example.
    6. Frame prediction. For each block $b$, compute

      $$\hat{b} = \frac{b_i\left(\overleftarrow{e}_{\max} - \overleftarrow{e}(b)\right) + b_k\left(\overrightarrow{e}_{\max} - \overrightarrow{e}(b)\right)}{\left(\overleftarrow{e}_{\max} - \overleftarrow{e}(b)\right) + \left(\overrightarrow{e}_{\max} - \overrightarrow{e}(b)\right)}, \qquad (4)$$

      where $\overleftarrow{e}(b)$ is the (minimum) distortion of the best backward matching for block $b$, $\overrightarrow{e}(b)$ the (minimum) distortion of the best forward matching for block $b$, $\overleftarrow{e}_{\max} = \overrightarrow{e}_{\max}$ are the backward and forward maximum matching distortions, $b_i$ is the (backward) block found (as the most similar to $b$) in frame $s_i$, and $b_k$ is the (forward) block found in frame $s_k$. Notice that, if $\overleftarrow{e}(b) = \overrightarrow{e}(b)$, then the prediction is

      $$\hat{b} = \frac{b_i + b_k}{2}, \qquad (5)$$

      and if $\overleftarrow{e}_{\max} - \overleftarrow{e}(b) = 0$ (the backward match is maximally bad),

      $$\hat{b} = b_k, \qquad (6)$$

      and vice versa. (A Python sketch of this prediction step follows the algorithm.)
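
The prediction step above, in code. This is a simplified single-resolution sketch (assumptions: no DWT pyramid or sub-pixel zoom, grayscale NumPy frames whose sides are multiples of B, in-bounds integer motion fields, and at least one usable match per block); predict_frame() and its parameter names are illustrative, not the MCDWT API:

    import numpy as np

    def predict_frame(s_i, s_k, mv_i, mv_k, err_i, err_k, e_max, B):
        # Build the prediction of s_j from its two reference frames.
        # mv_i, mv_k: (rows/B, cols/B, 2) backward/forward integer motion
        # fields; err_i, err_k: per-block minimum distortions; e_max: the
        # maximum possible matching distortion of Eq. (4).
        h, w = s_i.shape
        s_j_hat = np.empty((h, w), dtype=np.float64)
        for by in range(0, h, B):
            for bx in range(0, w, B):
                iy, ix = by // B, bx // B
                vy, vx = mv_i[iy, ix]  # block found in s_i (backward)
                b_i = s_i[by + vy:by + vy + B, bx + vx:bx + vx + B]
                vy, vx = mv_k[iy, ix]  # block found in s_k (forward)
                b_k = s_k[by + vy:by + vy + B, bx + vx:bx + vx + B]
                # Eq. (4): the better a match, the larger its weight.
                w_i = e_max - err_i[iy, ix]
                w_k = e_max - err_k[iy, ix]
                s_j_hat[by:by + B, bx:bx + B] = (b_i * w_i + b_k * w_k) / (w_i + w_k)
        return s_j_hat

When both distortions coincide, the two weights are equal and each block reduces to the plain average of Eq. (5).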

Lab

Implement the algorithm of Section 10 (work on https://github.com/Sistemas-Multimedia/MCDWT/blob/master/transform/mc/block/interpolate.py). Use https://github.com/Sistemas-Multimedia/MCDWT/blob/master/mcdwt/mc/block/interpolate.py and https://github.com/vicente-gonzalez-ruiz/MCTF-video-coding/blob/master/src/motion_estimate.cpp as references.

Lab

Compare the performance of the proposed matching criteria (MSE, MAE and EE) in the algorithm of Section 10, by computing the variance of the prediction error between the original frame ($s_j$) and the predicted frame ($\hat{s}_j$).

Lab

Test different DWT filters in the algorithm of Section 10 and compare their performance, by computing the prediction error between the original frame ($s_j$) and the predicted frame ($\hat{s}_j$). Measure the dependency between this performance and the distance between frames (the $i$, $j$ and $k$ indexes).

Lab

Test the use of both the luma and the chroma in the algorithm of Section 10, and measure the performance of each option (only luma vs. all components) by computing the prediction error between the original frame ($s_j$) and the predicted frame ($\hat{s}_j$). Measure the dependency of the results on the distance between frames (the $i$, $j$ and $k$ indexes).

Lab

Analyze the impact of the $R$ (search range) parameter in the algorithm of Section 10, by computing the prediction error between the original frame ($s_j$) and the predicted frame ($\hat{s}_j$). Study the impact of initializing the motion vectors (step 2 of the algorithm). Measure the dependency on the distance between frames (the $i$, $j$ and $k$ indexes).

IPython notebook

Lab

Analyze the impact of the $O$ (overlapping) parameter in the algorithm of Section 10, by computing the prediction error between the original frame ($s_j$) and the predicted frame ($\hat{s}_j$). Measure the dependency on the distance between frames (the $i$, $j$ and $k$ indexes).

Lab

Analyze the impact of the $B$ (block size) parameter in the algorithm of Section 10, by computing the prediction error between the original frame ($s_j$) and the predicted frame ($\hat{s}_j$). Compute the expected size of the motion fields using their 0-order entropy. Measure the dependency on the distance between frames (the $i$, $j$ and $k$ indexes).

Lab

Analyze the impact of the $A$ (sub-pixel accuracy) parameter in the algorithm of Section 10, by computing the prediction error between the original frame ($s_j$) and the predicted frame ($\hat{s}_j$). Compute the expected size of the motion fields using their entropy. Measure the dependency on the distance between frames (the $i$, $j$ and $k$ indexes).

IPython notebook

Lab

Compare the performance of the algorithm of Section 10 when

$$\hat{b} = \frac{b_i + b_k}{2}, \qquad (7)$$

is used for all blocks.

11 MC/DWT hybrid coding alternatives



Figure 8: SVC scheme in H.264.

12 Deblocking filtering



Figure 9: Deblocking filtering effect.

13 Bit-rate allocation

14 Video scalability

14.1 Quality scalability



Figure 12: Quality scalability.

See Fig. 12.

14.2 Temporal scalability



Figure 13: Temporal scalability.

See Fig. 13.

14.3 Spatial scalability



Figure 14: Spatial scalability.

See Fig. 14.

References

[1]   Kamisetty Ramamohan Rao and Jae Jeong Hwang. Techniques and Standards for Image, Video, and Audio Coding. Prentice Hall, Upper Saddle River, NJ, 1996.