MPEG Audio [1]

Vicente González Ruiz

January 1, 2020

Contents

1 Intro
2 Layer I
2.1 Encoder
2.2 Decoder
3 Loss information analysis
4 Layer II
5 Layer III
5.1 Encoder
5.2 Decoder
References

1 Intro

Input: a sequence of 16-bit PCM samples.
Output: a sequence of MPEG Audio frames (frame = header + code-stream) which can be streamed.
audio   +-----------+       +--------------+         +----------+
channel | Time      |       | Quantization |         |          | MPEG audio
---+--->| frequency |---+-->| and          |-------> | Framing  |------->
   |    | mapping   |   |   | coding       |         |          |
   |    +-----------+   |   +--------------+         +----------+
   |                    |           ^
   |                    |           |
   |                    |   +--------------+
   |                    +-->| Psycho-      |
   |                        | acoustic     |
   +----------------------->| model        |
                            +--------------+
The MPEG audio bitstream definition is normative. Most guidance about encoding is informative. Thus, two MPEG-compliant bitstreams that encode the same audio material at the same rate but on different encoders may sound very different. On the other hand, a given MPEG bitstream decoded on different decoders will result in essentially the same output.

2 Layer I

4:1 compression (384 kbps).
CBR (Constant Bit-Rate).

2.1 Encoder

Split s[n] into blocks of 12 × 32 = 384 samples. For each block:
1. Analyze the block using a 32-band equally-spaced (analysis) filter bank, producing $12$ coeffs/subband (the coeffs are downsampled (subsampled, decimated) by a factor of $32$). The $384$ time-domain samples are thus transformed into $32$ subbands with $12$ coeffs each. Notice that, within a subband, each coeff can be considered a sample of that subband.
2. Scale each block of $12$ coeffs so that the entire range of the selected quantizer is used. Output the *scalefactor*.
3. Using the FFT, compute the ATH for the block (considering the masking effects).
4. Let R∗ be the bit-rate selected by the user. While the generated bit-rate R ≤ R∗:
  1. Decrement the quantization step $Δ_{b}$ for each subband $b$, proportionally to the ATH in $b$. Compute $R$. The bit-rate is controlled by switching between quantizers with different numbers of bits.
5. Output $\{Δ_{b}\}_{b = 1}^{32}$ and the quantization indexes.
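The steps above can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the analysis filter bank is replaced by a plain 32-point DCT-II applied to each group of 32 samples (not the MPEG polyphase filter), the masking thresholds are simply passed in as a list instead of being computed with a psychoacoustic model, and the rate loop is a greedy bit allocation. All function names (`analysis`, `scale`, `quantize`, `allocate_bits`, `encode_block`) are hypothetical.

```python
import math

N_BANDS, N_COEFFS = 32, 12   # 32 subbands x 12 coeffs = 384 samples

def analysis(block):
    """Stand-in for the analysis filter bank (ASSUMPTION: a 32-point
    DCT-II per group of 32 samples, not the MPEG polyphase filter)."""
    assert len(block) == N_BANDS * N_COEFFS
    subbands = [[0.0] * N_COEFFS for _ in range(N_BANDS)]
    for t in range(N_COEFFS):
        group = block[t * N_BANDS:(t + 1) * N_BANDS]
        for b in range(N_BANDS):
            subbands[b][t] = sum(
                x * math.cos(math.pi / N_BANDS * (n + 0.5) * b)
                for n, x in enumerate(group))
    return subbands

def scale(coeffs):
    """Return (scalefactor, coeffs normalized to [-1, 1])."""
    sf = max(abs(c) for c in coeffs) or 1.0
    return sf, [c / sf for c in coeffs]

def quantize(normalized, bits):
    """Uniform quantizer with 2**bits - 1 levels; 0 bits drops the subband."""
    if bits == 0:
        return [0] * len(normalized)
    half = 2 ** (bits - 1) - 1          # indexes lie in [-half, half]
    return [round(c * half) for c in normalized]

def allocate_bits(scalefactors, thresholds, budget, max_bits=15):
    """Greedy rate loop: give bits to the subband whose estimated
    quantization noise exceeds its threshold the most, while it fits."""
    bits, used = [0] * N_BANDS, 0
    while True:
        best, best_ratio, best_cost = None, 0.0, 0
        for b in range(N_BANDS):
            if bits[b] >= max_bits:
                continue
            nxt = 2 if bits[b] == 0 else bits[b] + 1   # 1 bit is useless here
            cost = (nxt - bits[b]) * N_COEFFS
            step = 2.0 * scalefactors[b] / 2 ** bits[b]
            ratio = (step * step / 12.0) / thresholds[b]
            if used + cost <= budget and ratio > best_ratio:
                best, best_ratio, best_cost = b, ratio, cost
        if best is None:
            return bits
        bits[best] = 2 if bits[best] == 0 else bits[best] + 1
        used += best_cost

def encode_block(block, thresholds, budget):
    scaled = [scale(c) for c in analysis(block)]
    sfs = [sf for sf, _ in scaled]
    bits = allocate_bits(sfs, thresholds, budget)
    indexes = [quantize(norm, nb) for (_, norm), nb in zip(scaled, bits)]
    return sfs, bits, indexes
```

For example, `encode_block([math.sin(0.0628 * n) for n in range(384)], [1e-3] * 32, budget=1000)` returns 32 scalefactors, 32 bit counts and 32 lists of 12 indexes, spending at most 1000 bits on the indexes.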

2.2 Decoder

For each input frame:
1. "Dequantize" the coeffs of each subband.
2. Descale the coeffs to their original dynamic range.
3. Apply the 32-band synthesis filter bank.
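A minimal sketch of these three decoding steps follows. It is an illustration, not the normative decoder: the synthesis filter bank is replaced by the inverse of an unnormalized 32-point DCT-II (an assumption made to keep the example short), the quantizer is assumed to be uniform with $2^{bits} - 1$ levels, and the names `dequantize`, `descale`, `synthesis` and `decode_frame` are hypothetical.

```python
import math

N_BANDS, N_COEFFS = 32, 12

def dequantize(indexes, bits):
    """Map quantization indexes back to normalized values in [-1, 1]."""
    if bits == 0:
        return [0.0] * len(indexes)
    half = 2 ** (bits - 1) - 1
    return [i / half for i in indexes]

def descale(normalized, scalefactor):
    """Restore the original dynamic range of one subband."""
    return [c * scalefactor for c in normalized]

def synthesis(subbands):
    """Stand-in synthesis (ASSUMPTION): inverse of an unnormalized
    32-point DCT-II, applied to each of the 12 time slots."""
    out = []
    for t in range(N_COEFFS):
        for n in range(N_BANDS):
            acc = subbands[0][t] / N_BANDS
            acc += 2.0 / N_BANDS * sum(
                subbands[b][t] * math.cos(math.pi / N_BANDS * (n + 0.5) * b)
                for b in range(1, N_BANDS))
            out.append(acc)
    return out

def decode_frame(scalefactors, bits, indexes):
    subbands = [descale(dequantize(idx, nb), sf)
                for sf, nb, idx in zip(scalefactors, bits, indexes)]
    return synthesis(subbands)   # 384 reconstructed samples
```

With this stand-in transform, a frame in which only subband 0 is non-zero decodes to a constant signal, which is a quick sanity check of the scaling.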

3 Loss information analysis

Aliasing in the 32-band analysis filter bank.
   ^ Amplitude
   |     _______     _______     ____....__     _______
   |    /       \   /       \   /          \   /       \
   |   /         \ /         \ /            \ /         \
   |  / subband 0 X subband 1 X              X  sub. 31  \
   +-/-----------/-\---------/-\-----....---/-\-----------\-> frequency
Quantization.
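The loss introduced by quantization can be measured numerically. The sketch below assumes a uniform quantizer with $2^{bits} - 1$ levels applied to a uniformly distributed test signal (both are assumptions chosen for simplicity; `quantization_snr_db` is a hypothetical helper). It shows the well-known behaviour that each extra bit improves the signal-to-noise ratio by roughly 6 dB.

```python
import math
import random

def quantization_snr_db(signal, bits):
    """SNR (dB) after quantizing and dequantizing with 2**bits - 1 levels."""
    half = 2 ** (bits - 1) - 1
    peak = max(abs(x) for x in signal)
    recon = [round(x / peak * half) / half * peak for x in signal]
    noise = sum((x - y) ** 2 for x, y in zip(signal, recon))
    power = sum(x * x for x in signal)
    return 10 * math.log10(power / noise)

random.seed(0)
signal = [random.uniform(-1, 1) for _ in range(10000)]
for bits in (4, 8, 12):
    print(bits, "bits:", round(quantization_snr_db(signal, bits), 1), "dB")
```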

4 Layer II

Backward compatible with MP1.
8:1 compression (174 kbps).
CBR (Constant Bit-Rate).
Increases block-size to $3 \times 12 \times 32 = 1152$ samples.
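Because every frame carries a fixed number of samples, its duration and (for CBR) its size follow from simple arithmetic. The sketch below uses a 48 kHz sampling rate and a 192 kbps bit-rate as example values (assumptions, not figures from these notes), ignores padding, and uses hypothetical function names.

```python
def frame_duration_ms(samples_per_frame, sample_rate_hz):
    """Time span covered by one frame, in milliseconds."""
    return 1000.0 * samples_per_frame / sample_rate_hz

def frame_size_bytes(samples_per_frame, sample_rate_hz, bitrate_bps):
    """Average CBR frame size in bytes (padding ignored)."""
    return bitrate_bps * samples_per_frame / sample_rate_hz / 8.0

print(frame_duration_ms(384, 48000))          # Layer I block
print(frame_duration_ms(1152, 48000))         # Layer II block
print(frame_size_bytes(1152, 48000, 192000))  # Layer II frame at 192 kbps
```

A block three times longer means three times fewer headers and scalefactor updates per second, at the cost of a coarser time resolution.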

5 Layer III

“Rescued” by Napster.
Backward compatible with MP1 and MP2.
CBR and VBR (Variable Bit-Rate). In the latter case, users usually select the average bit-rate.
Typically perceived as virtually lossless at 128 kbps by most listeners.
Improved subband analysis by means of the MDCT (with only 32 equally-spaced subbands, the low-frequency ones span more than one Bark, which yields a poor frequency resolution in the ATH computation).

5.1 Encoder

Split s[n] into blocks of 36 × 32 = 1152 samples. For each block:
1. Perform an FFT of the block to compute the ATH and the window sequence.
2. Analyze the block using a 32-band equally-spaced (analysis) filter bank, producing $36$ coeffs/subband.
3. For each subband:
  1. Analyze transients. If one is detected, use a sequence of start/short*3/stop windows. Otherwise, use a long window.
  2. Compute the MDCT. The windows span $36$ (long), $30$ (start/stop) or $12$ (short) subband samples and, since consecutive windows overlap by 50%, this step produces half as many values: $18$ coeffs/subband (long), $15$ coeffs/subband (start/stop) and $6$ coeffs/subband (short).
  3. Apply scalefactors to optimize quantization.
4. Distortion control loop: keep (as much as possible) the quantization error below the ATH.
  1. Rate control loop: Let R∗ be the bit-rate selected by the user. While the generated bit-rate R ≤ R∗:
    1. Decrement the quantization step $Δ_{b}$ for each subband $b$, proportionally to the ATH in $b$. Compute $R$ after encoding the quantizer indexes with (static) Huffman coding. As in previous layers, a quantizer is selected from a list of predefined logarithmic quantizers.
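The rate control loop can be sketched as follows. Several simplifications are assumed here: a single uniform quantization step shared by all coeffs (instead of per-subband logarithmic quantizers driven by the ATH), and the empirical entropy of the indexes as a stand-in for the size of the Huffman-coded output. The names `entropy_bits` and `rate_loop`, and the parameters `shrink` and `max_iter`, are hypothetical.

```python
import math
from collections import Counter

def entropy_bits(indexes):
    """Empirical entropy of the indexes, in bits (ASSUMPTION: used as a
    stand-in for the length of the Huffman-coded indexes)."""
    counts = Counter(indexes)
    n = len(indexes)
    return -sum(c * math.log2(c / n) for c in counts.values())

def rate_loop(coeffs, budget_bits, shrink=0.9, max_iter=200):
    """Shrink the quantization step while the estimated rate R stays
    within the budget R*; return the last (step, indexes, rate) that fit."""
    step = 2.0 * max(abs(c) for c in coeffs) or 1.0  # start: all indexes are 0
    best = None
    for _ in range(max_iter):            # guard against an unreachable budget
        indexes = [round(c / step) for c in coeffs]
        rate = entropy_bits(indexes)
        if rate > budget_bits:
            break
        best = (step, indexes, rate)
        step *= shrink
    return best
```

A smaller budget makes the loop stop earlier, so the returned step is never finer than the one obtained with a larger budget.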

5.2 Decoder

For each input frame:
1. Decode the Huffman codes.
2. “Dequantize” the coeffs of each subband.
3. Descale the coeffs to their original dynamic range.
4. Apply the inverse MDCT.
5. Apply the 32-band synthesis filter bank.
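The inverse MDCT step relies on overlap-add: each inverse transform returns twice as many samples as coeffs, and the aliasing of one block is cancelled by its neighbours. The sketch below illustrates this for long windows only ($36$ samples, $18$ coeffs), assuming a sine window in both the analysis and the synthesis stage and a direct (slow) evaluation of the transform. The function names are hypothetical.

```python
import math

def sine_window(length):
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(block):
    """Windowed MDCT: 2N samples -> N coeffs (36 -> 18 for a long window)."""
    two_n = len(block)
    n_half = two_n // 2
    w = sine_window(two_n)
    return [sum(w[n] * block[n] *
                math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
                for n in range(two_n))
            for k in range(n_half)]

def imdct(coeffs):
    """Windowed inverse MDCT: N coeffs -> 2N aliased samples."""
    n_half = len(coeffs)
    two_n = 2 * n_half
    w = sine_window(two_n)
    return [w[n] * 2.0 / n_half *
            sum(coeffs[k] *
                math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
                for k in range(n_half))
            for n in range(two_n)]

def overlap_add_roundtrip(signal, n_half=18):
    """MDCT + inverse MDCT with 50% overlap-add between blocks."""
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n_half + 1, n_half):
        block = signal[start:start + 2 * n_half]
        for n, v in enumerate(imdct(mdct(block))):
            out[start + n] += v
    return out
```

Without quantization, every sample covered by two overlapping blocks is recovered exactly (up to rounding); only the first and last `n_half` samples of the signal, covered by a single block, remain aliased.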

References

[1] Khalid Sayood. Introduction to data compression. Morgan Kaufmann, 2017.