Scalar (Digital) Quantization [6, 7] (see Fig. 1) is a technique in which each source sample is quantized independently of the other samples; therefore, a quantization index \({\mathbf k}_i\) is produced for each input sample \({\mathbf s}_i\) [2].
A \(K\)-level Scalar Quantizer (SQ) \(Q\) partitions the domain of \(\mathbf s\) into \(K\) cells \({\mathbf C}_k = ]{\mathbf d}_{k-1}, {\mathbf d}_k], k = 1, \cdots , K\), where the values \({\mathbf d}_k\) are called the decision boundaries (or decision levels) of \(Q\), and associates to any \({\mathbf s}_i\) the quantization index \({\mathbf k}_i\) if \({\mathbf s}_i\in {\mathbf C}_k\). In other words, \begin{equation} Q({\mathbf s}_i) = {\mathbf k}_i \Leftrightarrow {\mathbf d}_{k-1} < {\mathbf s}_i \le {\mathbf d}_k. \end{equation}
The inverse quantizer \(Q^{-1}\) estimates \({\mathbf s}_i\) knowing \({\mathbf k}_i\) and possibly the PDF (Probability Density Function) \(p_{\mathbf S}({\mathbf s})\), using a reconstruction (or representation) level \({\mathbf r}_k\in ]{\mathbf d}_{k-1}, {\mathbf d}_k]\), generating the output \begin{equation} \tilde {\mathbf s}_i = {\mathbf r}_k. \end{equation}
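For example, a minimal sketch of such a generic \(K\)-level SQ in Python (the decision and representation levels used below are illustrative choices, not prescribed values):

```python
# A generic K-level scalar quantizer: map each sample to the index of
# the cell ]d_{k-1}, d_k] that contains it, and reconstruct with r_k.
import numpy as np

def quantize(s, d):
    # side="left" assigns a sample equal to a boundary d_k to cell k.
    return np.searchsorted(d[1:-1], s, side="left") + 1

def dequantize(k, r):
    # Replace each quantization index by its representation level.
    return r[k - 1]

d = np.array([-4.0, -2.0, 0.0, 2.0, 4.0])   # decision levels (K = 4 cells)
r = np.array([-3.0, -1.0, 1.0, 3.0])        # one representation level per cell
s = np.array([-3.5, -0.1, 0.0, 1.9, 3.9])
k = quantize(s, d)
print(k, dequantize(k, r))
```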
In a Uniform Scalar Quantizer (USQ), all decision levels are equally spaced by a distance known as the Quantization Step Size (QSS) \(\Delta \), so that the domain of the input signal is divided into intervals of constant size \begin{equation} \Delta ={\mathbf d}_{i+1}-{\mathbf d}_i={\mathbf r}_{i+1}-{\mathbf r}_i, \end{equation} where \({\mathbf d}_i\) is the \(i\)-th decision level and \({\mathbf r}_i\) is the \(i\)-th representation level.
In USQs, the quantization error \(\mathbf e\) depends on \(\Delta \) and can be modeled as a noise signal that: (1) is uncorrelated with the input \(\mathbf s\), (2) is white (its energy is evenly spread over all frequencies), and (3) follows a uniform distribution over \([-\Delta /2, \Delta /2]\) (and therefore has variance \(\Delta ^2/12\)).
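This model can be checked numerically. A sketch (using the rounding-based, mid-tread USQ described below):

```python
# Quantize uniformly distributed samples and inspect the error statistics.
import numpy as np

rng = np.random.default_rng(0)
delta = 0.5
s = rng.uniform(-8, 8, 100_000)
s_tilde = delta * np.round(s / delta)      # uniform (mid-tread) quantization
e = s - s_tilde

print("error range   :", e.min(), e.max())          # approx. [-delta/2, delta/2]
print("error variance:", e.var(), "vs delta^2/12 =", delta**2 / 12)
print("corr(e, s)    :", np.corrcoef(e, s)[0, 1])   # approx. 0
```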
In mid-rise quantizers the reconstructed signal \(\tilde {\mathbf s}\) is never 0, even if \({\mathbf s}_i=0\) for some \(i\). The mapping process in a mid-rise quantizer can be described as \begin{equation} {\mathbf k}_i = \Big \lfloor \frac {{\mathbf s}_i}{\Delta } \Big \rfloor , \label{eq:mid-rise} \end{equation} and the inverse mapping by \begin{equation} \tilde {\mathbf s}_i = \Delta \Big ({\mathbf k}_i + \frac {1}{2}\Big ). \label{eq:inverse_mid-rise} \end{equation}
In mid-tread quantizers the reconstructed signal is \(0\) when \({\mathbf s}_i=0\). The mapping process in a mid-tread quantizer can be described as \begin{equation} {\mathbf k}_i = \mathrm {round}\Big ( \frac {{\mathbf s}_i}{\Delta } \Big ), \label{eq:mid-tread} \end{equation} and the inverse mapping by \begin{equation} \tilde {\mathbf s}_i = \Delta {\mathbf k}_i. \label{eq:inverse_mid-tread} \end{equation}
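Both pairs of mappings translate directly into NumPy. A sketch (note that np.round rounds halves to the nearest even integer):

```python
import numpy as np

def midrise_quantize(s, delta):
    return np.floor(s / delta).astype(int)          # k_i = floor(s_i / delta)

def midrise_dequantize(k, delta):
    return delta * (k + 0.5)                        # s~_i = delta * (k_i + 1/2)

def midtread_quantize(s, delta):
    return np.round(s / delta).astype(int)          # k_i = round(s_i / delta)

def midtread_dequantize(k, delta):
    return delta * k                                # s~_i = delta * k_i

s = np.array([-1.2, -0.4, 0.0, 0.4, 1.2])
print(midrise_dequantize(midrise_quantize(s, 1.0), 1.0))   # never 0
print(midtread_dequantize(midtread_quantize(s, 1.0), 1.0)) # 0 maps to 0
```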
In a USQwD (USQ with Deadzone), the bin around \({\mathbf s}_i=0\) is \(2\Delta \) wide, so all samples with \(|{\mathbf s}_i| < \Delta \) are mapped to the index \(0\). Deadzone quantizers tend to remove the electronic noise (which usually has a small amplitude compared to the input signal \(\mathbf s\)), precisely where the SNR (Signal-to-Noise Ratio) is the lowest.1
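A sketch of one common formulation, in which the deadzone is the interval \(]-\Delta , \Delta [\) and the reconstruction offset of \(\Delta /2\) is an illustrative choice:

```python
import numpy as np

def deadzone_quantize(s, delta):
    # The zero bin spans ]-delta, delta[, i.e., it is 2*delta wide.
    return (np.sign(s) * np.floor(np.abs(s) / delta)).astype(int)

def deadzone_dequantize(k, delta):
    # Reconstruct inside each cell; index 0 maps back to exactly 0.
    return np.sign(k) * (np.abs(k) + 0.5) * delta

s = np.array([-1.7, -0.9, -0.1, 0.0, 0.1, 0.9, 1.7])
k = deadzone_quantize(s, 1.0)
print(k)                             # low-amplitude samples -> 0
print(deadzone_dequantize(k, 1.0))
```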
Uniform quantizers are efficient (minimize the distortion) only if the input samples (for example, the gray-scale values of the pixels) are uniformly distributed among the quantization bins [1]. However, when the probability distribution of the samples is not “flat” (or the number of bins is small compared to the number of possible input values), we can use better quantizers. For example, if we know that most of the samples are small integer values (positive and negative2), a quantizer such as the one of gray_SQ_companded.ipynb can be more suitable than a uniform one3.
If we know that the input signal \(\mathbf s\) does not follow a uniform distribution, it is possible to use a variable \(\Delta \) to minimize the quantization error \(\mathbf e\) for those samples that are more probable.
Companding (COMpressing + exPANDING) quantization is used when most of the samples are concentrated around the \(0\) value. For example, most of the (interesting) audio for humans has a low volume (for this reason, companded quantizers are used in telephony).
In a companded codec, the original signal is mapped through a compressor, quantized using a uniform quantizer, and re-mapped using the corresponding expander, resulting in a logarithmic quantization centered at \(0\). A good example is the \(\mu \)-law codec.
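The following sketch builds such a codec around the \(\mu \)-law compressor, assuming \(\mu =255\) (the value used in North American telephony) and an input normalized to \([-1, 1]\); the uniform stage is mid-tread:

```python
import numpy as np

MU = 255.0

def compress(x):
    # mu-law compressor: sign(x) * ln(1 + mu*|x|) / ln(1 + mu)
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def expand(y):
    # mu-law expander (the inverse mapping of compress)
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(MU)) / MU

def companded_quantize(s, delta):
    return np.round(compress(s) / delta).astype(int)

def companded_dequantize(k, delta):
    return expand(delta * k)

s = np.array([-0.5, -0.01, 0.001, 0.25])
k = companded_quantize(s, 2 / 256)     # 256 uniform levels over [-1, 1]
print(k, companded_dequantize(k, 2 / 256))
```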
This non-uniform quantizer is a generalization of the companded quantizer, where the input samples can follow any distribution. Now, with the idea of minimizing the distortion (in general, in terms of the MSE), we choose a smaller \({\mathbf \Delta }_i\) where the signal samples appear most often.
The most commonly used PDF-optimized quantizer is the Max-Lloyd quantizer [5], whose authors developed an iterative algorithm for determining the decision and representation levels.
Notice that the Max-Lloyd quantizer is equivalent to using K-means [3] if we know the \(K\) parameter (the number of representation levels). In this case, the centroids computed by K-means (the means of the samples assigned to each region) are the representation levels. If we don’t know \(K\), we must use the Lloyd Algorithm [3], which also estimates \(K\).
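The following sketch runs the (empirical) Lloyd-Max iteration for a fixed \(K\): it alternately places the decision boundaries at the midpoints between consecutive representation levels and moves each representation level to the mean (centroid) of the samples of its cell. The Laplacian source and the quantile initialization are illustrative choices:

```python
import numpy as np

def lloyd_max(s, K, iters=50):
    # Initialize the representation levels with K quantiles of s.
    r = np.quantile(s, (np.arange(K) + 0.5) / K)
    for _ in range(iters):
        d = (r[:-1] + r[1:]) / 2               # boundaries: midpoints of r
        k = np.searchsorted(d, s)              # assign samples to cells
        for j in range(K):                     # levels: cell centroids
            if np.any(k == j):
                r[j] = s[k == j].mean()
    return r

rng = np.random.default_rng(0)
s = rng.laplace(0, 1, 10_000)                  # peaked, non-uniform source
print(lloyd_max(s, 4))                         # levels crowd around 0
```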
Adaptive quantizers modify \(\Delta \) dynamically, depending on the local characteristics of \(\mathbf s\).
Forward adaptive quantizers compute \(\Delta \) from the input samples themselves and transmit it as side information:

1. While there are samples in \(s\):
   (a) Read the next block of samples.
   (b) Compute and output the \(\Delta \) to be used for the block.
   (c) Quantize and output the block using \(\Delta \).

The corresponding dequantizer performs:

1. While there is data in the input:
   (a) Read \(\Delta \).
   (b) Read and dequantize the next block of quantization indices using \(\Delta \).

Backward adaptive quantizers derive \(\Delta \) from the previously generated quantization indices, so no side information needs to be transmitted:

1. Initialize \(\Delta \).
2. While \(s\) is not exhausted:
   (A) Quantize the next sample using \(\Delta \).
   (B) Update \(\Delta \), using only the quantization indices generated so far.

The dequantizer mirrors the adaptation:

1. Initialize \(\Delta \) to the same value.
2. While \(\hat {s}\) (the sequence of quantization indices) is not exhausted:
   (A) Dequantize the next index using \(\Delta \).
   (B) Update \(\Delta \) with the same rule as the quantizer.
A Jayant quantizer [4] defines Step 2.B as follows: define a multiplier \(M_l\) for each quantization level \(l\), where for the inner levels \(M_l<1\) and for the outer levels \(M_l>1\), and compute:
\[ \Delta ^{[n]} = \Delta ^{[n-1]}{M_l}^{[n-1]}, \]
where \(\Delta ^{[n-1]}\) is the previous quantization step and \({M_l}^{[n-1]}\) the level multiplier used for the \((n-1)\)-th (previous) sample. Thus, if the previous quantization (of sample \(n-1\)) used a \(\Delta ^{[n-1]}\) that was too small (producing outer quantization levels), then \(\Delta ^{[n]}\) will be larger, and vice versa.
Most Jayant quantizers clip \(\Delta \) to a range \([\Delta _{\text {min}}, \Delta _{\text {max}}]\): the lower bound prevents \(\Delta \) from collapsing to zero in those contexts where \(s\) is zero or very close to zero, and the upper bound improves the adaptation to smaller samples after a sequence of bigger ones (preventing \(\Delta \) from growing without limit):
\[ \begin {array}{ll} \text {if}~\Delta ^{[n]}<\Delta _{\text {min}}~\text {then}~\Delta ^{[n]} = \Delta _{\text {min}},\\ \text {if}~\Delta ^{[n]}>\Delta _{\text {max}}~\text {then}~\Delta ^{[n]} = \Delta _{\text {max}}. \end {array} \]
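A sketch of the resulting backward-adaptive quantizer (the multipliers, the initial \(\Delta \), and the clipping limits are illustrative choices):

```python
import numpy as np

def jayant_quantize(s, M, delta0=1.0, dmin=0.01, dmax=100.0):
    delta, k = delta0, []
    for sample in s:
        ki = int(round(sample / delta))              # Step 2.A: quantize
        ki = np.clip(ki, -(len(M) - 1), len(M) - 1)  # stay inside the levels
        k.append(ki)
        # Step 2.B: adapt Delta with the multiplier of the used level,
        # clipping the result to [dmin, dmax].
        delta = np.clip(delta * M[abs(ki)], dmin, dmax)
    return k

M = [0.8, 0.9, 1.0, 1.2, 1.6]     # inner levels < 1, outer levels > 1
s = np.concatenate([np.full(10, 8.0), np.full(10, 0.05)])
print(jayant_quantize(s, M))      # Delta grows, then shrinks again
```

Notice that the update depends only on the emitted indices, so the dequantizer can reproduce the same sequence of \(\Delta \)'s without any side information.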
A Jayant quantizer adapts the quantization step to the dynamic range of the signal using a set of multipliers. A similar effect can be obtained by dividing the input signal by a scale factor defined iteratively as \begin{equation} \alpha ^{[n]} = \alpha ^{[n-1]}M_l^{[n-1]}. \end{equation}
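A sketch of this gain-adaptive variant, mirroring the previous Jayant example (again, M and the initial values are illustrative):

```python
def gain_adaptive_quantize(s, M, delta=1.0, alpha0=1.0):
    alpha, k = alpha0, []
    for sample in s:
        ki = int(round(sample / (alpha * delta)))    # quantize s_i / alpha
        ki = max(-(len(M) - 1), min(len(M) - 1, ki))
        k.append(ki)
        alpha *= M[abs(ki)]        # alpha^[n] = alpha^[n-1] * M_l^[n-1]
    return k

M = [0.8, 0.9, 1.0, 1.2, 1.6]
print(gain_adaptive_quantize([8.0] * 10 + [0.05] * 10, M))
```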
Normally, RD curves are convex (this can be seen in the notebook A Comparison of Scalar Quantizers). This means that the distortion decreases monotonically with the rate, and that each additional bit of rate reduces the distortion less than the previous one.
As can also be seen in that notebook, the performance of the quantizers is not the same: usually, mid-rise and mid-tread perform better than deadzone at intermediate bit-rates, but deadzone is the best at low bit-rates (excluding Lloyd-Max). Deadzone has another advantage over mid-rise and mid-tread: when \(\Delta \) is a power of 2 (which corresponds to a bit-plane encoding), the obtained RD point is near-optimal in the RD space. Finally, the Lloyd-Max quantizer reaches the highest performance because it is adaptive.
An important consideration is the relative perceptual importance of the input samples. This leads to a weighting of the MSE at the output. The weighting function can be derived through experiments that determine the “level of just noticeable noise”. For example, in subband coding, as expected, high-frequency subbands tolerate more noise because the HAS (Human Auditory System) is less sensitive at those frequencies.
Similarly, in the case of images, humans can normally hardly distinguish more than 64 different colors.
The quantization error can be modeled as a random signal \(\mathbf {e}\) that is added to the original one \(\mathbf {s}\), and the energy of this error signal \(\langle \mathbf {e},\mathbf {e}\rangle \) depends on the QSS and the values of \(\mathbf {s}\).
So far, we have designed the image compressor in two steps: (1) quantize the pixels ... trying to minimize the distortion, and (2) entropy-encode (compress) the quantization indexes. However, if we assume that the entropy encoding performance is proportional to some rate-prediction metric, such as the entropy, it is possible to design the quantizer minimizing the two RD variables (rate and distortion) at the same time [6].
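For example, taking the zero-order entropy of the quantization indices as the rate metric, we can search for the \(\Delta \) that minimizes the Lagrangian cost \(J = D + \lambda R\). A sketch (the Laplacian source, the candidate \(\Delta \)'s, and \(\lambda = 0.1\) are illustrative choices):

```python
import numpy as np

def entropy(k):
    # Zero-order entropy of the indices, in bits per sample.
    p = np.bincount(k - k.min()) / len(k)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

rng = np.random.default_rng(0)
s = rng.laplace(0, 1, 100_000)
lam, best = 0.1, None
for delta in (0.1, 0.25, 0.5, 1.0, 2.0):
    k = np.round(s / delta).astype(int)          # mid-tread quantization
    R, D = entropy(k), ((s - delta * k) ** 2).mean()
    J = D + lam * R                              # joint RD cost
    best = min(best or (J, delta), (J, delta))
    print(f"delta={delta}: R={R:.2f} bits, D={D:.4f}, J={J:.4f}")
print("best delta for lambda=0.1:", best[1])
```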
However, there is not (yet) any known reason to think that this approach generates better RD curves than the one in which both parameters (R and D) are minimized independently. Besides, the joint optimization of R and D makes the quantizer dependent on the entropy codec, which reduces the flexibility of the encoding system.
[1] V. González-Ruiz. Scalar Quantization.
[2] V. González-Ruiz. Signal Quantization.
[3] John A. Hartigan and Manchek A. Wong. Algorithm AS 136: A K-means clustering algorithm. Journal of the Royal Statistical Society, Series C (Applied Statistics), 28(1):100–108, 1979.
[4] Nuggehally S. Jayant. Digital coding of speech waveforms: PCM, DPCM, and DM quantizers. Proceedings of the IEEE, 62(5):611–632, 1974.
[5] Stuart Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129–137, 1982.
[6] K. Sayood. Introduction to Data Compression (Slides). Morgan Kaufmann, 2017.
[7] M. Vetterli, J. Kovačević, and V.K. Goyal. Foundations of Signal Processing. Cambridge University Press, 2014.
1Notice that, strictly speaking, dead-zone quantizers should not be considered uniform, and that all dead-zone quantizers are, by definition, mid-tread.
2Such as it happens with audio CD data.
3At least, from a pure-distortion point of view.