Probabilistic models

Vicente González Ruiz

January 7, 2020

Contents

1 Functionality
2 Static models
3 Adaptive models
4 Encoding
5 Decoding
6 Initially empty models
7 Encoder
8 Decoder
9 Models with memory
10 Encoder
11 Decoder

1 Functionality

2 Static models

Let’s go to the lab!

  1. Clone https://github.com/vicente-gonzalez-ruiz/basic-compression-tools and run make huff_s0.
  2. Using this codec, encode the images listed in the table below, which can be found in The Test Image Corpus.
        Codec | lena boats peppers zelda  
    ----------+-------------------------  
          RLE | ....  ....    ....  ....  
      BWT+RLE | ....  ....    ....  ....  
         LZSS | ....  ....    ....  ....  
          LZW | ....  ....    ....  ....  
      huff_s0 | ....  ....    ....  ....

    Continue filling the table ...

  3. Check that the huff_s0 codec is lossless.

3 Adaptive models

4 Encoding

  1. Assign the same probability to all the symbols of the source alphabet.
  2. While the input is not exhausted:
    1. Encode the next symbol.
    2. Update (increase) its probability (a Python sketch of this model follows the decoding steps below).

5 Decoding

  1. Identical to step 1 of the encoder.
  2. While the input is not exhausted:
    1. Decode the next symbol.
    2. Identical to step 2.2 of the encoder.
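
A minimal Python sketch of this adaptive (order-0) model follows; the names are ours, and the entropy-coding stage (Huffman in huff_a0, arithmetic coding in arith_a0) is left out. The point is that, because both ends start from the same counts and perform the same update after every symbol, the decoder always reproduces the probabilities that the encoder used, so no side information has to be transmitted.

    # Sketch of an adaptive order-0 model (hypothetical names; the entropy
    # coder that would consume these probabilities is not included).

    class AdaptiveModel:
        def __init__(self, alphabet):
            # Step 1: assign the same probability to every source symbol.
            self.count = {s: 1 for s in alphabet}
            self.total = len(self.count)

        def probability(self, s):
            # The probability that the entropy coder would use for s right now.
            return self.count[s] / self.total

        def update(self, s):
            # Step 2.2: increase the probability of the symbol just coded.
            self.count[s] += 1
            self.total += 1

    if __name__ == "__main__":
        message = b"abracadabra"

        # Encoder side: code each symbol with the current model, then update it.
        encoder_model = AdaptiveModel(range(256))
        encoder_probs = []
        for s in message:
            encoder_probs.append(encoder_model.probability(s))  # "encode" s
            encoder_model.update(s)

        # Decoder side: after decoding each symbol it performs the same update,
        # so it always assigns the same probability that the encoder used.
        decoder_model = AdaptiveModel(range(256))
        for s, p in zip(message, encoder_probs):
            assert decoder_model.probability(s) == p
            decoder_model.update(s)
        print("encoder and decoder models stayed synchronized")

Feeding these probabilities to an actual arithmetic coder would give a codec along the lines of arith_a0.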

Let’s go to the lab!

  1. Continue filling the table:
        Codec | lena boats peppers zelda Average  
    ----------+---------------------------------  
            : |    :     :       :     :       :  
      huff_s0 | ....  ....    ....  ....    ....  
      huff_a0 | ....  ....    ....  ....    ....  
     arith_a0 | ....  ....    ....  ....    ....

  2. Remember to check that both codecs are lossless.

6 Initially empty models

7 Encoder

  1. Set the probability of the ESC symbol to 1.0 (and the probability of the rest of the symbols to 0.0).
  2. While the input is not exhausted:
    1. s ← next symbol.
    2. If s has been found before, then:
      1. Output c(s) (encode).
    3. Else:
      1. Output c(ESC).
      2. Output a raw symbol s.
    4. Update p(s).

8 Decoder

  1. Identical to step 1 of the encoder.
  2. While the input is not exhausted:
    1. c(s) ← next code-word.
    2. Decode s.
    3. If s = ESC, then:
      1. Input a raw symbol s.
    4. Update p(s).
    5. Output s.
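
The sketch below mimics the two procedures above at the token level (the identifiers and the output format are ours; a real implementation such as arith-n-c would feed these decisions to an arithmetic coder instead of emitting Python objects). A symbol that has not been seen yet costs an escape plus a raw symbol; from its second occurrence on it is coded directly.

    # Sketch of an initially empty model (hypothetical names and token format).

    ESC = "ESC"   # escape: "the next symbol is sent raw"

    def encode(message):
        known = set()                      # symbols whose probability is already > 0
        output = []
        for s in message:                  # step 2.1: s <- next symbol
            if s in known:
                output.append(s)           # step 2.2.1: output c(s)
            else:
                output.append(ESC)         # step 2.3.1: output c(ESC)
                output.append(("raw", s))  # step 2.3.2: output a raw symbol s
            known.add(s)                   # step 2.4: update p(s)
        return output

    def decode(tokens):
        known = set()                      # kept only to mirror the encoder's model
        message = []
        it = iter(tokens)
        for t in it:                       # step 2.1: c(s) <- next code-word
            if t == ESC:                   # step 2.3: s = ESC
                _, s = next(it)            # step 2.3.1: input a raw symbol s
            else:
                s = t                      # step 2.2: decode s
            known.add(s)                   # step 2.4: update p(s)
            message.append(s)              # step 2.5: output s
        return message

    if __name__ == "__main__":
        msg = list(b"abracadabra")
        tokens = encode(msg)
        assert decode(tokens) == msg       # the scheme is lossless
        print(tokens)                      # a, b, r, c, d arrive escaped; the rest are coded directly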

Let’s go to the lab!

  1. Run make arith-n-c and make arith-n-d.
  2.         Codec | lena boats peppers zelda Average  
    ---------------+---------------------------------  
                 : |    :     :       :     :       :  
    arith-n-c -o 0 | ....  ....    ....  ....    .... #1  
     BWT | ahuff_0 | ....  ....    ....  ....    ....  
          bzip2 -9 | ....  ....    ....  ....    .... #2  
           gzip -9 | ....  ....    ....  ....    .... #3
     
    #1 -> Similar to arith_a0 but using an initially  
          empty model.  
     
    #2 -> Similar to BWT | ahuff_0  
     
    #3 -> Similar to LZ77 + ahuff_0

  3. Check the reversibility.

9 Models with memory

10 Encoder

  1. Create an empty model for every possible context of length 0 ≤ i ≤ k.
  2. Create a non-empty model for the order -1 context (it contains all the symbols of the source alphabet).
  3. While the input is not exhausted:
    1. s ← input log2(r) bits, where r is the size of the source alphabet. # Input a symbol.
    2. i ← k (except for the first symbol, where i ← 0).
    3. While s cannot be found in the model of the context 𝒞[i]:
      1. Output c(ESC|𝒞[i]).
      2. Update p(ESC|𝒞[i]).
      3. Update p(s|𝒞[i]) (insert s into the 𝒞[i] context).
      4. i ← i - 1.
    4. Output c(s|𝒞[i]). The symbols that were found in the models of the contexts with order > i must be excluded from the current (𝒞[i]) context model because, for sure, s is not any of them.
    5. If i ≥ 0, update p(s|𝒞[i]). (A Python sketch of both the encoder and the decoder follows the decoder’s steps below.)

Example



Figure 1: An example of context-based statistical encoding.

11 Decoder

  1. Identical to step 1 of the encoder.
  2. While the input is not exhausted:
    1. i ← k (except for the first symbol, where i ← 0).
    2. s ← next decoded symbol.
    3. While s = ESC:
      1. Update p(ESC|𝒞[i]).
      2. i ← i - 1.
      3. s ← next decoded symbol.
    4. If i ≥ 0, update p(s|𝒞[i]) (as in step 3.5 of the encoder).
    5. While i < k:
      1. i ← i + 1.
      2. Update p(s|𝒞[i]) (insert s into the 𝒞[i] context).
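
The sketch below mimics both procedures at the token level (identifiers and the token format are ours; the probability counts, the exclusions and the arithmetic coder itself are omitted). Each context 𝒞[i] is the tuple of the last i symbols and its model is simply the set of symbols already seen in it; to stay synchronized with the encoder, the decoder remembers which contexts escaped and re-inserts the decoded symbol into exactly those contexts.

    # Token-level sketch of the order-k scheme (hypothetical identifiers).
    # A context C[i] is the tuple of the last i symbols; its "model" is the
    # set of symbols that have been seen right after it.

    ESC = "ESC"   # escape symbol
    K = 2         # maximum context order

    def context(history, i):
        # C[i]: the last i coded/decoded symbols (the empty tuple for i = 0).
        return tuple(history[len(history) - i:])

    def encode(message, k=K):
        models = {}                  # context tuple -> set of known symbols
        output, history = [], []
        for s in message:
            i = min(k, len(history))                # step 3.2 (shorter contexts at the start)
            # Step 3.3: escape while s is unknown in C[i].
            while i >= 0 and s not in models.setdefault(context(history, i), set()):
                output.append(ESC)                  # step 3.3.1: output c(ESC|C[i])
                models[context(history, i)].add(s)  # step 3.3.3: insert s into C[i]
                i -= 1                              # step 3.3.4
            output.append(s)                        # step 3.4 (the order -1 model when i == -1)
            if i >= 0:
                models[context(history, i)].add(s)  # step 3.5 (s was already there)
            history.append(s)
        return output

    def decode(tokens, k=K):
        # The decoder keeps the same models; a real arithmetic decoder needs
        # them to interpret the incoming code-words.
        models = {}
        history, message = [], []
        it = iter(tokens)
        while True:
            i = min(k, len(history))                # step 2.1
            try:
                sym = next(it)                      # step 2.2: next decoded symbol
            except StopIteration:
                return message                      # the input is exhausted
            escaped = []
            while sym == ESC:                       # step 2.3
                escaped.append(context(history, i)) # p(ESC|C[i]) would be updated here
                i -= 1                              # step 2.3.2
                sym = next(it)                      # step 2.3.3
            s = sym
            if i >= 0:
                models.setdefault(context(history, i), set()).add(s)  # step 2.4
            for c in escaped:                       # step 2.5: re-insert s into the escaped contexts
                models.setdefault(c, set()).add(s)
            history.append(s)
            message.append(s)

    if __name__ == "__main__":
        msg = list(b"abracadabra")
        tokens = encode(msg)
        assert decode(tokens) == msg                # reversible
        print(tokens)

With such a short message most symbols still escape down to the order -1 model; longer inputs are needed before the higher-order contexts start to pay off.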

Let’s go to the lab!

  1.         Codec | lena boats peppers zelda Average  
    ---------------+---------------------------------  
                 : |    :     :       :     :       :  
    arith-n-c -o 1 | ....  ....    ....  ....    ....  
    arith-n-c -o 2 | ....  ....    ....  ....    ....  
    arith-n-c -o 3 | ....  ....    ....  ....    ....

  2. Check the reversibility.
