An introduction to sounddevice

Vicente González Ruiz - Depto Informática - UAL

October 12, 2023

sounddevice is a Python module that provides bindings for the PortAudio library [1].

Let’s see some examples of what sounddevice can do:

Wiring the ADC and the DAC using a loop

 
# The same as wire3.py, but using NumPy arrays. 
 
import sounddevice as sd 
import numpy as np 
 
stream = sd.Stream(samplerate=44100, channels=2, dtype='int16') 
stream.start() 
while True: 
    chunk, overflowed = stream.read(stream.read_available) 
    if overflowed: 
        print("Overflow") 
    stream.write(chunk)

This module implements the algorithm:

while True: 
  chunk = sound_card.read(chunk_size = 1024)  # (1) 
  sound_card.write(chunk)  # (2)

where (1) captures 1024 frames from the ADC, and (2) plays the chunk of frames. In sounddevice, a frame is the set of samples (one per channel) taken at the same sampling instant: a frame holds a single sample if the number of channels is 1, or two samples if the number of channels is 2.
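To make the difference between frames and samples concrete, the following minimal sketch builds a NumPy array with the layout that sounddevice uses for a chunk: one row per frame and one column per channel. The array holds silence, and the variable names are chosen here for illustration only.

```python
import numpy as np

CHANNELS = 2       # stereo, as in the Stream created above
CHUNK_SIZE = 1024  # frames per chunk, as in the algorithm above

# One row per frame, one column per channel.
chunk = np.zeros((CHUNK_SIZE, CHANNELS), dtype=np.int16)

frames = chunk.shape[0]                        # frames in the chunk
samples = chunk.size                           # samples = frames * channels
frame_bytes = chunk.shape[1] * chunk.itemsize  # bytes used by one frame

print(frames, samples, frame_bytes)  # prints: 1024 2048 4
```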

If you want to run this module right now and you are not using a headset, first check that the output volume of your speakers is not too high; otherwise you could involuntarily “couple” the speaker and the mic(rophone) of your computer, producing a loud and annoying tonal sound. To mitigate this effect, you can also lower the gain of your mic (if the gain is 0, no feedback between the speaker and the mic is possible). In Xubuntu, these controls are available by clicking on the speaker icon (in the top-right corner of the screen) of the Xfce desktop.

To run the module:

python wire5.py

Stop (kill) the module by pressing the CTRL and c keys (CTRL+c) simultaneously.

The chunk size introduces some latency. If we want to measure it:

  1. First, we need the tools: SoX, Audacity, and plot_input.py:

    sudo apt install sox 
    sudo apt install audacity 
    sudo apt install curl 
    curl https://raw.githubusercontent.com/Tecnologias-multimedia/InterCom/master/test/sounddevice/plot_input.py > plot_input.py
  2. Run:

    pip install matplotlib 
    python plot_input.py

    and check that the gain of the mic does not produce clipping during the sound recording.

  3. In a terminal, run:

    python wire3.py

    while you adjust the output volume of the speakers so that the coupling between both devices (the speaker(s) and the mic) produces a decaying feedback noise. If your computer lacks these transducers, you can use a male-to-male audio jack cable to connect the line output of your sound card to its input.

  4. In a different terminal (keep python wire3.py running), run:

    sox -t alsa default test.wav

    to save the ADC’s output to the file test.wav.

  5. While sox is recording, produce some short sound (for example, tap your laptop or your mic with one of your nails). Do this at least a couple more times, to be sure that you record both the sound and its feedback. The sound must be short (a few milliseconds) so that it and its replicas can be recognized visually.
  6. Stop sox by pressing CTRL+c. This kills sox.
  7. Kill python wire3.py with CTRL+c.
  8. Load the sound file into Audacity:

    audacity test.wav
  9. Locate the first of your nail-tap sounds in the audio track of Audacity.
  10. Select (using the mouse) the region that contains your sound and a replica.
  11. Use the zoom-to-selection button to zoom in on the selected area.
  12. Measure the time between the occurrence of the tap (of the nail) and the recording of its first replica, produced by the speaker-to-mic feedback. This time is the real latency of your computer running wire3.py.
  13. Modify the constant \(\mathtt {CHUNK\_SIZE}\) in the module and repeat this process, starting at Step 3. Create an ASCII file (named latency_vs_chunk_size.txt) with the following content (use TABs to separate the columns):

         # CHUNK_SIZE    real
         32              ...
         64              ...
         128             ...
         256             ...
         512             ...
         1024            ...
         2048            ...
         4096            ...
         8192            ...
    

    where the real column holds the real (practical) latency, in seconds.
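wire3.py is not listed in this document. Assuming that it reads a fixed number of frames per iteration, as in the algorithm shown at the beginning, its loop might look like the following sketch. The function name wire and the parameter max_chunks are hypothetical and are introduced here only so that the loop can be bounded and tried without a sound card; stream is any object that offers read(frames) and write(data) in the style of sounddevice.Stream.

```python
CHUNK_SIZE = 1024  # frames per chunk (the constant modified in Step 13)


def wire(stream, chunk_size=CHUNK_SIZE, max_chunks=None):
    """Copy chunks of frames from the input of `stream` to its output.

    `max_chunks` (hypothetical) limits the number of iterations;
    None means "run until interrupted". Returns the number of
    input overflows reported by the stream.
    """
    overflows = 0
    done = 0
    while max_chunks is None or done < max_chunks:
        # Blocks until chunk_size frames have been captured.
        chunk, overflowed = stream.read(chunk_size)
        if overflowed:
            overflows += 1
        stream.write(chunk)
        done += 1
    return overflows


# Usage with real hardware (needs the sounddevice package):
#
#   import sounddevice as sd
#   with sd.Stream(samplerate=44100, channels=2, dtype='int16') as stream:
#       wire(stream)  # stop with CTRL+c
```

Because each read waits for a whole chunk, the time needed to fill the chunk is a lower bound on the latency that this loop can achieve.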

At this point, we know the real latency of wire3.py as a function of \(\mathtt {CHUNK\_SIZE}\). Plot the file latency_vs_chunk_size.txt with:

 sudo apt install gnuplot 
 echo "set terminal pdf; set output 'latency_vs_chunk_size.pdf'; set xlabel 'CHUNK\_SIZE (frames)'; set ylabel 'Latency (seconds)'; plot 'latency_vs_chunk_size.txt' title '' with linespoints" | gnuplot

Let’s now compute the buffering latency of a chunk (the chunk-time), that is, the time needed to fill a chunk with frames. If \(\mathtt {sampling\_rate}\) is the number of frames per second during the recording process, it holds that:

\begin {equation} \mathtt {minimal\_buffering\_latency} = \mathtt {CHUNK\_SIZE} / \mathtt {sampling\_rate} \end {equation}
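As a worked example of this equation, the following sketch prints the chunk-time for the chunk sizes of the table, assuming a sampling rate of 44100 frames per second (the rate used in the first listing; use the rate of your own recordings if it differs). The function name is taken from the equation.

```python
SAMPLING_RATE = 44100  # frames per second (an assumption, see above)


def minimal_buffering_latency(chunk_size, sampling_rate=SAMPLING_RATE):
    """Time, in seconds, needed to fill a chunk of `chunk_size` frames."""
    return chunk_size / sampling_rate


for chunk_size in (32, 64, 128, 256, 512, 1024, 2048, 4096, 8192):
    latency = minimal_buffering_latency(chunk_size)
    print(f"{chunk_size}\t{latency:.6f}")
```

For instance, a chunk of 1024 frames takes 1024/44100 seconds, about 23.2 ms, to fill.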

Add these calculations to latency_vs_chunk_size.txt using a third column (remember to use TABs).

# CHUNK_SIZE    real    minimal
:               :       :
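The third column can also be generated instead of typed by hand. The sketch below is one possible way to do it; the function name add_minimal_column is hypothetical, and the sampling rate of 44100 frames per second is an assumption. Measured values are copied as text, so they are not reformatted.

```python
def add_minimal_column(lines, sampling_rate=44100):
    """Return the table rows with CHUNK_SIZE / sampling_rate appended.

    `lines` is any iterable of text lines (for example, an open file).
    Lines starting with '#' are treated as headers and get the
    column name 'minimal' appended.
    """
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        if line.lstrip().startswith("#"):
            rows.append(line + "\tminimal")
            continue
        fields = line.split()
        chunk_size, real = int(fields[0]), fields[1]
        minimal = chunk_size / sampling_rate
        rows.append(f"{chunk_size}\t{real}\t{minimal:.6f}")
    return rows


# Usage (rewrites the file; run it only once, or a fourth column appears):
#
#   with open("latency_vs_chunk_size.txt") as f:
#       rows = add_minimal_column(f)
#   with open("latency_vs_chunk_size.txt", "w") as f:
#       f.write("\n".join(rows) + "\n")
```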

Plot both latencies:

echo "set terminal pdf; set output 'latency_vs_chunk_size.pdf'; set xlabel 'CHUNK\_SIZE (frames)'; set ylabel 'Latency (seconds)'; set key left; plot 'latency_vs_chunk_size.txt' using 1:2 title 'Real' with linespoints, 'latency_vs_chunk_size.txt' using 1:3 title 'Minimal' with linespoints" | gnuplot

What seems to be the minimal practical (real) latency in your computer, that is, the latency obtained ideally when \(\mathtt {CHUNK\_SIZE}=1\)? Notice that, depending on your computer, such a small chunk size can overwhelm the CPU. Justify your answer.
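One possible way to approach this question, offered here only as a suggestion, is to fit a straight line to the measured latencies and read the value that the line predicts for very small chunks. The sketch below assumes that the real latency grows roughly linearly with the chunk size, which may not hold on your hardware (for example, if the driver adds buffers of its own). The data in the example are synthetic, generated from an assumed model, and must be replaced by your own measurements.

```python
def linear_fit(xs, ys):
    """Least-squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# SYNTHETIC values for illustration only: an assumed fixed overhead
# of 10 ms plus the chunk-time at 44100 frames per second.
chunk_sizes = [256, 512, 1024, 2048, 4096]
real = [0.010 + c / 44100 for c in chunk_sizes]

intercept, slope = linear_fit(chunk_sizes, real)
print(f"fixed overhead: about {intercept:.4f} s")
print(f"extrapolated latency at CHUNK_SIZE=1: about {intercept + slope:.4f} s")
```

The intercept estimates the part of the latency that does not depend on the chunk size; comparing the slope with 1/sampling_rate indicates how many chunk-times each chunk costs in practice.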

Wiring the ADC and the DAC using an interrupt handler (callback)

 
#!/usr/bin/env python3 
 
# https://python-sounddevice.readthedocs.io/en/0.3.13/_downloads/wire.py 
 
"""Pass input directly to output. 
 
See https://www.assembla.com/spaces/portaudio/subversion/source/HEAD/portaudio/trunk/test/patest_wire.c 
 
""" 
import argparse 
import logging 
 
 
def int_or_str(text): 
    """Helper function for argument parsing.""" 
    try: 
        return int(text) 
    except ValueError: 
        return text 
 
 
parser = argparse.ArgumentParser(description=__doc__) 
parser.add_argument('-i', '--input-device', type=int_or_str, 
                    help='input device ID or substring') 
parser.add_argument('-o', '--output-device', type=int_or_str, 
                    help='output device ID or substring') 
parser.add_argument('-c', '--channels', type=int, default=2, 
                    help='number of channels') 
parser.add_argument('-t', '--dtype', help='audio data type') 
parser.add_argument('-s', '--samplerate', type=float, help='sampling rate') 
parser.add_argument('-b', '--blocksize', type=int, help='block size') 
parser.add_argument('-l', '--latency', type=float, help='latency in seconds') 
args = parser.parse_args() 
 
try: 
    import sounddevice as sd 
    import numpy  # Make sure NumPy is loaded before it is used in the callback 
    assert numpy  # avoid "imported but unused" message (W0611) 
 
    def callback(indata, outdata, frames, time, status): 
        if status: 
            print(status) 
        outdata[:] = indata 
 
    print("blocksize =", args.blocksize) 
 
    with sd.Stream(device=(args.input_device, args.output_device), 
                   samplerate=args.samplerate, blocksize=args.blocksize, 
                   dtype=args.dtype, latency=args.latency, 
                   channels=args.channels, callback=callback): 
        print('#' * 80) 
        print('press Return to quit') 
        print('#' * 80) 
        input() 
except KeyboardInterrupt: 
    parser.exit('\nInterrupted by user') 
except Exception as e: 
    parser.exit(type(e).__name__ + ': ' + str(e))
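In the callback above, the slice assignment outdata[:] = indata matters: it copies the captured samples into the output buffer that the callback receives, whereas a plain assignment would only rebind a local name. The following sketch shows the difference with ordinary NumPy arrays standing in for the buffers, so it runs without a sound card; the names good_callback and bad_callback are hypothetical.

```python
import numpy as np


def good_callback(indata, outdata, frames, time, status):
    outdata[:] = indata  # copies the samples into the caller's buffer


def bad_callback(indata, outdata, frames, time, status):
    outdata = indata  # only rebinds the local name; the buffer is untouched


# Stand-ins for the buffers that would be passed to the callback.
indata = np.ones((4, 2), dtype=np.int16)
out_good = np.zeros_like(indata)
out_bad = np.zeros_like(indata)

good_callback(indata, out_good, 4, None, None)
bad_callback(indata, out_bad, 4, None, None)

print(out_good.any(), out_bad.any())  # prints: True False
```

With bad_callback the output buffer keeps its zeros, which in a real stream would be heard as silence.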

1 Resources

[1]   M. Walker. python-sounddevice.