Human Perception of Sound (Psychoacoustics)

  • This chapter describes a psychoacoustic model for hearing.

Transduction

  • The human auditory system (HAS) converts the pressure variations caused by the sound waves that reach the ear into neural (synaptic) signals that are interpreted by the brain.

Perception of sound intensity (loudness)

  • The relationship between the perceived loudness of a sound and its physical intensity is not linear but approximately logarithmic. For this reason, sound intensity is most appropriately expressed on a decibel (dB) scale.
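A minimal sketch of the decibel scale, assuming the standard dB SPL reference pressure of 20 µPa (the function names are illustrative, not taken from any library). Because intensity is proportional to the square of pressure, the intensity level $10\log_{10}(I/I_0)$ becomes $20\log_{10}(p/p_0)$ when written in terms of pressure:

```python
import math

# Reference RMS pressure for dB SPL: 20 micropascals,
# approximately the threshold of hearing at 1 kHz.
P_REF = 20e-6


def pressure_to_db_spl(p_pa: float) -> float:
    """Convert an RMS sound pressure in pascals to dB SPL."""
    return 20.0 * math.log10(p_pa / P_REF)


def db_spl_to_pressure(level_db: float) -> float:
    """Inverse conversion: dB SPL back to RMS pressure in pascals."""
    return P_REF * 10.0 ** (level_db / 20.0)


if __name__ == "__main__":
    # Each tenfold increase in pressure adds 20 dB.
    for p in (20e-6, 2e-4, 2e-3, 2e-2, 0.2, 2.0):
        print(f"{p:8.0e} Pa -> {pressure_to_db_spl(p):5.1f} dB SPL")
```

The logarithm compresses pressures spanning six orders of magnitude (20 µPa to 20 Pa) into a 0 to 120 dB range, and doubling the pressure adds about 6 dB regardless of the starting level.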

Absolute Threshold of Hearing (ATH) (Fletcher and Munson, 1933)

  • The variation with frequency of what is perceived as equally loud was first measured by Fletcher and Munson at Bell Labs in the early 1930s; the ATH is the lowest of these equal-loudness curves.

  • Humans hear best (i.e., the ATH is lowest) at frequencies between roughly 3 kHz and 4 kHz.

  • In lossy audio coding, where a quantizer controls the bit rate, if we select the quantizer step size so that the quantization noise lies below the absolute threshold of hearing, the noise will not be perceived.

  • The ATH is measured in quiet; in the presence of background noise, the effective hearing threshold rises and its shape changes.
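Psychoacoustic models often approximate the ATH with Terhardt's formula, with $f$ in kHz: $\mathrm{ATH}(f) = 3.64 f^{-0.8} - 6.5\,e^{-0.6(f-3.3)^2} + 10^{-3} f^4$ dB SPL. The sketch below evaluates it and, for the quantization remark above, computes the largest uniform quantizer step $\Delta$ whose noise power $\Delta^2/12$ stays below a given threshold power. The full-scale calibration (96 dB SPL) is a hypothetical assumption, and comparing the total noise power with the threshold at a single frequency is a simplification.

```python
import math


def ath_db_spl(f_hz: float) -> float:
    """Terhardt's approximation of the absolute threshold of hearing (dB SPL)."""
    k = f_hz / 1000.0  # frequency in kHz
    return (3.64 * k ** -0.8
            - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2)
            + 1e-3 * k ** 4)


def max_step_for_threshold(threshold_power: float) -> float:
    """Largest uniform quantizer step whose noise power, step**2 / 12,
    does not exceed threshold_power (same linear power units)."""
    return math.sqrt(12.0 * threshold_power)


if __name__ == "__main__":
    # Locate the most sensitive frequency on a 10 Hz grid from 20 Hz to 20 kHz.
    grid = range(20, 20001, 10)
    f_best = min(grid, key=ath_db_spl)
    print(f"ATH minimum near {f_best} Hz: {ath_db_spl(f_best):.2f} dB SPL")
    for f in (100, 1000, 10000):
        print(f"ATH({f} Hz) = {ath_db_spl(f):.1f} dB SPL")

    # Hypothetical calibration: a digital signal of power 1.0 is assumed to be
    # played back at 96 dB SPL (the coder does not know the real playback level).
    FULL_SCALE_DB_SPL = 96.0
    t_power = 10.0 ** ((ath_db_spl(1000) - FULL_SCALE_DB_SPL) / 10.0)
    print(f"max step at 1 kHz: {max_step_for_threshold(t_power):.2e}")
```

The grid search lands in the 3 to 4 kHz region mentioned above, and the threshold rises steeply toward both ends of the audible range.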

Frequency resolution and simultaneous (spectral) masking

  • The HAS has a limited frequency resolution. Psychoacoustic experiments have demonstrated that the audible frequencies can be grouped into critical bands, measured on the Bark scale (one Bark corresponds to the width of one critical band).

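  • Critical bands do not have a constant width: below about 500 Hz their bandwidth is roughly constant (about 100 Hz), while above that it grows approximately in proportion to the center frequency (about 20% of it), i.e., critical bands have an approximately constant $Q$ (the ratio of center frequency to bandwidth) only at higher frequencies. Thus, at low frequencies a critical band can be as narrow as 100 Hz, while at high frequencies it can be as wide as about 4 kHz.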

  • Each critical band groups the frequencies that excite approximately the same region of the basilar membrane in the cochlea; within a band, weaker components can be masked by the component with the highest energy.

  • As a consequence of this behavior, simultaneous sounds with similar frequencies mask each other.

  • Therefore, a tone at a given frequency raises the hearing threshold over at least a critical band around that frequency.

  • The degree to which the threshold is raised depends on a variety of factors, including whether the masker is tonal (sinusoidal) or noise-like (atonal).
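The ideas in this section can be sketched with three common closed-form approximations: Zwicker and Terhardt's formulas for the critical-band rate (in Bark) and the critical bandwidth, and Schroeder's spreading function, which gives how much a masker's level is attenuated at a distance $\Delta z$ Bark (maskee minus masker). The example estimates the masking spread of a 60 dB tone at 1 kHz; real models also subtract an offset that depends on whether the masker is tonal or noise-like, which is omitted here.

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Zwicker and Terhardt's approximation of the critical-band rate (Bark)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def critical_bandwidth_hz(f_hz: float) -> float:
    """Zwicker and Terhardt's approximation of the critical bandwidth (Hz)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69


def spreading_db(dz: float) -> float:
    """Schroeder's spreading function: attenuation (dB) of the masking produced
    by a masker at a distance dz Bark (positive dz = above the masker)."""
    x = dz + 0.474
    return 15.81 + 7.5 * x - 17.5 * math.sqrt(1.0 + x * x)


if __name__ == "__main__":
    for f in (100, 500, 1000, 4000, 10000, 15000):
        print(f"{f:6d} Hz: {hz_to_bark(f):5.2f} Bark, "
              f"critical bandwidth ~ {critical_bandwidth_hz(f):6.0f} Hz")

    # Masking spread of a 60 dB tone at 1 kHz onto nearby frequencies.
    masker_hz, masker_db = 1000.0, 60.0
    for f in (700.0, 1000.0, 1500.0, 2000.0):
        dz = hz_to_bark(f) - hz_to_bark(masker_hz)
        print(f"{f:6.0f} Hz: spread masking level ~ "
              f"{masker_db + spreading_db(dz):5.1f} dB")
```

The spreading function falls off at about 25 dB/Bark below the masker and about 10 dB/Bark above it, so masking extends further toward higher frequencies.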

Temporal resolution and masking

  • The HAS has inertia: sounds are not perceived instantly, and they persist (in our brain) for a short while after they have stopped. Therefore, the HAS needs a minimum silent gap between two sounds in order to perceive them as separate (independently of their frequencies).

  • Temporal masking occurs when the perception of one sound is inhibited by another sound that occurs shortly after it (pre-masking) or shortly before it (post-masking).

Binaural perception (Sound localization)

  • Human beings have two ears separated by a certain distance (roughly the width of the head). Therefore, the sounds they receive are almost never exactly the same.

  • The HAS exploits this fact to locate sound sources, using two cues:

    1. The sound intensity is generally higher at the ear closer to the sound source (interaural level difference).
    2. Sound waves from a source reach the two ears at slightly different times (interaural time difference), which allows the brain to estimate the direction of the source.

  • Many audio codecs exploit this inter-channel redundancy by means of the joint stereo mode (mid/side stereo), which encodes the $L$ and $R$ channels as

    \begin{equation} M = \displaystyle\frac{L+R}{2} \tag{"Mid" signal} \end{equation}

    \begin{equation} S = \displaystyle\frac{L-R}{2}. \tag{"Side" signal} \end{equation}

    This processing is similar to the matrixing used by Dolby Stereo, where the surround channel is carried in the difference $L-R$ between the two transmitted channels; the decoder additionally delays the recovered surround signal by a short time so that leakage from the front channels is still localized at the front (precedence effect).
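A minimal sketch of the mid/side transform defined above (with its 1/2 scaling, so that $L = M + S$ and $R = M - S$), together with Woodworth's spherical-head approximation of the interaural time difference, $\mathrm{ITD}(\theta) = \frac{r}{c}(\theta + \sin\theta)$. The head radius (8.75 cm) and speed of sound (343 m/s) are assumed typical values, and the function names are illustrative; plain lists keep the example standard-library only.

```python
import math
from typing import List, Tuple

HEAD_RADIUS_M = 0.0875   # assumed average head radius
SPEED_OF_SOUND = 343.0   # m/s, air at about 20 degrees C


def ms_encode(left: List[float], right: List[float]) -> Tuple[List[float], List[float]]:
    """Mid/side transform: M = (L + R) / 2, S = (L - R) / 2."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid: List[float], side: List[float]) -> Tuple[List[float], List[float]]:
    """Inverse transform: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def itd_seconds(azimuth_rad: float) -> float:
    """Woodworth ITD for a far source; azimuth 0 = straight ahead, valid up to pi/2."""
    return HEAD_RADIUS_M / SPEED_OF_SOUND * (azimuth_rad + math.sin(azimuth_rad))


def energy(xs: List[float]) -> float:
    return sum(x * x for x in xs)


if __name__ == "__main__":
    n = 8
    left = [math.sin(2 * math.pi * k / n) for k in range(n)]
    right = [0.9 * x for x in left]  # strongly correlated channels
    mid, side = ms_encode(left, right)
    print(f"energy L={energy(left):.3f} R={energy(right):.3f} "
          f"M={energy(mid):.3f} S={energy(side):.4f}")
    print(f"max ITD ~ {itd_seconds(math.pi / 2) * 1e3:.2f} ms")
```

When the channels are strongly correlated, almost all the energy moves into $M$ and $S$ is small, which is what makes it cheaper to encode.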

Quantization

Quantization basically removes low-amplitude signal components, taking into account the psychoacoustic model of the HAS. The resulting quantization noise is usually of low power.
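A minimal sketch of this idea with a uniform mid-tread quantizer: any component whose magnitude is below half the step size is mapped to zero, and the reconstruction error never exceeds half a step. The step here is a hypothetical fixed value; in a real coder it would be derived, band by band, from the psychoacoustic model.

```python
from typing import List


def quantize(coeffs: List[float], step: float) -> List[int]:
    """Uniform mid-tread quantizer: index of the nearest multiple of `step`
    (Python's round() breaks exact ties toward the even index)."""
    return [round(c / step) for c in coeffs]


def dequantize(indices: List[int], step: float) -> List[float]:
    """Reconstruct each coefficient as index * step."""
    return [q * step for q in indices]


if __name__ == "__main__":
    step = 0.1  # hypothetical step, e.g. chosen from the masking threshold of a band
    coeffs = [0.9, 0.02, -0.4, 0.01, 0.003]
    idx = quantize(coeffs, step)
    rec = dequantize(idx, step)
    print("indices:", idx)
    print(f"{idx.count(0)} of {len(idx)} coefficients removed")
    print("max error:", max(abs(c - r) for c, r in zip(coeffs, rec)))
```

Zeroed coefficients cost almost nothing to transmit, which is where most of the bit-rate saving comes from.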

Lossy encoding

The limitations of human perception are incorporated into the compression process through the use of psychoacoustic models. Some of these limitations are physiological, based on the machinery of hearing. Others are psychological, based on how our brain processes auditory stimuli.