The variation of what is perceived as equally loud at different frequencies was first measured by Fletcher and Munson at Bell Labs in the mid-1930s.
Humans hear best those sounds whose frequencies lie between about 3 kHz and 4 kHz.
In lossy audio coding, where a quantizer controls the bit-rate, if we select the quantizer step size such that the quantization noise lies below the audibility threshold, the noise will not be perceived.
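The relation between step size and noise power can be sketched as follows. For a uniform quantizer with step $\Delta$ and a signal much larger than $\Delta$, the quantization noise power is approximately $\Delta^2/12$, so the largest admissible step for a given noise budget is $\Delta = \sqrt{12\,P_{\text{noise}}}$. The names `step_for_noise_power`, `quantize` and `max_noise_power` are hypothetical, and the mapping from a threshold in dB SPL to a linear power on the digital scale is assumed to be done elsewhere.

```python
import numpy as np

def step_for_noise_power(max_noise_power):
    """Largest uniform-quantizer step whose noise power (step**2 / 12)
    does not exceed max_noise_power (linear power units, an assumption)."""
    return np.sqrt(12.0 * max_noise_power)

def quantize(x, step):
    """Uniform mid-tread quantizer: round each sample to the nearest multiple of step."""
    return step * np.round(np.asarray(x, dtype=float) / step)

# Illustrative check with a synthetic signal (not real audio).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 100_000)
max_noise_power = 1e-4          # hypothetical audibility threshold, linear power
step = step_for_noise_power(max_noise_power)
noise_power = np.mean((x - quantize(x, step)) ** 2)
```

In this sketch the measured `noise_power` should land close to `max_noise_power`, since the $\Delta^2/12$ model holds well when the signal spans many quantization steps.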
The shape of the ATH (Absolute Threshold of Hearing) depends on the noise level.
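To give a feeling for how the threshold varies with frequency, the following sketch evaluates Terhardt's widely used closed-form approximation of the threshold in quiet. It is only an approximation for an average listener in silence, it does not model any dependence on the ambient level, and the function name `ath_db_spl` is hypothetical.

```python
import numpy as np

def ath_db_spl(f_hz):
    """Approximate threshold in quiet, in dB SPL (Terhardt's formula)."""
    f_khz = np.asarray(f_hz, dtype=float) / 1000.0
    return (3.64 * f_khz ** -0.8
            - 6.5 * np.exp(-0.6 * (f_khz - 3.3) ** 2)
            + 1e-3 * f_khz ** 4)

# Locate the frequency where the ear is most sensitive on a 10 Hz grid.
freqs = np.arange(20.0, 20000.0, 10.0)
most_sensitive_hz = freqs[np.argmin(ath_db_spl(freqs))]
```

Under this approximation the minimum of the curve falls between 3 kHz and 4 kHz, and the threshold rises steeply towards both ends of the audible range.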
The HAS has a limited frequency resolution. Psychoacoustic experiments have demonstrated that the audible frequencies can be grouped into critical bands, also called barks.
Critical bands have an approximately constant $Q$ (the ratio of center frequency to bandwidth) over most of the audible range. Thus, at low frequencies a critical band can have a bandwidth as low as 100 Hz, while at higher frequencies the bandwidth can be as large as 4 kHz.
Each bark defines the group of frequencies that excite the same cochlear area, i.e., those frequencies that can be masked by the tone with the highest energy (in that bark).
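The mapping from frequency to critical bands can be sketched with the analytic approximations attributed to Zwicker and Terhardt. These are curve fits, not exact laws, and the function names `hz_to_bark` and `critical_bandwidth_hz` are hypothetical.

```python
import numpy as np

def hz_to_bark(f_hz):
    """Approximate critical-band rate (in barks) for a frequency in Hz."""
    f = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def critical_bandwidth_hz(f_hz):
    """Approximate critical bandwidth (in Hz) centered at a frequency in Hz."""
    f_khz = np.asarray(f_hz, dtype=float) / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69

bw_low = critical_bandwidth_hz(100.0)      # bandwidth near the bottom of the range
bw_high = critical_bandwidth_hz(15000.0)   # bandwidth near the top of the range
```

With these fits, the bandwidth is close to 100 Hz at low frequencies and grows to roughly 4 kHz near 15 kHz, while the whole audible range spans about two dozen barks.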
As a consequence of this behavior, simultaneous sounds with similar frequencies will mask each other.
Therefore, a tone at a certain frequency will raise the audibility threshold in, at least, the critical band around that frequency.
The degree to which the threshold is increased depends on a variety of factors, including whether the masking signal is tonal (sinusoidal) or atonal (noise-like).
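One common way to model how a masker raises the threshold around it is a spreading function on the bark scale. The sketch below uses the analytic form usually attributed to Schroeder; `offset_db` is a hypothetical parameter standing for the masker-dependent reduction (psychoacoustic models typically use a larger offset for tonal maskers than for noise-like ones), and its value here is purely illustrative.

```python
import numpy as np

def spreading_db(dz_bark):
    """Schroeder-style spreading function in dB.
    dz_bark = (maskee position) - (masker position), in barks."""
    dz = np.asarray(dz_bark, dtype=float) + 0.474
    return 15.81 + 7.5 * dz - 17.5 * np.sqrt(1.0 + dz ** 2)

def masked_threshold_db(masker_level_db, dz_bark, offset_db):
    """Threshold raised by a single masker at a distance dz_bark from it.
    offset_db is an assumed, masker-dependent reduction in dB."""
    return masker_level_db + spreading_db(dz_bark) - offset_db

# A 70 dB masker with an illustrative 10 dB offset, evaluated one bark
# below, at, and one bark above the masker.
thresholds = masked_threshold_db(70.0, np.array([-1.0, 0.0, 1.0]), 10.0)
```

The function is asymmetric: the threshold falls off more slowly towards higher frequencies than towards lower ones, so a masker hides more of what lies above it than of what lies below it.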
Human beings possess two ears that are separated by a certain distance (the diameter of the head). Therefore, the sound that each ear receives is almost never exactly the same.
This fact is used by the HAS to locate the sound sources. For this it uses that:
Most audio codecs exploit this channel dependency by means of the joint stereo mode, which encodes the $L$ and $R$ channels as
\begin{equation} M = \displaystyle\frac{L+R}{2} \tag{"Mid" signal} \end{equation}
\begin{equation} S = \displaystyle\frac{L-R}{2}. \tag{"Side" signal} \end{equation}
This processing is similar to the one Dolby Stereo uses for creating the surround channel, except that the surround ($S$) signal is generated by delaying one of the channels by a small amount of time that depends on the intensity of the surround.
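The mid/side equations can be sketched in a few lines, together with the inverse transform $L = M + S$, $R = M - S$ that follows directly from them. The function names `ms_encode` and `ms_decode` are hypothetical, and the test signal is synthetic.

```python
import numpy as np

def ms_encode(left, right):
    """Mid/side transform: M = (L + R) / 2, S = (L - R) / 2."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    return (left + right) / 2.0, (left - right) / 2.0

def ms_decode(mid, side):
    """Inverse transform: L = M + S, R = M - S."""
    return mid + side, mid - side

# Two highly correlated channels: a 440 Hz tone, slightly quieter on the right.
t = np.arange(480) / 48000.0
left = np.sin(2.0 * np.pi * 440.0 * t)
right = 0.9 * left
mid, side = ms_encode(left, right)
left_rec, right_rec = ms_decode(mid, side)
```

When the two channels are similar, as in this example, the side signal carries far less energy than the mid signal, which is what makes it cheaper to encode.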
Basically, this removes pure signals of low amplitude, while also taking into account the SAM (pSycho Acoustic Model) of the HAS (Human Auditory System). Noise tends to be of low power!
The limitations of human perception are incorporated into the compression process through the use of psychoacoustic models. Some of these limitations are physiological, based on the machinery of hearing. Others are psychological, based on how our brain processes auditory stimuli.