Although the statistics we investigated are relatively simple and were not hand-tuned to specific natural sounds, they produced compelling synthetic examples of many real-world textures. Listeners recognized the synthetic sounds nearly as well as their real-world counterparts. In contrast, sounds synthesized using representations distinct from those in biological auditory systems
generally did not sound as compelling. Our results suggest that the recognition of sound textures is based on statistics of modest complexity computed from the responses of the peripheral auditory system. These statistics likely reflect sensitivities of downstream neural populations. Sound textures and their synthesis thus provide a substrate for studying mid-level audition.

Our investigations of sound texture were constrained by three sources of information: auditory physiology, natural sound statistics,
and perceptual experiments. We used the known structure of the early auditory system to construct the initial stages of our model and to constrain the choices of statistics. We then established the plausibility of different types of statistics by verifying that they vary across natural sounds and could thus be useful for their recognition. Finally, we tested the perceptual importance of different texture statistics with experiments using synthetic sounds.

Our model is based on a cascade of two filter banks (Figure 1) designed to replicate the tuning properties of neurons in early stages of the auditory system, from the cochlea through the thalamus. An incoming sound is first processed with a bank of 30 bandpass cochlear filters that decompose the sound waveform into acoustic frequency bands, mimicking the frequency selectivity of the cochlea. All subsequent processing is performed on the amplitude envelopes of these frequency bands. Amplitude envelopes can be extracted from cochlear responses with a low-pass filter and are believed to underlie many aspects
of peripheral auditory responses (Joris et al., 2004). When the envelopes are plotted in grayscale and arranged vertically, they form a spectrogram, a two-dimensional (time versus frequency) image commonly used for visual depiction of sound (e.g., Figure 2A). Perceptually, envelopes carry much of the important information in natural sounds (Gygi et al., 2004; Shannon et al., 1995; Smith et al., 2002), and can be used to reconstruct signals that are perceptually indistinguishable from the original sound from which the envelopes were measured. Cochlear transduction of sound is also distinguished by amplitude compression (Ruggero, 1992): the response to high-intensity sounds is proportionally smaller than that to low-intensity sounds, due to nonlinear, level-dependent amplification.
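
The envelope and compression stages described above can be illustrated for a single frequency band. The following is a minimal sketch under stated assumptions, not the model's actual implementation: the subband is a synthetic amplitude-modulated tone rather than the output of a cochlear filter, the envelope is taken by rectification followed by a first-order low-pass filter, and compression is approximated by a power law whose exponent (0.3) is an illustrative choice. All function names and parameter values are hypothetical.

```python
import math

def one_pole_lowpass(x, cutoff_hz, fs):
    """First-order recursive low-pass filter (assumed stand-in for envelope smoothing)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    out, state = [], 0.0
    for sample in x:
        state += alpha * (sample - state)
        out.append(state)
    return out

def envelope(band, fs, cutoff_hz=100.0):
    """Amplitude envelope: full-wave rectify the band, then low-pass filter it."""
    rectified = [abs(s) for s in band]
    return one_pole_lowpass(rectified, cutoff_hz, fs)

def compress(env, exponent=0.3):
    """Power-law compression; the exponent is an assumption, not taken from the text."""
    return [max(e, 0.0) ** exponent for e in env]

fs = 16000  # sampling rate in Hz (assumed)
n = fs      # one second of signal

# Hypothetical subband: a 1 kHz carrier whose amplitude is modulated at 4 Hz.
band = [
    (1.0 + 0.8 * math.sin(2 * math.pi * 4 * t / fs))
    * math.sin(2 * math.pi * 1000 * t / fs)
    for t in range(n)
]

env = envelope(band, fs)
comp = compress(env)

# Sample 1000 lies near a modulation peak, sample 3000 near a trough.
ratio_before = env[1000] / env[3000]
ratio_after = comp[1000] / comp[3000]
print(f"peak/trough ratio before compression: {ratio_before:.2f}")
print(f"peak/trough ratio after compression:  {ratio_after:.2f}")
```

The smoothed envelope follows the slow 4 Hz modulation while discarding the 1 kHz fine structure, and the compressed envelope shows a smaller peak-to-trough ratio, which mirrors the statement that responses to high-intensity sounds are proportionally smaller. A full front end would apply these steps to each of the 30 cochlear bands.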