High Resolution Simulation Of Acoustic Environments
Abstract
This paper presents an overview of the important issues that need to be addressed in a high quality audio simulation system for use in Virtual Reality. Some aspects of room acoustics are discussed, followed by a discussion of the methods used to present 3-D audio material to a subject, including headphones and multi-speaker playback methods. Computer synthesis of room acoustic responses is reviewed, with emphasis on its application in Virtual Reality. Finally, a number of example simulation systems, based on existing DSP hardware, are presented as a guide to the current state of the art.
Introduction
Acoustic environment simulation involves the presentation of audio material to a subject in a way that creates the impression that the subject is in a different environment. The experience will be more realistic if the subject is also presented with other sensory (e.g. visual) input that matches the acoustic environment.
The simulation of acoustic environments is a technique that has been used for a number of years in the field of acoustics. The ability to listen to a simulated acoustic space has enabled researchers in acoustics to examine the characteristics of a virtual or real acoustic space via computer modelling. Initially, simulation was used as a powerful tool to test the accuracy of the computer model. As computer modelling techniques have been refined and proven, the quality and accuracy of simulation has improved to the point where it is now being used as a reliable means of predicting the characteristics of an acoustic space that does not exist in reality. In the field of acoustics, this method of simulation has become known as Auralization [1], being the auditory counterpart to visualisation.
Simulation of an acoustic environment involves one or more of the following functions :
Part 2 of this paper reviews the characteristics of acoustic spaces and discusses which elements of a room response are important for different virtual reality applications. Part 3 discusses the processes involved in presenting audio material to a subject to achieve an impression of a synthetic or real acoustic space. The basic principles of 3-D sound localisation are reviewed, and some methods of audio playback are discussed. Part 4 describes some methods that are used in the computation of synthetic room responses for the purpose of simulating virtual spaces. Part 5 describes techniques that are currently used for playback of audio material with 3-D characteristics and the hardware that can be used to process dry audio material prior to playback.
Room characteristics
Before we can attempt to simulate the listening experience of a subject in a given acoustic space, it is important to understand what aspects of the listening experience we wish to recreate. Ignoring extraneous effects, such as background noise, Figure 1 below shows the three main components of the audio that are important for the listener in a typical acoustic environment :
The sound arriving at the listener’s location in an acoustic space is composed of these three components:
If any of these three elements of the room response is omitted or improperly simulated in an acoustic simulation system, the result will sound unnatural.
When a subject is presented with a sound that is arriving from a specific direction, the human auditory system is capable of resolving the direction of arrival of the sound, based on a number of acoustic cues. Figure 2 illustrates a simple principle, namely that the use of two ears allows the left/right location of sounds to be estimated by a subject. This is based on the fact that the path lengths from the source to the two ears will generally be different, so the sound arrives at one ear slightly before the other.
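The path-length principle of Figure 2 can be illustrated with a small calculation. The sketch below is a deliberately simplified model and is not taken from this paper: it treats the ears as two points a fixed distance apart (0.18 m is an assumed typical value), assumes a distant source in the horizontal plane, ignores diffraction around the head, and takes the speed of sound as 343 m/s. The function name `interaural_time_difference` is hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, assumed value for air at about 20 degrees C
EAR_SPACING = 0.18       # m, assumed distance between the two ears


def interaural_time_difference(azimuth_deg, ear_spacing=EAR_SPACING):
    """Return the arrival-time difference (seconds) between the two ears.

    azimuth_deg is measured from straight ahead, positive to the right.
    For a distant source the extra path to the far ear is approximated as
    ear_spacing * sin(azimuth); diffraction around the head is ignored.
    """
    path_difference = ear_spacing * math.sin(math.radians(azimuth_deg))
    return path_difference / SPEED_OF_SOUND


if __name__ == "__main__":
    for azimuth in (0, 30, 90, 180):
        itd_us = interaural_time_difference(azimuth) * 1e6
        print(f"azimuth {azimuth:3d} deg: ITD = {itd_us:7.1f} microseconds")
```

In this simple model a source directly ahead (0 degrees) and one directly behind (180 degrees) both give a time difference of essentially zero, so the two-point picture alone cannot separate front from back.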
In reality, the methods used by the auditory system to resolve sound location are quite complex. The mechanisms used to locate sounds include the following [3] :
The sound received in the ear canals of a subject can be measured (using small implanted microphones) and the relationship between the transmitted sound and this received sound can be analysed. The response from transmitter to ear can generally be modelled as a linear filter with an impulse response that is less than about 2 ms in length. Figure 3 shows the relative delay and attenuation characteristics of the filter responses for the left and right ears.
The signal measured at the ear-canal in response to a source signal that is an impulse is known as the Head Related Transfer Function (HRTF). The HRTF is an impulse response that varies as a function of azimuth and elevation of the source, and also varies between different subjects.
If we generate a test signal and measure the binaural response to that signal, then we can compute the HRTF of the head, for that particular location of the sound source relative to the subject’s head position. This HRTF measurement procedure is generally carried out in an anechoic environment, so that echoes within the room do not come into play. Alternatively, for measurements made within a normal room, only the first 2 or 3 ms of the measured response should be used, as any subsequent elements in the measured response will be due to extraneous echoes.
Playback of binaural recordings
Recreation of an acoustic experience can be achieved by recording the sound that is incident on a subject’s ear-canal and then replaying that recording at a later time over headphones. Figure 4 illustrates this process. The head used to record the binaural material (the left-hand head in Figure 4) may be a real person’s head with small microphones implanted in the ear canal, or a dummy head such as the Kemar.
The 3-D experience that is reproduced by this technique includes more than just the direct arrival of the source sound. All aspects of the room acoustics are contained in this binaural recording, including the direction of arrival of early reflections and the fine details of the reverberant tail of the room response.
Simulation of binaural effects
Binaural simulation is generally carried out using dry source material. Dry recordings are made in an anechoic chamber, to ensure that the recording does not contain any unwanted echoes.
Dry source material can then be replayed to a subject, using the appropriate HRTF filters, to create the illusion that the source audio is originating from a particular direction. The HRTF filtering is achieved by simply convolving the dry audio signal with the pair of HRTF responses (one HRTF filter for each channel of the headphone). Figure 5 shows this procedure.
The HRTF(l) and HRTF(r) responses are commonly expressed in terms of time-domain impulse responses. With audio signals sampled at a typical rate of 48 kHz, these HRTF responses are usually around 128 samples in size (corresponding to about 3 ms of impulse response). The convolution process is defined by the following equations :
$$y_l(n) = \sum_{k=0}^{N-1} h_l(k)\, x(n-k), \qquad y_r(n) = \sum_{k=0}^{N-1} h_r(k)\, x(n-k)$$
where x(n) is the input audio stream (a monaural signal), y_l(n) and y_r(n) are the left and right headphone output signals, h_l(k) and h_r(k) are the HRTF(l) and HRTF(r) impulse responses, and N is their length in samples.
Practically speaking, this binaural simulation system could be accomplished by a DSP processor, taking the input data in real time and producing the (stereo) output data by computing the convolution based on the equations above.
Typical binaural simulation systems store a large number of pre-measured HRTF functions, and can switch from one HRTF to another rapidly. For any given location of the source audio, the HRTF can be retrieved from this stored table, or computed by interpolating between closest neighbour stored HRTFs.
The binaural simulation method described above attempts to create the illusion of a sound source that is located some distance from the subject, in a particular direction. The short HRTF filters mimic the propagation of the source sound to the subject without including any room acoustic characteristics.
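The convolution and table lookup described above can be sketched in a few lines. This is an illustrative sketch only, not the implementation used in any particular system: the function names (`convolve`, `interpolate_hrtf`, `binaural_render`), the table layout (a dictionary keyed by azimuth in degrees) and the toy 3-tap filters are all assumptions made for the example. Linear blending of the two nearest stored impulse responses is the simplest possible interpolation, and practical systems may use more careful schemes.

```python
def convolve(x, h):
    """Direct-form FIR convolution: y(n) = sum over k of h(k) * x(n - k)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, sample in enumerate(x):
        for k, tap in enumerate(h):
            y[n + k] += sample * tap
    return y


def interpolate_hrtf(table, azimuth_deg):
    """Linearly blend the two stored HRTF pairs nearest to azimuth_deg.

    table maps a stored azimuth (degrees) to a (left, right) pair of
    equal-length impulse responses. Azimuths outside the stored range are
    clamped to the nearest stored entry.
    """
    angles = sorted(table)
    if azimuth_deg <= angles[0]:
        return table[angles[0]]
    if azimuth_deg >= angles[-1]:
        return table[angles[-1]]
    for lo, hi in zip(angles, angles[1:]):
        if lo <= azimuth_deg <= hi:
            w = (azimuth_deg - lo) / (hi - lo)  # 0 at lo, 1 at hi
            (l0, r0), (l1, r1) = table[lo], table[hi]
            left = [(1 - w) * a + w * b for a, b in zip(l0, l1)]
            right = [(1 - w) * a + w * b for a, b in zip(r0, r1)]
            return left, right


def binaural_render(x, table, azimuth_deg):
    """Filter a mono signal x into (left, right) headphone signals."""
    h_left, h_right = interpolate_hrtf(table, azimuth_deg)
    return convolve(x, h_left), convolve(x, h_right)


if __name__ == "__main__":
    # Toy 3-tap filters invented for illustration; measured HRTFs are
    # typically around 128 taps long at 48 kHz.
    toy_table = {
        0: ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]),
        90: ([0.0, 0.0, 0.5], [1.0, 0.0, 0.0]),
    }
    left, right = binaural_render([1.0, 2.0], toy_table, 45)
    print("left :", left)
    print("right:", right)
```

In the toy table the left-ear filter at 90 degrees is delayed and attenuated relative to the right-ear filter, mimicking a source on the subject's right. A real-time system would process the input in blocks rather than as one complete list.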
The HRTF measurement method can be used in a reverberant space (instead of an anechoic chamber) to measure a pair of filter responses (for left and right ears). In this case, the response being measured is not a pair of HRTFs; it is a binaural room response. Simulated binaural playback can be achieved by the same method as shown in Figure 5, giving the subject the illusion of the source audio being transmitted within the same acoustic space that the binaural response was measured in.
Deficiencies in binaural playback
Some deficiencies in binaural playback have been reported in the past. These include:
Simulation using multi-speaker playback
A different approach to acoustic environment simulation involves a recreation of the 3-D sound field around the subject. This is most often achieved through the use of a large number of loudspeakers placed in an array around the subject. Generally, a minimum of four loudspeakers are required to achieve a convincing 3-D audio experience, while some researchers are using twenty or more speakers in an anechoic chamber to recreate acoustic environments with much greater precision.
The main advantages of multi-speaker playback are:
Whilst the playback through multiple loudspeakers may be more convincing for many subjects than headphone playback, the processing required for simulation of the 3-D audio will be more expensive (simply because more channels of audio output are required). In addition, making 3-D recordings in a real environment, for later playback over multiple loudspeakers, is a difficult procedure.
High-precision simulation of acoustic spaces, for the purpose of evaluation of acoustics, is almost always performed using either binaural playback or else using a large number (>10) of loudspeakers in an anechoic chamber. However, the use of far fewer speakers (4 or 6) can still provide a fairly realistic 3-D experience, and active research in this area is progressing rapidly. Also, 4 or 6 speaker systems have application in Virtual Reality. These systems can be constructed with the speakers close to the subject, for compactness. Larger numbers of loudspeakers will provide a larger ‘sweet-spot’, the range within which the subject (or subjects) can move without compromising the fidelity of the 3-D simulation.
Head tracking and animated effects
Improved playback through headphones can be achieved through the use of head tracking. This technique makes use of continuous measurements of the orientation of a subject’s head, and adapts the audio signals being fed to the headphones appropriately.
From the simplified model presented in Section 3.1 (Figure 2), it is clear that the use of 2 ears should allow a subject to easily discriminate between left and right sound source locations. However, the ability to discriminate between front and back, and high and low sound sources is generally only possible if head movement is permitted. Whilst multiple speaker playback methods solve this problem to a large degree, there are still many applications where headphone playback is preferable, and head tracking can then be used as a valuable tool for improving the quality of the 3-D playback.
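As a rough illustration of how head orientation might feed into headphone playback, the sketch below converts a source direction fixed in the room into a direction relative to the subject's head, then picks the closest stored HRTF angle. It is a sketch under stated assumptions rather than a description of any particular tracker or product: only yaw (rotation about the vertical axis) is considered, angles are in degrees with positive values to the right, the stored HRTFs are assumed to sit on a 15 degree grid, and all function names are hypothetical.

```python
def wrap_degrees(angle):
    """Wrap an angle in degrees to the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def head_relative_azimuth(source_azimuth_deg, head_yaw_deg):
    """Direction of the source as seen from the rotated head."""
    return wrap_degrees(source_azimuth_deg - head_yaw_deg)


def nearest_stored_azimuth(stored_azimuths, azimuth_deg):
    """Pick the stored HRTF angle closest to azimuth_deg, allowing for
    wrap-around at +/-180 degrees."""
    return min(stored_azimuths,
               key=lambda a: abs(wrap_degrees(a - azimuth_deg)))


if __name__ == "__main__":
    stored = list(range(-180, 180, 15))  # assumed 15 degree HRTF grid
    source = 0.0                         # source straight ahead in the room
    for yaw in (0.0, 30.0, 90.0):        # subject turns the head to the right
        relative = head_relative_azimuth(source, yaw)
        chosen = nearest_stored_azimuth(stored, relative)
        print(f"head yaw {yaw:5.1f} -> relative azimuth {relative:6.1f}, "
              f"stored HRTF at {chosen}")
```

When the subject turns the head to the right, a source fixed in the room moves to the left in head-relative terms, which is the cue that allows front and back sources to be told apart.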
The simplest form of head tracking binaural system is one which simply simulates anechoic HRTFs, and changes the HRTF functions rapidly in response to the subject’s head movements. This HRTF switching can be achieved through a lookup table, with interpolation used to resolve angles that are not represented in the HRTF table.
Simulation of room acoustics over headphones with head tracking becomes more difficult because the direction of arrival of the early reflections is also important in making the result sound realistic. Many researchers believe that the echoes in the reverberant tail of the room response are generally so diffuse that there is no requirement for this part of the room response to be tracked with the subject’s head movements.
An important feature of any head tracking playback system is the delay from the subject’s head movement to the change in the audio response at the headphones. If this delay is excessive, the subject can experience a form of virtual motion sickness and general disorientation.
Synthetic room calculations
The propagation of sound from a source to a subject within an acoustic space can be modelled by a computer, using a variety of methods. The methods used today fall into two broad categories:
The choice of modelling method used in a particular application will depend on the desired accuracy of the model and the animation capabilities required, as well as the techniques used for implementing the actual simulation for playback to the subject.
Example simulation systems
The acoustic simulation system takes the room response (either from a measured real room, or from a computer model) and convolves it with the dry source material. Most acoustic simulation systems fall into one of the following categories:
Long convolution engines such as Lake DSP’s Huron are being improved to allow faster animation functions to be added to their capabilities. The goal for these simulation systems is to be able to provide a realistic sounding room simulation whilst allowing rapid animation of source and subject location and room characteristics.
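To give a feel for the difference in scale between short HRTF filtering and full room-response convolution, the following back-of-envelope sketch counts multiply-accumulate operations for straightforward time-domain convolution. It is only a cost estimate under stated assumptions: the 2 second room response length is an assumed example value, two output channels (binaural) are assumed, and the calculation does not describe how the Huron or any other product performs its convolution. The function name `direct_macs_per_second` is hypothetical.

```python
def direct_macs_per_second(taps, sample_rate_hz, channels=2):
    """Multiply-accumulates per second for direct time-domain convolution.

    Each output sample of each channel needs one multiply-accumulate
    per filter tap.
    """
    return taps * sample_rate_hz * channels


if __name__ == "__main__":
    sample_rate = 48_000                 # Hz, the rate quoted for HRTF filtering
    hrtf_taps = 128                      # short anechoic HRTF
    room_taps = 2 * sample_rate          # assumed 2 second room response

    hrtf_cost = direct_macs_per_second(hrtf_taps, sample_rate)
    room_cost = direct_macs_per_second(room_taps, sample_rate)

    print(f"HRTF pair ({hrtf_taps} taps): {hrtf_cost:,} MAC/s")
    print(f"Room pair ({room_taps} taps): {room_cost:,} MAC/s")
    print(f"Ratio: {room_cost // hrtf_cost}x")
```

Under these assumptions the room response costs several hundred times more than the HRTF pair when computed directly, which suggests why long convolution is treated as a separate class of problem from simple HRTF filtering.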
References
c. 1995, David McGrath, Lake DSP.