Creation Manipulation And Playback Of Soundfields
Abstract
The investigations carried out at Lake into various methods for generating simulated acoustic experiences have recently led to the methods of soundfield recording and playback. This paper explains the basic methods used in this area, and examines the various tools that have been developed at Lake for generating and manipulating soundfields. This research merges technologies from Architectural Acoustics (Auralization) and Virtual Reality (head-tracking and 3-D simulation).
Introduction
The refinement of acoustic computer modeling techniques in recent years has led to the development of DSP hardware that is capable of simulating complex acoustic responses in real-time, through the use of convolution. This simulation has generally centred around binaural playback, as a convenient method for presentation of a 3-D soundfield to the listener. The enormous computational effort required to pre-compute the binaural acoustic response of a space means that any animation of the source or listener position during the real-time DSP processing will require a new approach. Furthermore, the application of acoustic modelling techniques in new areas (such as entertainment) will require the facility for playback other than through headphones. This paper describes a new method for the creation of 3-D soundfields that makes use of a high-performance DSP system to allow animation of the acoustic simulation in real-time, whilst also allowing the option for playback through loudspeakers (in either a 2-D or 3-D surround array or through a stereo pair) or headphones. This paper is divided into six main sections.
The purpose of the simulation system
The intention of this development at Lake was to produce a DSP system capable of giving a subject the illusion of a particular acoustic space, with one or more sound sources located within the space. The system was intended to fulfil the following requirements:
Acoustic modelling and auralization
Acoustic modelling is used to determine attributes of an acoustic space based on a computer model. Auralization is the process by which a listener is able to listen to the soundfield that would be experienced in the actual space, based on the results of the modelling process. A good overview of Auralization is given in [ Auralization is accomplished by the following steps:
The impulse responses computed by this procedure can be very long. For audio data sampled at 48 kHz, the impulse response will be between 10,000 samples (about 0.2 seconds, for a very small room) and 200,000 samples (about 4.2 seconds, for a very large reverberant space). Some or all of the computation must be re-done if any of the following features are changed: (a) the position or orientation of the sound source, (b) the position or orientation of the listener, (c) the surface materials of the room, (d) the microphone characteristic (or the dummy head response), (e) the headphone characteristics (in the case of binaural playback).
In all cases where a dummy head is referred to in the above discussion, the authors intend that a real subject's head response may also be applied. In particular, the most realistic auralization experience can be achieved for a listener when the response of that listener's head is used in computing the impulse responses.
The use of binaural impulse responses to simulate the room, with the full room response precomputed, generally implies that the auralization process will be static. However, an animated auralization has been achieved by the authors in the case where the impulse response of the room is precomputed for multiple listener head directions, and the DSP is pre-loaded with these multiple responses. In this system, a tracking device is attached to the headphones, and the DSP selects the appropriate pre-loaded filter function based on the listener's head orientation. However, this system requires that all room responses be pre-computed, so arbitrary movement of sources and receiver cannot be achieved.
The B-format soundfield representation
Binaural impulse responses present a particular difficulty, because the response heard by the listener (at each ear) changes in a very complex way when the source and/or listener move in the acoustic space.
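To make the limitation concrete, the head-tracked scheme described above amounts to a table lookup over pre-loaded responses. The following is a minimal sketch in Python, not Lake's implementation: the function name `select_filter`, the restriction to yaw, and the evenly spaced head directions are all assumptions made for illustration.

```python
def select_filter(yaw_deg, num_directions):
    """Return the index of the pre-loaded response nearest to the tracked yaw.

    Assumes (hypothetically) that responses were pre-computed for
    num_directions evenly spaced head directions, index 0 facing 0 degrees.
    """
    step = 360.0 / num_directions
    # Wrap the yaw into [0, 360), snap to the nearest stored direction,
    # and wrap again so that angles just below 360 map back to index 0.
    return int(round((yaw_deg % 360.0) / step)) % num_directions


# With eight stored directions (45 degree spacing):
front = select_filter(0.0, 8)       # facing forward
wrapped = select_filter(350.0, 8)   # nearest stored direction is again index 0
```

Because the table is indexed by head orientation alone, a change in source or listener position falls outside what the pre-loaded responses can represent.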
The authors sought a more convenient format for creating and manipulating soundfields, and selected the B-format soundfield representation as a proven and well supported system for recording and processing the soundfield measured at one point in space.
The B-format is often referred to as Ambisonics. The authors' understanding of the terminology is that Ambisonics is the system of recording and playback of sound fields developed by Michael Gerzon at the National Research Development Corporation in the U.K. in the 1970s [3][4]. Other researchers also contributed to this technology [5], and Ambisonics differs from other work in the way the spatial sound is processed for loudspeaker playback. The B-format is the name given to the particular 4-channel recording and transmission format used to convey the spatial sound information that is used in Ambisonic systems (in fact there are other formats, including UHJ, that are subsets of the B-format used in situations where the media does not allow for four full-bandwidth audio channels).
The B-format is essentially a four-channel audio format that can be recorded using a set of four coincident microphones arranged to provide one omni-directional channel (the W channel) and three figure-8 channels (the X, Y and Z channels). This set of X, Y, Z and W signals represents a first-order approximation to the soundfield at that point in space.
The Ambisonic system was developed in the 1970s, and its method of playing back soundfields over speakers never found wide application with end users (partly because Ambisonics was introduced at a time when quadraphonic sound was losing favour with consumers). Despite this, the technique of recording in B-format has been sustained by a number of practitioners who have found it very flexible, due to the ease with which recordings can be made, and the variety of ways that the B-format can be mixed down to stereo.
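As an illustration of the omni-directional and figure-8 channels described above, a single plane-wave source can be encoded into B-format with four direction-dependent gains. The sketch below uses the conventional first-order encoding equations, including the 1/sqrt(2) weighting on W; that weighting, the axis convention (X forward, Y left, Z up, azimuth measured anticlockwise from the front), and the function name `encode_bformat` are assumptions not stated in this paper.

```python
import math


def encode_bformat(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one mono sample arriving from a given direction into (W, X, Y, Z).

    Assumed convention: X points forward, Y to the left, Z upward, and W
    carries the conventional 1/sqrt(2) gain.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)                  # omni-directional channel
    x = sample * math.cos(az) * math.cos(el)     # front-back figure-8
    y = sample * math.sin(az) * math.cos(el)     # left-right figure-8
    z = sample * math.sin(el)                    # up-down figure-8
    return w, x, y, z


# A source directly in front excites only W and X;
# a source directly to the left excites only W and Y.
front = encode_bformat(1.0, 0.0)
left = encode_bformat(1.0, 90.0)
```

Because the encoding is linear, several sources can be encoded separately and their W, X, Y and Z signals simply summed.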
The B-format has been used as part of Lake's new AniScape software tools because it provides the following important benefits:
The SoundField Filter DSP process
The DSP processing required to provide a listener with the illusion of sounds localised in a particular acoustic space must implement the following features (with reference to Figure 1):
Figure 1. Early reflections in a room.
The relative amplitude of the direct sound compared to the remainder of the room response helps to provide the sensation of distance.
The DSP structure used to achieve this function is described below. Each input to the DSP represents one sound source in the virtual soundfield. The input is passed through a delay-line, and the direct sound and each early reflection are tapped out with the appropriate delay (based on the path length of the direct path or 1st order echo). Then, each of these sound path signals is attenuated and filtered to simulate the correct characteristic of the echo arrival. The factors that are included in this gain/filter stage are (a) overall attenuation due to the distance traversed by the echo path, (b) attenuation due to the directivity pattern of the source, and (c) frequency-dependent attenuation due to the acoustic properties of all the surface materials encountered by the echo path.
Each of these sound arrivals (direct sound plus six 1st order reflections) is mixed to form the X, Y, Z, W soundfield signals. The gain values used in this mixing are computed from the direction of arrival of each sound image at the listener position.
Finally, an additional tap from the delay line is used to feed the sound input into an array of FIR filters that render the later room reflections (from 2nd order, right through to the end of the reverberant tail). This filter has an impulse response several seconds in length.
Each tap from the delay line uses interpolating filters to achieve sub-sample delay resolution. This, combined with an instantaneous velocity integrator, achieves a smooth Doppler shift as the sound objects (or listener) move within the acoustic space. The final output of the system is a set of four signals (the B-format).
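The tap computation described above can be sketched as follows. This is a simplified illustration under stated assumptions, not Lake's DSP code: it assumes a rectangular room so that the six 1st order reflections can be found by mirroring the source in each wall (the image-source construction, which the paper does not name), it replaces the frequency-dependent surface filter with a single hypothetical scalar `reflection`, it omits source directivity, and it uses linear interpolation as the simplest stand-in for the interpolating filters. All function and parameter names are invented for the example.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second (assumed value)


def first_order_images(source, room):
    """Return the source plus its six mirror images in a box room.

    room is (Lx, Ly, Lz) with walls at 0 and L on each axis (assumption).
    """
    images = [tuple(source)]
    for axis in range(3):
        for wall in (0.0, room[axis]):
            img = list(source)
            img[axis] = 2.0 * wall - source[axis]  # mirror in this wall
            images.append(tuple(img))
    return images


def arrival_taps(source, listener, room, fs=48000, reflection=0.8):
    """Delay (in samples) and B-format mixing gains for each sound arrival.

    The source must not coincide with the listener (distance must be > 0).
    """
    taps = []
    for i, img in enumerate(first_order_images(source, room)):
        dx, dy, dz = (img[k] - listener[k] for k in range(3))
        dist = math.sqrt(dx * dx + dy * dy + dz * dz)
        # 1/distance attenuation, plus one wall loss for reflected paths.
        gain = (1.0 if i == 0 else reflection) / dist
        taps.append({
            "delay": dist / SPEED_OF_SOUND * fs,   # fractional samples
            "w": gain / math.sqrt(2.0),            # conventional W weighting
            "x": gain * dx / dist,                 # direction-of-arrival gains
            "y": gain * dy / dist,
            "z": gain * dz / dist,
        })
    return taps


def read_fractional(history, delay):
    """Read a delay line at a fractional delay by linear interpolation.

    history[-1] is the newest sample; len(history) must exceed delay + 1.
    """
    whole = int(math.floor(delay))
    frac = delay - whole
    newer = history[-1 - whole]
    older = history[-2 - whole]
    return newer * (1.0 - frac) + older * frac


taps = arrival_taps(source=(1.0, 2.0, 1.5), listener=(3.0, 2.0, 1.5),
                    room=(4.0, 5.0, 3.0))
```

Reading the delay line at each tap's delay, scaling by that tap's four gains and summing over all seven taps gives the early part of the W, X, Y and Z outputs; the late reverberant tail would be added by the separate FIR filters.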
These signals simulate the B-format response that would have been recorded by a Soundfield Microphone when placed in the same listener position within the space, surrounded by the same sources.
Loudspeaker playback of B-format signals
Many references (including [3], [4], [5]) provide good explanations of the way that B-format can be played back over loudspeakers. Basically, the technique simply feeds an array of speakers with a mix of the B-format signals. Generally, the speaker array must have some degree of symmetry (in fact the symmetry constraints can be very rigid if the decoder is to achieve ‘mathematically ideal’ performance). The array should surround the listener, so that if speakers are placed in front, then some should also be in the rear, and if overhead speakers are used (to give an impression of sound elevation) then some speakers should also be placed below the listener.
At low frequencies (below about 300 Hz), the goal of the decoder is to take a B-format input and reproduce the same B-format soundfield at the central listening point. At higher frequencies (above about 700 Hz), the mixing of the B-format signals to the speakers is adjusted to compensate for the fact that the listener’s head disrupts the soundfield. This adjustment is made using simple shelf filters that vary the high frequency contribution of the W (omnidirectional) component of the B-format relative to the X, Y and Z components.
At Lake, we have implemented a B-format decoder using DSP techniques. It can drive an almost unlimited number of loudspeakers, and has been tested and demonstrated with an array of twelve.
B-format playback over headphones
More recently at Lake, a new DSP algorithm has been implemented to decode B-format signals for headphones. The first step in this process was to build a DSP function that filters the four B-format components, producing two outputs, in such a way that a static binaural presentation can be made of the B-format soundfield.
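The four-in, two-out filter structure of this first step can be sketched as a bank of eight FIR filters, one per B-format channel per ear, whose outputs are summed for each ear. This is only a structural illustration: the paper does not give the filter responses, so the single-tap filters used here are hypothetical placeholders, and the names `fir` and `bformat_to_binaural` are invented for the example.

```python
def fir(signal, taps):
    """Direct-form FIR convolution, truncated to the length of the input."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


def bformat_to_binaural(bformat, filters_left, filters_right):
    """Filter W, X, Y, Z and sum the results into left and right ear signals.

    bformat maps channel name to a list of samples; filters_left and
    filters_right map channel name to that ear's FIR taps.
    """
    length = len(bformat["W"])
    left = [0.0] * length
    right = [0.0] * length
    for channel in ("W", "X", "Y", "Z"):
        from_left = fir(bformat[channel], filters_left[channel])
        from_right = fir(bformat[channel], filters_right[channel])
        for i in range(length):
            left[i] += from_left[i]
            right[i] += from_right[i]
    return left, right


# Placeholder filters only: each ear hears W plus or minus half of Y.
placeholder_left = {"W": [1.0], "X": [0.0], "Y": [0.5], "Z": [0.0]}
placeholder_right = {"W": [1.0], "X": [0.0], "Y": [-0.5], "Z": [0.0]}
field = {"W": [1.0, 0.0], "X": [0.0, 0.0], "Y": [1.0, 0.0], "Z": [0.0, 0.0]}
left, right = bformat_to_binaural(field, placeholder_left, placeholder_right)
```

In a real decoder each placeholder would be replaced by a measured or modelled response many taps long, and the filter bank itself would stay fixed while the soundfield changes.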
The next step was to add a mixer that can rotate the X, Y, Z components of the soundfield prior to the binaural filters so that (in conjunction with a head-tracking device) the soundfield could be made to remain stable when the listener rotates his/her head. Tests so far have shown this technique to work very effectively. It has a number of distinct advantages over previous methods used to implement head-tracked animated spatial sound over headphones:
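The rotation mixer described above can be sketched for the simplest case, a head turn about the vertical axis. The sketch assumes the common convention that X points forward, Y to the left and Z upward, with positive yaw meaning a turn to the left; the paper does not state its convention, handles all three components, and the function name `rotate_for_head_yaw` is invented here.

```python
import math


def rotate_for_head_yaw(w, x, y, z, head_yaw_deg):
    """Counter-rotate a B-format sample so the soundfield stays fixed in the room.

    Assumed convention: X forward, Y left, Z up, positive yaw = head turns left.
    W (omni-directional) and Z (vertical) are unaffected by a pure yaw.
    """
    theta = math.radians(head_yaw_deg)
    x_rot = x * math.cos(theta) + y * math.sin(theta)
    y_rot = -x * math.sin(theta) + y * math.cos(theta)
    return w, x_rot, y_rot, z


# A source straight ahead (X = 1, Y = 0): after the listener turns 90 degrees
# to the left, the source should be heard on the right (negative Y).
rotated = rotate_for_head_yaw(0.707, 1.0, 0.0, 0.3, 90.0)
```

Only the two gains derived from the tracked angle change as the head moves, which is why the rotation can be placed in front of a fixed set of binaural filters.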
Conclusions
The system described in this paper has been built by Lake, and found to offer the following benefits for users wishing to create and play back virtual soundfields:
The intention of the authors, when this system was originally specified, was to build a turn-key system for graphical simulation researchers to be able to build the acoustic equivalent of their real-time 3-D graphics displays. In future papers, the authors hope to be able to report on the experience of users who are applying this system in real applications.
References
c. 1994, David McGrath, Lake DSP.