The jReality audio rendering pipeline
This document presents an overview of the audio architecture of jReality, based in part on "Spatialisation - Stereo and Ambisonic" by Richard W.E. Furse. Details can be found in other tutorials.
The audio rendering pipeline starts with a subclass of AudioSource, which only needs to write mono samples to a circular buffer upon request, at a sample rate of its choosing. The audio backend collects all audio sources in the scene graph and requests samples from them when necessary. Audio sources are designed in such a way that a single audio source can appear in several places in the scene graph, and multiple backends can operate on the scene graph at the same time.
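The idea of a source that writes mono samples into a circular buffer on request can be sketched as follows. This is a minimal illustration, not jReality's implementation: the class names SineSource and MonoRingBuffer and all of their methods are hypothetical. Samples are addressed by absolute index, so several independent readers can each keep their own read position.

```java
/** Hypothetical source: a sine tone at a sample rate of the source's own choosing. */
final class SineSource {
    final MonoRingBuffer buffer;
    final double frequency;
    final int sampleRate;

    SineSource(double frequency, int sampleRate, int capacity) {
        this.frequency = frequency;
        this.sampleRate = sampleRate;
        this.buffer = new MonoRingBuffer(capacity);
    }

    /** Called by a consumer when it needs more data: writes n further mono samples. */
    void writeSamples(int n) {
        for (int i = 0; i < n; i++) {
            long k = buffer.samplesWritten();
            buffer.write((float) Math.sin(2.0 * Math.PI * frequency * k / sampleRate));
        }
    }

    public static void main(String[] args) {
        SineSource source = new SineSource(441.0, 44100, 64);
        source.writeSamples(100);
        System.out.println("samples written: " + source.buffer.samplesWritten());
        System.out.println("sample 75: " + source.buffer.read(75));
    }
}

/** Hypothetical mono ring buffer; an illustration, not jReality's own buffer class. */
final class MonoRingBuffer {
    private final float[] data;
    private long written; // total number of samples written so far

    MonoRingBuffer(int capacity) {
        data = new float[capacity];
    }

    /** Appends one sample, overwriting the oldest one once the buffer is full. */
    void write(float sample) {
        data[(int) (written % data.length)] = sample;
        written++;
    }

    /** Reads by absolute sample index; only the most recent `capacity` samples remain. */
    float read(long index) {
        if (index < 0 || index >= written || index < written - data.length) {
            throw new IndexOutOfBoundsException("sample " + index + " is not in the buffer");
        }
        return data[(int) (index % data.length)];
    }

    long samplesWritten() {
        return written;
    }
}
```

Because the buffer never hands out a shared read pointer, two consumers reading at different speeds do not interfere with each other, as long as neither falls more than one buffer length behind the writer.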
The audio backend creates one rendering pipeline for each occurrence of each audio source. In particular, all the optional processing described below (preprocessing, gain control, distance cues, etc.) can be chosen for each occurrence individually. These options are defined through appearances, so that they can be changed on the fly. The class AudioAttributes lists the tags under which those options are listed in appearances. For an example of how to use them, see
The audio backend implicitly converts any input samples to the sample rate of the output device. After the sample rate conversion, the user may insert optional effects, such as early reflections or pitch shifting. Such effects are implemented as subclasses of AudioProcessor. Multiple effects can be combined into a chain of processors, implemented in the convenience class
After the initial processing, the audio signal is multiplied by a gain factor and the first (optional) distance cues are applied. A typical application of the first round of distance cues is distance-dependent low-pass filtering, making distant sound sources sound duller than nearby ones.
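A distance-dependent low-pass filter of the kind mentioned above can be sketched with a one-pole filter whose cutoff frequency drops as the source moves away. The class DistanceLowPass, its methods, and in particular the mapping from distance to cutoff frequency (20 kHz divided by one plus the distance) are assumptions made for illustration, not jReality's distance cue.

```java
/** Hypothetical first distance cue: gain followed by a one-pole low-pass filter. */
final class DistanceLowPass {
    private final float sampleRate;
    private float state; // previous output sample of the filter

    DistanceLowPass(float sampleRate) {
        this.sampleRate = sampleRate;
    }

    /** Assumed mapping: the cutoff falls from 20 kHz at distance 0 as the distance grows. */
    static double cutoffHz(double distance) {
        return 20000.0 / (1.0 + Math.max(0.0, distance));
    }

    /** Applies the gain factor, then filters the samples in place. */
    void process(float[] samples, float gain, double distance) {
        // Smoothing coefficient of a one-pole low-pass with the given cutoff.
        double a = 1.0 - Math.exp(-2.0 * Math.PI * cutoffHz(distance) / sampleRate);
        for (int i = 0; i < samples.length; i++) {
            state += (float) (a * (gain * samples[i] - state));
            samples[i] = state;
        }
    }

    public static void main(String[] args) {
        float[] block = new float[512];
        for (int i = 0; i < block.length; i++) {
            block[i] = (i % 2 == 0) ? 1f : -1f; // highest representable frequency
        }
        new DistanceLowPass(44100f).process(block, 1f, 100.0);
        System.out.println("last sample at distance 100: " + block[block.length - 1]);
    }
}
```

A constant input settles at the gain factor regardless of distance, while high-frequency content is attenuated more strongly for distant sources, which is the "duller" sound described above.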
Next, the signal is fed through a distance-dependent delay line, giving rise to physically accurate and subjectively convincing Doppler effects. The speed of sound defaults to 340 m/s, but it can be set to arbitrary values through appearances. A speed of sound of zero or less is interpreted as infinite speed of sound, i.e., sound propagation is instantaneous in this case and Doppler effects do not occur.
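To illustrate how a distance-dependent delay produces a pitch change, the following sketch delays each output sample by distance divided by speed of sound and reads the input at the resulting fractional position. The class DopplerDelay, its methods, and the use of linear interpolation are assumptions for this example; jReality's delay line may be implemented differently.

```java
/** Hypothetical distance-dependent delay line with linear interpolation. */
final class DopplerDelay {
    /** Delay in samples; a speed of sound of zero or less means instantaneous propagation. */
    static double delaySamples(double distance, double speedOfSound, double sampleRate) {
        if (speedOfSound <= 0.0) {
            return 0.0;
        }
        return distance / speedOfSound * sampleRate;
    }

    /** Reads the input at a fractional position; positions outside the signal yield silence. */
    static float readInterpolated(float[] input, double pos) {
        if (pos < 0.0 || pos > input.length - 1) {
            return 0f;
        }
        int i = (int) Math.floor(pos);
        int j = Math.min(i + 1, input.length - 1);
        double frac = pos - i;
        return (float) ((1.0 - frac) * input[i] + frac * input[j]);
    }

    /** Renders one output sample per entry of distance[], the source distance at that sample. */
    static float[] render(float[] input, double[] distance, double speedOfSound, double sampleRate) {
        float[] out = new float[distance.length];
        for (int n = 0; n < out.length; n++) {
            double delay = delaySamples(distance[n], speedOfSound, sampleRate);
            out[n] = readInterpolated(input, n - delay);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("delay at 340 m: " + delaySamples(340.0, 340.0, 44100.0) + " samples");
        System.out.println("delay with speed 0: " + delaySamples(340.0, 0.0, 44100.0) + " samples");
    }
}
```

When the distance shrinks over time, the delay shrinks as well, so the read position advances faster than one input sample per output sample and the pitch rises; a growing distance lowers the pitch in the same way. With a speed of sound of zero or less the delay is always zero and no pitch change occurs.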
The audio signal is then split into two streams. One stream is sent through another optional distance cue (a typical example of a second distance cue is distance-dependent attenuation) and then sent on to an encoder that handles spatialization. In the simplest case, this encoder will compute a stereo signal, based on the relative position of source and microphone. In a 3D audiovisual environment, the encoder will compute an Ambisonics B-format signal.
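As an illustration of the spatial encoding step, the sketch below computes first-order B-format channels W, X, Y, Z from a mono signal and the source position relative to the microphone. It assumes the traditional B-format convention (W weighted by 1/sqrt(2), X, Y, Z equal to the components of the unit direction vector) and an axis layout with x pointing forward, y left, and z up. The class BFormatEncoder and its methods are hypothetical, and jReality's encoder may use different axes or normalization.

```java
/** Hypothetical first-order Ambisonics encoder using the traditional B-format weights. */
final class BFormatEncoder {
    /** Returns the gains {W, X, Y, Z} for a source at (x, y, z) relative to the microphone. */
    static double[] gains(double x, double y, double z) {
        double w = Math.sqrt(0.5);
        double r = Math.sqrt(x * x + y * y + z * z);
        if (r == 0.0) {
            // Source at the microphone position: no direction, omnidirectional part only.
            return new double[] {w, 0.0, 0.0, 0.0};
        }
        return new double[] {w, x / r, y / r, z / r};
    }

    /** Encodes a mono block into four channels, indexed as [channel][sample]. */
    static float[][] encode(float[] mono, double x, double y, double z) {
        double[] g = gains(x, y, z);
        float[][] out = new float[4][mono.length];
        for (int c = 0; c < 4; c++) {
            for (int i = 0; i < mono.length; i++) {
                out[c][i] = (float) (g[c] * mono[i]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] g = gains(0.0, 3.0, 4.0);
        System.out.println("W=" + g[0] + " X=" + g[1] + " Y=" + g[2] + " Z=" + g[3]);
    }
}
```

Only the direction of the source enters the gains; distance-related changes are left to the distance cues applied before encoding.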
The second stream is intended for directionless processing. It is multiplied by a second gain factor and then fed through an optional effect (e.g., reverberation) that is then rendered without any spatial encoding. For an explanation of this design (directed and directionless stream, two sets of distance cues applied at different times), see Furse's paper cited above.
Finally, the directed and the directionless signals are summed and sent to the output device. In the case of stereo encoding, the output device may already be the physical output of the sound card. In the case of Ambisonics encoding, the output device will be an Ambisonics decoder configured for the local speaker rig.
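To give an idea of what an Ambisonics decoder does for a speaker rig, the following sketch derives the feed for each speaker of a horizontal rig by pointing a virtual cardioid microphone at it. This is only one of several possible decoding rules and is not taken from jReality. It assumes first-order B-format with W weighted by 1/sqrt(2) and azimuths measured counter-clockwise from the front; the class CardioidDecoder and its methods are hypothetical.

```java
/** Hypothetical horizontal B-format decoder based on virtual cardioid microphones. */
final class CardioidDecoder {
    /** Feed of one speaker at the given azimuth (radians) for one sample of W, X, Y. */
    static double speakerFeed(double w, double x, double y, double azimuth) {
        return 0.5 * (Math.sqrt(2.0) * w + x * Math.cos(azimuth) + y * Math.sin(azimuth));
    }

    /** Feeds for a whole rig, one entry per speaker azimuth. */
    static double[] decode(double w, double x, double y, double[] speakerAzimuths) {
        double[] feeds = new double[speakerAzimuths.length];
        for (int i = 0; i < feeds.length; i++) {
            feeds[i] = speakerFeed(w, x, y, speakerAzimuths[i]);
        }
        return feeds;
    }

    public static void main(String[] args) {
        // Square rig: front left, rear left, rear right, front right.
        double[] rig = {Math.PI / 4, 3 * Math.PI / 4, 5 * Math.PI / 4, 7 * Math.PI / 4};
        // Unit-amplitude source straight ahead: W = 1/sqrt(2), X = 1, Y = 0.
        double[] feeds = decode(Math.sqrt(0.5), 1.0, 0.0, rig);
        for (double f : feeds) {
            System.out.println(f);
        }
    }
}
```

With this rule, a speaker in the direction of the source receives the full signal and a speaker directly opposite receives none, so the two front speakers of the square rig carry more of a frontal source than the two rear ones.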