Immersive Audio: The Past Repackaged or a New Frontier?
Perspectives
By guest contributor Mark Willsher
Last update September 25, 2023
NAE Perspectives offer practitioners, scholars, and policy leaders a platform to comment on developments and issues relating to engineering.
Mark Willsher is an independent scoring mixer and music editor based in London, U.K.
Immersive audio is a term that, in today’s world, gets used to describe almost any audio presentation that is not traditional stereo (or mono). According to some, listening to stereo is like looking through a window, whereas immersive audio places the listener in the same space as the performer. Today, you might be most familiar with immersive audio from movies featuring soundtracks in Dolby Atmos or Apple Music’s “Spatial Audio.” Immersive audio, however, is not a new idea; many commercial formats that have rivaled stereo have been proposed and promoted over the past half-century. But, so far, none have achieved large-scale consumer adoption. As a music recording producer and engineer who has spent the majority of my career working in both stereo and immersive formats, I find it difficult to look at the current marketing of immersive audio and not ask: What’s different now?
Looking at today’s offerings, it is easy to get confused. People use the terms “surround,” “spatial,” “immersive,” and “binaural” almost interchangeably. Surround tends to be passé and is generally used for older channel-based formats such as 5.1 and 7.1. Immersive and spatial are the terms most people use to describe newer formats, of which there are many — perhaps too many. Binaural, for our purposes, simply means using headphones to listen to immersive formats.
When I was still a student in the mid-’90s, we were all excited about home theater and 5.1 surround sound. We were aware of the failed attempts to bring immersive music to the consumer with quadraphonic sound (4.0) in the 1970s, but what was exciting in the 1990s (so we were told or, perhaps I should say, sold) was that consumers across the country were purchasing and installing home theater systems to watch movies at home. Those systems included 5.1 surround sound, meaning there were a large (and supposedly growing) number of households equipped to experience music immersively. All we had to do was make it available.
Why were we, as music producers and recording engineers, excited about this in the first place? Well, the key here lies in the term immersive. Personally (and I don’t believe for a second that I am atypical), I always hope to make recordings that engage the listener in such a way that the recording technology, the playback system, and the playback environment fade away, and the listener becomes emotionally absorbed in the music. Immersive audio makes that possible. That said, the term immersive could use clarification: While many view immersive audio as immersing the listener in sound (i.e., sound coming from all around you), I view it as immersing the listener in music (which may or may not be coming from all around you) such that the listener’s environment neither distracts nor detracts from the experience. As an example, it can be very emotionally powerful to shift all the sound to a single speaker to draw the listener in before something exciting happens — silence and a small click in a single speaker before an explosion that fills the room.
In the 1990s, it seemed plausible that 5.1 surround-sound music would succeed where quadraphonic sound had failed two decades earlier. Soon we had a new audio-only DVD format (DVD-A), but it required its own player, as DVD-As did not play in DVD-V players, which is what people were buying to watch movies. We could release 5.1 music on a video disc, which many did, but the audio on those discs was data-compressed and didn’t exactly spark joy for the artist or the consumer. We also had the Super Audio CD (SACD) format, which seemed like a brilliant solution: a single disc with two layers, a standard audio CD layer that could be played on any CD player and a high-resolution Direct Stream Digital (DSD) layer that could be played on an SACD player. The DSD layer had eight channels, providing both stereo and 5.1 audio. I was involved in a small number of SACD productions (and a single DVD-A one), and I never met anyone who listened to the 5.1 audio. I’m sure there were some, but it was certainly a very niche market.
I have always recorded music in a manner that would allow me to create a proper immersive release, but prior to 2021 (with the exception of film music [music in the films themselves, not the soundtrack albums]), only a handful of the projects I had been involved in were ever released in anything other than stereo.
We may have fancier tools and formats today, but we have been making immersive recordings for decades. Binaural reproduction (immersive audio specifically for headphone playback) has been done since the dawn of stereo in the late nineteenth century. There was (and probably still is) an impressive binaurally produced audio tour of the Alcatraz prison in San Francisco Bay. I had the joy of experiencing it in the early 2000s. This audio tour alone makes a great case for everything that immersive sound can be. I felt that I was wandering around the prison when it was filled with staff and inmates, and the audio tour was cleverly produced to keep you looking in the direction needed to make everything believable (no head tracking back then). Not to mention Ambisonics (too big a topic to go into here), which has been around for decades, and even 5.1 and 7.1, which can all be incredibly immersive. None of these are new.
So, what’s different now?
Things changed in 2021 when Apple announced that it would be offering Dolby Atmos-encoded content through Apple Music. It would be available automatically to those with Apple earbuds and Atmos-equipped hardware (home theater for the most part), and it could be enabled for all other headphone/earbud users. This meant there was now a major streaming distributor supporting immersive content and specifically supporting it for headphone playback, which reaches a much bigger audience than those with a home theater. Apple was not the first to do this, but the weight of its brand garnered significant attention from the public and the record labels, who, in short order, started re-releasing their catalogs in Dolby Atmos.
Another difference today is that formats such as Dolby Atmos and MPEG-H — two of many formats — are rendered at the point of playback to suit the listener’s playback setup. In the past, we could always make and deliver immersive binaural recordings for headphone playback, but then we had to create and deliver multiple versions, and the end listener had to select the appropriate one, which is far from user-friendly.
Today, there is a large catalog of immersive audio content available from mainstream distributors, and there is technology that provides immersive playback on a wide range of consumer devices and systems. These significant differences between what is happening today and what happened in the past explain why producers and engineers are excited about immersive audio.
Despite the recent advancements, immersive audio in 2023 is the Wild West.
We have the potential to produce content once and have it rendered on playback for the end user’s system. However, in reality this is hit-or-miss. In many cases, those of us producing content are seeing an ever-growing list of required deliverables. It should be noted that not all platforms support immersive content, so no one is letting go of stereo yet. In practice, most new recordings are still done in stereo, and the immersive release is created from stereo stems made during the creation of the stereo master. Interesting work is being done but is limited in potential, as all the initial creative work is done in stereo, and the immersive version is an afterthought — often developed without the original artist present.
This is not so different from what happened with film when Dolby Atmos was first released: Many films were still mixed in 7.1, and the Dolby Atmos version was done after the main mix. It didn’t take long before most films destined for Atmos releases were simply mixed in the format from the start. The transition can be expected to take longer for music, since, in most cases, people and facilities are moving from stereo to Atmos, which is a much bigger mental shift and more expensive than moving from 5.1 to 7.1.
Classical music and film scores are outliers here. For the former, quite a few labels have been releasing music in various immersive formats for years: binaural headphone releases, 5.1, 7.1, and beyond. For many classical labels, the new consumer-facing formats provide a new means to deliver content they were already creating and releasing, but to a much bigger audience (as specialist playback equipment is no longer a requirement). For film scores, while the vast majority of soundtrack releases have been stereo only, most production work has been done in 5.1 since the 1990s (not to exclude work done in earlier surround formats). For film score releases, we can now finally release the music in a format similar to what it was produced in.
Much of this is good news. And, while a transitional period is natural, there are some areas of concern that need to be addressed in order for immersive audio to be a tangible benefit for a majority of end listeners and, consequently, a real success for both artists and the music industry at large.
One disappointing area is catalog re-releases, which seem to be quite hit-or-miss in terms of both technical and creative quality. There is some amazing work being done, but there is also work being done that doesn’t benefit the consumer or the artist. In many cases, these re-releases are done without the involvement of the artist or the original producer/engineer (this is, of course, inevitable if people are no longer with us, but many are and are not consulted). Labels could slow down a little and allow people to concentrate on quality rather than quantity. Also, not everything needs, or is appropriate for, an immersive re-release.
It is amazing to have all these new tools and options, but they should be at the service of the content; they don’t all have to be used all the time! For instance, if someone is happier with their lead vocal as a phantom center (playing out of the left and right speakers and not a center channel), then they should not feel obliged to do anything different. I have done and will continue to do both, depending on the situation. The same goes for some of the newer options: height/ceiling channels, moving objects, binaural renderer settings, and the list goes on. At the moment, due in part to good intentions, many labels and distributors impose restrictions on what can be accepted as an immersive release, which for the most part is content in a Dolby Atmos container (I call it a container as you don’t have to mix in Dolby Atmos to release in Dolby Atmos). Some labels require height channels, some labels require objects, and some set restrictions on binaural metadata. There may be best practices, and there are always bad choices that can be made, but we should not be required to use something for the sake of it. This does not lead to a better product; we do not need sound everywhere in order for the experience to be immersive.
Another unfortunate reality is that of data compression. For forty years, many have been bemoaning the quality of CD audio. In the late 1990s, CD sales plummeted with the rise of file sharing (low-quality, lossy-compressed audio), followed by the rise of digital sales and streaming distributors. Since then, the quality level has come back up to where most platforms now support 24-bit audio, often marketed as the artist’s true intent or studio quality. Yet, now with immersive streaming, we are back to significant data compression: In the case of Dolby Atmos, our multichannel immersive deliverables are squeezed into a bit rate equivalent to that of a stereo CD. What consumers hear at home can be worlds apart from what is heard in the studio. A sizeable difference between the studio and home is nothing new, but it is nonetheless disappointing in 2023.
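To put that squeeze in rough numbers, here is a minimal back-of-the-envelope sketch in Python. The CD figures are the standard 44.1 kHz, 16-bit, two-channel values. The studio source (12 channels at 48 kHz, 24-bit) is purely an illustrative assumption and does not describe any particular deliverable, codec, or streaming service.

```python
def pcm_bit_rate(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Return the uncompressed linear PCM bit rate in bits per second."""
    return sample_rate_hz * bit_depth * channels


# Standard audio CD: 44.1 kHz, 16-bit, stereo.
cd_stereo = pcm_bit_rate(44_100, 16, 2)

# ASSUMPTION (illustrative only): a 12-channel studio source at 48 kHz, 24-bit.
studio_source = pcm_bit_rate(48_000, 24, 12)

# How many times larger the assumed studio source is than a stereo CD stream.
ratio = studio_source / cd_stereo

print(f"Stereo CD:     {cd_stereo / 1000:,.1f} kbit/s")
print(f"Studio source: {studio_source / 1000:,.1f} kbit/s")
print(f"Reduction needed to fit a CD-sized stream: about {ratio:.1f}x")
```

Under these assumptions, the uncompressed source carries roughly ten times the data of a stereo CD stream, which gives a sense of how much a codec has to discard or encode more efficiently to fit.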
One hurdle at this point is getting the cost of production down. For me, since I work in multichannel formats for film anyway, the cost of making the shift to immersive production for music-only projects was negligible. However, for an independent engineer (or studio) currently working in stereo, the jump to immersive production requires significant capital, for which there is really no additional return in terms of budgets. On the positive side, though, many new virtual tools are being developed and released for producing immersive content over headphones rather than requiring an expensive listening room. This means that a new generation of content creators will be able to do incredible things at a fraction of the cost. In fact, the cost of entry is already cheaper than it has ever been.
We are not there yet, but we are definitely heading in the right direction. After failed attempts in the 1970s and 1990s, immersive music production and consumption are finally here to stay. The user experience will also get better over time, even for content released today, much like Apple Digital Masters: You deliver the highest quality audio you have, and as the codecs get better and bit rates increase, everything can be re-encoded, and consumers of the future can benefit from improved quality. While there are hurdles to overcome, the future looks bright for both those creating content and those consuming it.
Disclaimer
The views expressed in this perspective are those of the author and not necessarily of the author’s organizations, the National Academy of Engineering (NAE), or the National Academies of Sciences, Engineering, and Medicine (the National Academies). This perspective is intended to help inform and stimulate discussion. It is not a report of the NAE or the National Academies.
© National Academy of Sciences. All rights reserved.