
Audio Engineering Society 2023 Conference in Helsinki

06-07-2023 | By Scott Dorsey | Issue 127

The Audio Engineering Society has been holding two conventions a year for decades, with a US show that has traditionally alternated between the east coast and the west coast in the fall, and a European show in the spring.

While the American AES conventions are split about evenly between being academic conventions and commercial trade shows, the European conventions lean very heavily toward the academic side.

This year's convention at Aalto University in Espoo, Finland was even more so than usual. Whereas the 2019 pre-Covid show in New York had 142 papers (counting e-briefs) and 213 vendors, this year's European show had 81 papers but only six vendors. To be honest, I really like the European show because it's not vendor-driven, but it would have been nice to have a few more vendor tables. This is the absolute first time I have ever said that about any AES event. I was pleased to see local vendors, though.

I really enjoy the European shows because they are laid-back and relaxed and the food is always good, and I enjoyed this one even more than usual. Rather than talk about program items by subject, I am going to break things up a bit differently in this show writeup, beginning with papers.


Adam Pilch from the AGH University of Science and Technology in Krakow talked about "Sound diffusers optimized for listening rooms." He did some relatively simple models using the Fraunhofer equation to model the effectiveness of diffusers in small rooms. Interestingly, he found that small QRD diffusers were not as effective as some other configurations (even random ones) because the diffusion was not constant with frequency for sound waves arriving off the normal, and QRD did not reduce specular reflections. So he devised some more random diffusers which were more effective at different angles, for both the rear wall and side walls where the signal sources were at an acute angle to the diffuser. He found these to work much better than the popular diffusers when tested in small rooms (and by better I mean diffusion was more constant with frequency).
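For reference, the classic quadratic residue diffuser (QRD) he compared against sets its well depths from a quadratic residue sequence. Here is a minimal sketch of that textbook formula; the prime and design frequency are illustrative choices of mine, not values from the paper:

```python
# Well depths for an N-well quadratic residue diffuser (QRD).
# Illustrative sketch only; N = 7 and 1 kHz are not values from the paper.

def qrd_well_depths(n_prime, design_freq_hz, c=343.0):
    """Return well depths in metres for an N-well QRD (N must be prime)."""
    wavelength = c / design_freq_hz
    # Quadratic residue sequence: s_n = n^2 mod N
    sequence = [(n * n) % n_prime for n in range(n_prime)]
    # Each well is s_n * lambda / (2N) deep
    return [s * wavelength / (2 * n_prime) for s in sequence]

depths = qrd_well_depths(7, 1000)   # a 7-well QRD designed around 1 kHz
```

The design works well near the design frequency and at normal incidence, which is exactly the limitation Pilch's off-axis results point at.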

Piotr Ksiazek, also from the AGH University of Science and Technology, gave a talk called "A deep learning based method for honeybee sound monitoring and analysis." He investigated determining temperature from the sounds made in beehives, much as we did with katydids as children but with a large mass of insects at once. He put microphones inside and outside of beehives, recorded 50-second chunks of audio which were MP3-encoded using a Raspberry Pi, and then did simple feature extraction on the encoded files, creating PSD (power spectral density) and MFCC (mel-frequency cepstrum) files which showed frequency vs. time.
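As a rough illustration of the PSD stage, here is a minimal sketch using Welch's method; the sample rate, FFT size, and synthetic stand-in audio are my own assumptions, not details from the paper:

```python
import numpy as np
from scipy.signal import welch

# Sketch of PSD feature extraction from a chunk of (hive) audio.
# 16 kHz sampling and a 2048-point segment are illustrative assumptions.

fs = 16000
rng = np.random.default_rng(0)
chunk = rng.standard_normal(fs * 50)   # stand-in for a 50-second recording

# Welch's method averages windowed periodograms for a stable PSD estimate
freqs, psd = welch(chunk, fs=fs, nperseg=2048)

# Log-scaled PSD vector, the sort of feature a learned model can consume
features = 10 * np.log10(psd + 1e-12)
```

A real pipeline would decode the MP3 chunks first and likely add MFCCs, but the shape of the feature vector per time window is the same idea.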

He then put those spectrum files into a machine learning algorithm along with the current temperature. He got very good results across the board showing good correlation between PSD and temperature and slightly poorer correlation between MFCC and temperature, although I couldn't figure out where the test dataset came from and if it overlapped with the training dataset in any way. The external microphone gave data with as good validity as the internal microphone, which is good because the bees attacked the internal microphones.

I gather this was in part a proof of concept and a first test of a data recording system. I expected that the MP3 encode/decode process, which has zero waveform fidelity, would be an issue, but whatever features were lost in the MP3 process turned out not to be necessary for the resulting model. I am pleased at how well this worked, but I do think that if the data is to be more general and to be used for other projects, the MP3 encoding is likely to become a problem in the future.

There were a few posters this year, posters being shorter papers which are put up in the form of a hall poster with the author being there to explain it. One worthwhile one was from Kotaro Kinoshita and Takehiro Sugimoto from NHK broadcasting in Japan. They set out to measure the radiation pattern of the human voice in three dimensions using a large microphone array surrounding a person speaking. The method was interesting. The database that resulted could be interpolated for simulating voices in object-oriented audio systems although the authors state that greater spatial resolution would have been useful. It is still a good start, and yes, the chest does in fact radiate substantial energy.

Kotaro Kinoshita

Konstantinos Kaleris and a number of other folks, ending with John Mourjopoulos, wrote about "Laser Sound Transduction from Digital Sigma Delta Streams." This is a preliminary look at a small plasma driver that is not your father's ionophone! Each laser pulse in the air creates an N-wave impulse, and by modulating an Nd:YAG laser in pulse duration, wavelength, and focusing, the authors were able to create arbitrary acoustic waveforms at very high intensity in free air. The system has very limited fidelity in great part because of the limited rate at which the laser could be controlled; since the waveform is made up of individual pulses, the pulse rate needs to be much higher than is currently possible to avoid aliasing. This is by no means a practical device, but I could see it someday becoming one. To my mind, what makes this interesting is the possibility of someday scanning a far faster laser across the soundfield in order to reproduce a wavefront in 2D space.

Another strange new speaker concept was discussed by Michael Gareis and Juergen Mass in "Buckling Dielectric Elastomer Transducers as Loudspeakers." They describe a cup made of a silicone polymer with conductive graphite on either side, so that an electrical charge will compress the cup or membrane electrostatically and cause it to buckle mechanically. The resulting driver requires DC offset and considerable voltage but is effectively a capacitive load. That's where the similarity to conventional electrostatic drivers ends, though. Linearity is not great but efficiency is good, and for a proof of concept this looks like an interesting beginning.

At one point, Brad McCoy took me into a paper session and then after a few minutes he ran off to see someone leaving me to watch his bag. I won't mention what the paper was but it was terrible and I was sitting there not able to leave for twenty minutes until he came back to rescue me from a depressing show of unclear thinking.

Juha Backman speaks

Helsinki's own Juha Backman spoke on "Current feedback techniques for loudspeaker control," where he reviewed some of the history of driving loudspeakers with negative impedance for control of damping. Driving with an amplifier that is a simple current source works well for a single driver as long as the impedance is made to rise around the driver resonance so all damping is not lost. He proposed a generic method for generating frequency-dependent power amplifier output impedances using current feedback with active equalization in the feedback network. This method can reduce distortion due to driver nonlinearity as well as thermal compression, while still being stable, but of course it is only applicable to active speakers. He did mention in the presentation that an important step had been left out of the method in the paper, but he then proceeded to describe it to the audience. This paper is well worth reading for anyone interested in active speaker design, but maybe you should ask the author for an updated copy.

Stephane Elmosnino's student paper, "A Review of Literature in Critical Listening Education," was a good survey of work that has been done in the past on teaching and learning perceptual and listening skills, as well as skills for communicating about what has been heard. He came up with a number of interesting studies on ear-training methods that I had not heard of, as well as mentioning all the classic studies.

Another great paper on critical listening education was "Introducing the Free Web Edition of the Perceptual Encoders - What to Listen For" educational material. Back in 2001, when perceptual encoding systems were new and shiny and people could claim MP3 encoding to be transparent without being laughed at, the AES Technical Committee on Coding came out with a CD-ROM which showed typical artifacts in various situations, including in isolation, along with material that explained how to listen for them. Fast forward 22 years and Juergen Herre, Sascha Dick, and the rest of the committee have come out with an online version that is interactive. It includes all the original material, plus additional samples and discussion about parametric encoding systems which didn't exist in 2001. It is worth trying out and worth sharing with your friends who are still advocating lossy compression in critical situations. Point your browser HERE and try it for yourself.

Juan Sierra from NYU presented a poster called "Is Oversampling Always the Solution?" He spoke about various approaches to reduce aliasing effects in software dynamics processing, citing Frank Foti's classic paper at the 1999 AES show that describes "digital grunge" caused by aliasing in dynamics processors for broadcast.

Of course, oversampling to provide higher bandwidth around the dynamics block can solve problems, but at the expense of many more CPU cycles. The author points out other ways to reduce possible effects, including bandwidth reduction and limiting the speed of control signals. Worth reading if you are interested in software compressors or limiters.
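One of those alternatives, limiting the speed of the gain-control signal so that abrupt gain steps cannot generate wideband modulation products, can be sketched as a simple one-pole smoother. The attack and release times here are illustrative choices, not values from the poster:

```python
import numpy as np

# Smooth a per-sample gain-control signal with one-pole filtering so the
# applied gain contains no sharp steps. Time constants are illustrative.

def smooth_gain(raw_gain, fs, attack_ms=5.0, release_ms=50.0):
    """One-pole smoothing of a raw gain signal from a level detector."""
    a_att = np.exp(-1.0 / (fs * attack_ms / 1000.0))
    a_rel = np.exp(-1.0 / (fs * release_ms / 1000.0))
    out = np.empty_like(raw_gain)
    g = 1.0
    for i, target in enumerate(raw_gain):
        coeff = a_att if target < g else a_rel   # faster when reducing gain
        g = coeff * g + (1.0 - coeff) * target
        out[i] = g
    return out

fs = 48000
raw = np.ones(fs // 10)
raw[2000:4000] = 0.25          # abrupt gain reduction from a detector
smoothed = smooth_gain(raw, fs)
```

The smoothed gain ramps over milliseconds instead of jumping in one sample, which keeps the sidebands it creates close to the signal rather than splattered across the spectrum.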

Madalina Nastasa from right there at Aalto University talked about "Auditory localisation of low-frequency sound sources." She and her co-authors played back test signals through speakers in a room deadened even down to low frequencies, with masking noise played back to prevent audibility of any harmonics that might be generated by imperfect drivers. She found very good localization with pure tones at 60Hz although the tests at lower frequencies used octave-filtered pink noise and contained components as high as 60Hz. But, we can say this study shows clear evidence of localization down to 60Hz.

Many people claim that "there is no bass imaging," and this is in part because room modes in reasonably-sized rooms eliminate any real bass imaging, and in part because they confuse the omnidirectionality of the driver with a lack of spatial imaging (which is very incorrect). I have also heard arguments that "there is no bass imaging" because there is good evidence that 20Hz tones cannot be located by listeners. So it is very important to show what localization is possible and to determine the lowest frequencies where it remains possible, and this paper adds evidence toward determining that.

Tutorials and Workshops

In addition to the formal papers, the AES presents a number of tutorials and workshops that aren't new research but feature experts talking about existing research and how it might be applied. In the last few years these have increased a lot in number, and that has been a great thing.

This year Eddy Brixen from DPA gave a talk called "Mics and bad vibes" in which he talked about microphone handling noise, where it comes from, and how it can be measured and quantified. Manufacturers don't actually give numbers on handling noise in part because of a lack of agreement on a standard way to measure it.

Bruce Black gave a talk called "Secrets of the Low Frequencies: Navigating the Quicksand" on low frequency distribution in small rooms and the difficulty of dealing with modes in typical small listening and mixing environments. Sadly I did not attend this because it was at the same time as the archiving talk, but this is the number one thing that I feel like I missed out on at the event.

Gabriele Bunkheila from Mathworks gave a talk called "Creating Audio Plugins with Matlab." This was a good introduction to using Matlab, which is a general purpose scientific programming language, for creating plugins that can be used in digital audio workstations. He talked about the use of Matlab for streaming data instead of for fixed-length vectors, about the process for automatically converting the interpreted Matlab code into compilable C++ code, and about the process of wrapping the C++ code to provide an API that is recognizable as a plugin. The talk itself was great but the constant PA dropouts were annoying.

There was a combined panel called "Why are standards important for archiving multi-track and multi-channel" with Brad McCoy and Marina Bosi. Brad, retired from the Library of Congress, gave a general talk about archiving and archival issues for a general audience, but then Marina Bosi talked about the new MPAI Community, an organization which exists to provide standards on the use of artificial intelligence for the motion picture industry.

She spoke about AI applications for audio archiving specifically, such as watermarking, audio and video coding, context-based audio processing, and systems to synthesize speech in a person's voice given a sample of that person's speech, to be used to repair missing sections of audio when a script with the original wording is available.

But she also spoke at length about a system for AI analysis of audio tapes, with a video camera on a tape head so that both images of the tape and the audio on the tape could be inspected by a machine-learning model that would identify damaged sections of tape or interrupted audio, at the same time also identifying breaks in the program.

A thing I found kind of curious was a panel discussion called "Towards an Objective Understanding of High-End Audio." It wasn't what I had expected. In fact, the discussion really was about high-resolution audio, and it was split among a number of panelists, although not evenly so. The first panelist, Milind Kunchur, gave an impassioned speech about the hearing system and pointed out that the brain looks at signals in both time and frequency domain at the same time. Consequently, he says, and with some neuroanatomy to back him up, the timing of a signal onset can be detected more precisely than would be expected from a 20 kHz bandlimited version of the signal, even though the ultrasonics cannot be heard in isolation. He pointed out accurately that onset transitions affect the timbre a lot and suggested that many imaging effects may be a consequence of small timing errors that would be lost in a signal bandlimited to 20 kHz. He was saying a lot of things and going in various different directions very quickly, and I came away with an impression that he might have been crazy but might also have had something important to say. The problem is that I couldn't figure out exactly what it was. He really didn't have time to elaborate such a complex argument, and even so he used two-thirds of the time available for the whole panel. So Hans van Maanen took over with hardly any time left in which to speak.

van Maanen spoke about how the use of feedback to reduce nonlinearity creates the need for wider bandwidth within signal and feedback path alike, and he described a number of distortion processes similar to some of the work done by Marshall Leach back in the 1970s. He points out very accurately that once a system becomes nonlinear it can no longer be described properly by a Fourier transform, and that the nonlinearity may extend the required bandwidth. All of this is true, although I do think it is quite a leap to apply this to the hearing system.

Josh Reiss then used a brief amount of time at the end to describe a meta-analysis of existing studies about high resolution audio. Jamie Angus talked about group delay for a minute or so afterward.

I can't say that any of these convinced me of the need for high bandwidth audio reproduction but they did make me want to know more about it and they made me want to know much more about the developments in acoustical neuroanatomy in the past decade or so. I will be looking up Dr. Kunchur's paper "The Human Auditory System and Audio" which he repeatedly referenced, but I remain very skeptical of the importance of such small timing changes due to bandlimiting. Much of my skepticism has to do with the general difficulty in hearing group delay effects, even very extreme ones.

I'm not saying that people aren't hearing differences between converters at different rates, I am just saying that there are many other likely differences and that extreme claims require extreme proof. I am still hoping for that proof.

Richard King spoke on "LFE-Friend or Foe? The Pros and Cons of the Low Frequency Effect Channel." The LFE or ".1" channel in surround and Atmos systems provides a band-limited subwoofer channel for effects and is widely used for film mixes. However, with these formats now being used for music, the LFE channel is very frequently being used in those as well. As we come into an age where increasingly mixes are being folded down from surround and object-oriented formats for stereo and headphone playback, the way the LFE channel should be handled in the fold-down becomes problematic. A number of typical issues with common streaming services were described and some advantages were found in linear phase low-pass filtering of the LFE channel before summing. A number of common filters used for automatic fold-downs were described and most of them were not very good. Use of the LFE channel is therefore not currently recommended if it can be avoided since different fold-down algorithms in common use will result in dramatically different low frequency results in the mix.
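The linear-phase approach that came out ahead can be sketched very simply: filter the LFE channel with a symmetric FIR low-pass (which has constant group delay by construction) before summing it into the front channels. The filter length, cutoff, and the +10 dB LFE playback gain below are my own illustrative assumptions, not values from the paper:

```python
import numpy as np
from scipy.signal import firwin, lfilter

# Sketch of a stereo fold-down that low-passes the LFE channel with a
# linear-phase FIR before summing. Parameters are illustrative only.

fs = 48000
taps = firwin(511, cutoff=120.0, fs=fs)   # symmetric FIR -> linear phase

def fold_down(left, right, lfe, lfe_gain=10 ** (10 / 20)):
    """Sum the filtered LFE into both front channels, split equally."""
    lfe_lp = lfilter(taps, 1.0, lfe)      # constant group delay
    return (left + 0.5 * lfe_gain * lfe_lp,
            right + 0.5 * lfe_gain * lfe_lp)
```

Because the FIR is linear-phase, the folded-down bass keeps its time alignment with whatever low frequency content is already in the front channels, which is presumably why it fared better than the minimum-phase filters in common use.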

How much of a change in reverberation is audible? That's a hard question because the brain accustoms itself to reverb quickly, and so changes may not always be easy to hear. There are lots of other factors besides time, and one of them is overall loudness. In "Just noticeable reverberation difference at varying loudness levels," Florian Klein and others from the TU Ilmenau investigated how much playback level affects the ability to perceive changes in reverberation time, and they indeed found that higher levels make it easier. Unfortunately, they found many other things seem to influence that ability, including listener experience and (not surprisingly) order of presentation.

Jamie giving Heyser Lecture

Heyser Lecture Audience

Heyser Lecture

The Heyser lecture is an annual event memorializing Richard Heyser, who was the man who made time-delay spectrometry for audio not only practical but also useful and popular. It's presented by someone whom the AES wishes to honor each year. I don't normally attend it even when the talks promise to be interesting, because invariably the evening it is given is my only free evening away from the convention. This year, though, Jamie Angus-Whiteoak was giving a talk about the human hearing system entitled "The Ear Is not A Fourier Transformer," and as Jamie is in poor health and is reluctant to travel to the US due to the new restrictions on transgender people in many states, I figured future opportunities to see her talk might be limited.

Let me just say that Jamie is an absolutely amazing lecturer. Her talks about sigma-delta conversion in the early 1990s did more than anything else to spread the idea that sigma-delta (bitstream) converter problems were soluble and that they eliminated all of the serious issues with ladder converters. While I was doing my thesis on methods to slightly improve DC stability on ladder converters, she was going around telling people how to build converters for which DC stability wasn't even necessary. So she is a well-known person in the field and has done more to make the University of Salford known abroad than anyone else, as well as more to make sigma-delta methods known.

Her talk was an overview of how the ear works, not just the usual discussion of the cochlea separating frequencies by bands but also discussion of how the cochlea contains different kinds of hair cells for different uses, and how positive feedback is used within the neural paths directly after the ear. She then talked about Fourier methods for frequency decomposition, how windowing works, and how you can trade off time and frequency resolution when you're doing a DFT on a streamed signal in realtime. She then made the point that the equiripple filters that engineers often like, because the mathematics is simple, have very poor impulse responses, and that the effects of that are audible because the ear does not decompose signals the same way. The ear is a sampling system, but unlike a CD player it is a non-uniform and quasi-random sampling system, and this affects how we should think about it.
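The time/frequency trade-off she described is easy to put numbers on. A minimal sketch, assuming a 48 kHz sample rate (my choice, not hers): doubling the DFT window halves the spacing between frequency bins but doubles the span of time each spectrum smears together.

```python
# Time vs. frequency resolution for an N-point DFT on a streamed signal.
# The 48 kHz sample rate is an illustrative assumption.

def dft_resolution(n_fft, fs=48000):
    """Return (bin spacing in Hz, window span in ms) for an N-point DFT."""
    return fs / n_fft, 1000.0 * n_fft / fs

for n in (512, 1024, 2048):
    hz, ms = dft_resolution(n)
    print(f"N={n:5d}: bins every {hz:7.2f} Hz, window spans {ms:6.2f} ms")
```

The ear does not obey one fixed trade-off like this across its whole range, which was rather her point.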

This was one of the best lectures I have ever heard at an AES show. Jamie was pretty nervous, which is perfectly reasonable for someone who has been given the opportunity to give a lecture that is probably the most prestigious thing an academic in the audio field can do. The PA guy failed to pad her headset mike with moleskin, so there were noises throughout from her earrings. But I came away surprised at how much more is known about the auditory system than I had realized. Thank you, Jamie.


Most of the vendors were local companies in Finland and I like that because it calls attention to small companies with interesting products that might otherwise disappear into the noise floor at a larger show.

Nokia Ozo was showing a number of "immersive voice" systems in which people could make a call over a cellphone and not just see and hear the person on the other end but also hear the environment around them in stereo, and have that environment change as they moved their head. They also demonstrated a multi-user conferencing system with simulated binaural playback, so each user appeared in a different place in space within the virtual room. Not an actual product yet, but something Nokia has been testing, and certainly a very useful thing in the age of remote work and constant teleconferencing. I was surprised how much the stereo image made it possible to separate the person speaking from the noisy environment around them, when compared with a mono signal. Stereo imaging was not really very high fidelity, but it was more than good enough for the application.

OEKsound was showing off their Soothe2 software, a plug-in that empirically determines resonant points in recordings and then provides dynamic tools for the user to cut them. It doesn't do anything that you can't do with a parametric equalizer and a compressor with a sidechain, but it does it more conveniently and faster. It was effective enough that I was really curious what algorithm was being used to determine resonances, because it clearly wasn't just comparing spectra at high and low levels in the recording. Worth checking out as a great timesaver if you do a lot of cleanup work.

Pinguin was not an official vendor, but they had a fellow sitting in the lobby talking about their audio metering software. They provide metering programs of various sorts, both for standalone computer use and as plugins that integrate with your DAW software. Much handier than plugging in external metering, and far more accurate and flexible than the metering built into typical DAW software.

Gefell Microphones

Microtech Gefell did not have anyone from the factory, but they did have both the UK and Finnish representatives manning the booth. We missed Udo, who is both a great microphone designer and also one of the world's leading Trabant racers (assuming anyone can be said to be leading in a Trabant race). They were showing off their whole line of microphones, but I was pleased to see they are now making the MD 100 handheld stage dynamic, based on their excellent MD 300 broadcast microphone. The MD 300 is a very versatile microphone and quite good for studio vocals; I continue to be surprised that they do not promote it as a podcasting mike, because it is so much better than what is sold into that market. I am hoping the MD 100 is as good and hope to try it out.

The UK representative for Gefell was Sound-Link, and they also sell the Geithain studio monitors in the UK. These speakers are very easy to mix on but sadly have no US distribution that I can see; they were not on display.

Brandenburg Labs is a company set up by Karlheinz Brandenburg, formerly of the Fraunhofer IIS and the person most responsible for the MP3 standard. It's developing a number of products centered around virtual environments. They did a demo in which they had first created an acoustical model of a corner of the convention center lobby area and then put it into a simulation. Playing sounds out of two speakers, they could switch the signal source between the speakers and a pair of headphones with a head tracking system, and the effect was absolutely uncanny. I was unable to tell when they had switched from one to the other. The room acoustics in the area were very poor, but they were able to effectively simulate those poor acoustics over headphones in a way indistinguishable from the real thing in a quick test. I'm not sure what people would need such a thing for, but if you should be one of those people, it works very well.

Genelec Speaker Cookies for 45th Anniversary

But the absolute biggest thing at the event as far as vendors go was Genelec's introduction of the 8381A studio monitor. These are large powered studio monitors aimed at a market where the ability to play loud is sometimes more important than the ability to sound good. Even so, this was probably the best-sounding demo I have ever heard at an AES show, and I would like to have heard my own material on these speakers, but my schedule didn't make that possible. The Genelec folks pointed out that the sense of envelopment is partly a function of low frequency imaging, and so this is very important. Consequently they started out with a 5-way design, but with digital crossovers and crossover correction so the multiple crossover points would be as inoffensive as possible. Looking at it, there is a 2-way coaxial driver in the center, with four midrange drivers around it. Although it looks like this would be beamy due to the distance between the midrange drivers, the midrange drivers are low-passed at 500Hz, so the wavelengths they reproduce are large in comparison with the distance between drivers. The lowest frequency woofers are mechanically opposed so vibration cancels out, and although the low frequency chamber is ported as a bass-reflex system, digital control reduces hangover and is claimed to produce good overall impulse response. Are they as good as their $28,000 individual price would demand? I don't know, but I'd be willing to listen to them again, and that's saying far more than I would say about most monitors I have heard demoed at shows.

Genelec Speaker Introduction


The convention audio was interesting to say the least, with Sennheiser MKE 600 short shotguns used on presenters and Genelec studio monitors used for room reinforcement. In the two smaller paper rooms, the audio was actually quite transparent and listenable, with the levels kept to a point where the PA system was not intrusive. The larger rooms all had problems, whether with noises and dropouts, excessive volume, or the wrong microphones being used. There were many problems with laptops, either because people were not setting them up properly or because of ground loop noise in the audio. Next time they should get Per Lundahl to provide some noise isolator boxes.


In spite of my luggage being lost in London while I was routed there on my trip from Stockholm to Helsinki (as I could not get a direct flight due to a hockey game causing all available seats to be booked), I still enjoyed the show a lot. Helsinki was lovely even though once again they failed to win the Eurovision Song Contest which took place while I was there.