After signing up for the waitlist a few weeks ago, I just got access to OpenAI’s DALL-E 2 beta service. DALL-E is a machine-learning image generator based on natural language processing. In other words, you type in a description of your desired image and press the “generate” button. After a few seconds, you’ll see four images, as interpreted by the image generator, which you can then refine further. DALL-E can generate astonishing results if you need images like teddy bears mixing sparkling chemicals as mad scientists in a steampunk style, as in this demo.
But what does DALL-E know about audio technology?
I used some of my 40 credits to find out. The first query was about spectrograms. Would DALL-E know what the spectrum of a sine tone looks like? You’ll see the answers below. This doesn’t look right, does it?
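For comparison, here is a minimal sketch of what one would expect instead: the spectrum of a pure sine tone is a single peak at the tone's frequency, so its spectrogram is one horizontal line. The sample rate, tone frequency, block length, and the helper name `magnitude_spectrum` are my own choices for illustration, not anything DALL-E was asked for.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive O(N^2) DFT magnitudes for bins 0..N/2.

    The 2/N scaling makes a unit-amplitude sine show up as 1.0
    (this scaling is not exact for the DC and Nyquist bins).
    """
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        mags.append(2 * abs(acc) / n)
    return mags


SAMPLE_RATE = 8000  # Hz, assumed for illustration
TONE_HZ = 1000      # Hz, assumed tone frequency
N = 256             # block length; 1000 Hz falls exactly on bin 32

# One block of a unit-amplitude sine tone.
tone = [math.sin(2 * math.pi * TONE_HZ * i / SAMPLE_RATE) for i in range(N)]

spectrum = magnitude_spectrum(tone)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
peak_hz = peak_bin * SAMPLE_RATE / N

print(f"peak at {peak_hz:.0f} Hz, magnitude {spectrum[peak_bin]:.2f}")
```

All bins except the one at 1000 Hz come out as essentially zero. That is the single thin line a real spectrogram of a steady sine tone would show.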
OK, next I tried a few audio buzzwords, hoping for some insights into DALL-E’s view of the world. You see some of my query results below. Most images reveal typical stereotypes known from stock photos. Judging by the “Immersive Audio” results, wearing flannel shirts and touching your headphones is essential for acoustic immersion.
I’ve designed the curriculum for the University of Erlangen-Nuremberg’s first Audio Processing for the Internet of Things course and will be teaching it via Zoom. Details here. The course is only open to FAU students, but I’m open to sharing my resources with colleagues and students interested in this field.
While doing some background research for a future publication, I quickly checked the search trends for a few keywords. To my surprise, just a few days ago, one of my favorite keywords, Spatial Audio, had a spotlight moment not seen in five years:
A closer look revealed that the spike happened on June 23rd. What happened that day that made so many people google spatial audio?
The Apple Worldwide Developers Conference happened. It was announced that the AirPods Pro will get a big upgrade this fall: a “spatial audio” feature! I presume lots of tech-savvy people were curious to find out more about spatial audio. Let’s see how this trend continues …
It’s the end of an era, and the beginning of a new one.
I started my first corporate research and engineering job straight out of a post-doc at UC Berkeley in 2013. We rented a minivan, packed up two years of student life, and took our time along Highway 1 as we made our way to San Diego and a completely different lifestyle. If you had told me back then that I would stay in southern California for seven years and become a dual German-American citizen, I wouldn’t have believed you. But life passes fast: colleagues at Qualcomm became friends, and I finally learned to surf.
I want to say thank you to the guy who brought me to Qualcomm and San Diego, my manager and the director of the 3D audio team, Deep Sen, for believing in me and hiring me. I was employee number three on his team. Under Deep’s direction, our team grew to double digits and did amazing international work in the field of immersive audio. Deep moved on to Apple in 2018. And now it’s my turn to move on to new adventures.
This June, I became a professor at the University of Erlangen-Nuremberg for the International Audio Laboratories. I am developing a research program in spatial audio processing for the Internet of Things with funding for potentially two PhD students or post-doctoral researchers. Reach out to me if you’re interested.
The time I spent at Qualcomm is invaluable. I learned things I never could have learned in academia, and all that I have learned, I will bring to my students with the intent to develop the next generation of interdisciplinary researchers and engineers.
One more thing. I want to especially say thank you to my mentor Nik. And to all of my colleagues: I owe you all one last IPA.
Finally, there is an introductory book on the theory behind Ambisonics and its practical applications:
From the preface:
Despite the Ambisonic technology has been practiced in the academic world for quite some time, it is happening now that the recent ITU, MPEG-H, and ETSI standards firmly fix it into the production and media broadcasting world. What is more, Internet giants Google/YouTube recently recommended to use tools that have been well adopted from what the academic world is currently using. Last but most importantly, the boost given to the Ambisonic technology by recent advancements has been in usability [..] the usability increased by plugins integrating higher-order Ambisonic production in digital audio workstations or mixers. And this progress was a great motivation to write a book about the basics.
The book is dedicated to provide a deeper understanding of Ambisonic technologies, especially for but not limited to readers who are scientists, audio-system engineers, and audio recording engineers. As, from time to time, the underlying maths would get too long for practical readability, the book comes with a comprehensive appendix with the beautiful mathematical details.
This book closes a big gap: I am not aware of any existing comprehensive introductory literature on this topic. So finally, a book I can safely recommend to colleagues who want to learn more about Ambisonics. Besides the hard copy, Franz and Matthias also managed to make the ebook version available as open access!
Last week I finally received the Raspberry Boom I backed on Kickstarter a few months ago. The Raspberry Boom is an infrasound measurement device that captures inaudible low frequencies from atmospheric and man-made events such as rocket launches, explosions, volcanic eruptions, storms, tornadoes, lightning, etc. The journal Physics Today just had an article on that topic.
At the moment, the Boom sits inside its housing on the window of my home office. I still need to find a permanent place that minimizes both the likelihood of it falling down and the influence of low-frequency wind turbulence.