What does DALL-E know about Audio Technology?

After signing up for the waitlist a few weeks ago, I just got access to OpenAI’s DALL-E 2 beta service. DALL-E is a machine-learning-based image generator driven by natural language prompts. In other words, you type in a description of your desired image and press the “generate” button. After a few seconds, you’ll see four images, as interpreted by the generator, which you can then refine further. DALL-E can produce astonishing results if you need images like teddy bears mixing sparkling chemicals as mad scientists in a steampunk style, as in this demo.
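For readers who would rather script this than click through the web UI, the same kind of prompt can also be sent through OpenAI’s API. The sketch below is only an illustration of that route: it assumes the openai Python package (pre-1.0 interface) and an API key, and is not how the images in this post were generated.

```python
# Illustrative sketch only: the images in this post came from the DALL-E web UI.
# Assumes the openai Python package (pre-1.0 interface) and an API key
# exported as OPENAI_API_KEY.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.Image.create(
    prompt="A spectrogram of a sine tone",
    n=4,                  # the web UI also returns four candidates per prompt
    size="1024x1024",
)

for item in response["data"]:
    print(item["url"])    # temporary URLs to the generated images
```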

But what does DALL-E know about Audio Technology?

I used some of my 40 credits to find out. The first query was about spectrograms. Would DALL-E know what the spectrogram of a sine tone looks like? You’ll see the answers below. This doesn’t look right, does it?

DALL-E’s interpretation of “A spectrogram of a sine tone”
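For reference, the actual spectrogram of a pure sine tone is just a single horizontal line at the tone’s frequency, since all of its energy sits in one narrow frequency band over time. A minimal sketch for producing such a reference plot, assuming NumPy, SciPy and Matplotlib (the 1 kHz tone and the STFT parameters are arbitrary choices):

```python
# Minimal sketch: spectrogram of a pure sine tone for comparison with DALL-E's output.
import numpy as np
from scipy.signal import spectrogram
import matplotlib.pyplot as plt

fs = 16000          # sample rate in Hz
duration = 2.0      # seconds
f0 = 1000           # sine tone frequency in Hz (arbitrary choice)

t = np.arange(int(fs * duration)) / fs
x = np.sin(2 * np.pi * f0 * t)

# Short-time power spectrum
f, tt, Sxx = spectrogram(x, fs=fs, nperseg=1024, noverlap=512)

# Plot in dB; a pure tone shows up as a single horizontal line at 1 kHz
plt.pcolormesh(tt, f, 10 * np.log10(Sxx + 1e-12), shading="gouraud")
plt.xlabel("Time [s]")
plt.ylabel("Frequency [Hz]")
plt.title("Spectrogram of a 1 kHz sine tone")
plt.colorbar(label="Power [dB]")
plt.show()
```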

OK, next I tried a few audio buzzwords, hoping for some insight into DALL-E’s view of the world. You can see some of my query results below. Most images reveal typical stereotypes known from stock photos. Judging by the “Immersive Audio” results, wearing a flannel shirt and touching your headphones is essential for acoustic immersion.

Spatial Audio:

Immersive Audio:

Internet of Sound:

Concert in the Metaverse: