Sonic Processing - Sound and Music in Visuals

Originally taught for Openlab Workshops: http://openlabworkshops.org/workshop-space-studios-17-sept-2009/

What do we expect the computer to do for us?

What?

  • Beat detection
  • Identify vocals, instruments
  • Respond to sound volume
  • anything else?


Why?

  • automate parts of our set
  • mesh with music (timing, coloring)
    • explore relationships between musical events and visual events instantaneously
  • free us up to have a beer during the performance


Why do we want beat detection?

Are these goals achievable?


What goals are achievable?

  • filtering
  • using volume to animate
  • using frequencies to animate (FFT)
  • animating based on a time difference (a beat / BPM)
  • exponential vs. linear
    • animations
      • move by velocity (push object across the screen)
      • move along a curve (object moves along a set path)
      • ease-in, ease-out
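The last group in the list (linear vs. curved motion, ease-in, ease-out) can be sketched in a few lines. This is a minimal illustration in plain Python rather than Processing so it runs on its own; the function names are made up for this example, and the curves are simple quadratics, which is just one common choice and not the only way to ease.

```python
def linear(t):
    """Constant speed: progress equals the elapsed fraction."""
    return t

def ease_in(t):
    """Start slow, finish fast (quadratic curve)."""
    return t * t

def ease_out(t):
    """Start fast, finish slow (quadratic curve)."""
    return 1 - (1 - t) * (1 - t)

def ease_in_out(t):
    """Slow at both ends, fastest in the middle."""
    if t < 0.5:
        return 2 * t * t
    return 1 - 2 * (1 - t) * (1 - t)

def position(start, end, t, curve=linear):
    """Map progress t (clamped to 0..1) onto a position between start and end."""
    t = max(0.0, min(1.0, t))
    return start + (end - start) * curve(t)

if __name__ == "__main__":
    # Compare where an object is at five moments of the same move
    for step in range(5):
        t = step / 4
        print(t, position(0, 100, t, linear), position(0, 100, t, ease_in_out))
```

Feeding `t` from a volume level or from the position within a beat, instead of from a clock, is one way to tie these curves to the music.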


What is Music and Sound?

  • Music is the product of a human action.
  • Sound is the result of an event.

Sound is physical - something must happen to make it occur!

Vision is passive - light (and other radiation) is always bouncing around us; we absorb and process it.

Sound gives a sense of space and composition, density.

Vision senses texture, distance, forms.

Sound is directly, physically represented in the brain, similar to the way luminosity is perceived at a very low level.

Bass frequencies (120Hz and below) bend around corners and penetrate deeply into walls and objects; treble (1000Hz and above) frequencies bounce off objects and are more readily absorbed.
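The bending behaviour comes down to wavelength: a wave bends easily around things that are small compared to its wavelength. A quick calculation, assuming sound travels at roughly 343 m/s in room-temperature air (the constant and function name are just for this sketch):

```python
SPEED_OF_SOUND = 343.0  # metres per second in air at about 20 degrees C (assumed)

def wavelength(frequency_hz):
    """Wavelength in metres of a sound at the given frequency."""
    return SPEED_OF_SOUND / frequency_hz

if __name__ == "__main__":
    for hz in (120, 1000):
        print(hz, "Hz ->", round(wavelength(hz), 2), "m")
```

A 120 Hz bass note is nearly 3 m long, bigger than most furniture and doorways, while a 1000 Hz tone is only about 34 cm.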


What information do we get out of sounds?

Information about the world around us.


  • What type of material was struck
  • The dimensions of it - how long, how wide, whether the shape was regular or irregular
  • How hard it was struck
  • How far away it is
  • What the world is like around it - echoes and reflections off other nearby objects


How Do We Physically Sense Sounds?

Hitting an object makes it vibrate in complex ways.

Image:rare_compress.jpg

(from http://www.allegropianoworks.com/piano_tuning.htm)


These vibrations in the material push and pull against the air, causing it to vibrate at the same frequency (no sound in a vacuum!). The changing pressure on our eardrums is passed along mechanically by the small bones of the middle ear, then converted into electrical signals by the cilia (hair cells) in the inner ear.



Analyzing Sound

If you look at a sound as a graph of amplitude (volume) over time, it looks like a squiggle.

(examples/libraries/Minim - LoadFile example)

Image:Soundwave 01.jpg

We need a way to analyze this squiggle and compare it to other squiggles. We need maths.
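The simplest maths to run on a squiggle is a loudness measure. The sketch below computes the RMS (root-mean-square) level of a block of samples, plus a smoothing step so a visual driven by it does not flicker. It is plain Python with a synthetic test tone; the function names, the 0.8 smoothing amount and the assumption that samples lie between -1 and 1 are all choices made for this example.

```python
import math

def rms(samples):
    """Root-mean-square level of a block of samples in the range -1..1."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def smooth(previous, current, amount=0.8):
    """Blend the new level with the previous one; higher amount = slower response."""
    return previous * amount + current * (1.0 - amount)

# Assumed test signal: a 440 Hz sine at half amplitude, 1024 samples at 44.1 kHz
SAMPLE_RATE = 44100
block = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(1024)]
level = rms(block)
```

In a live sketch you would call `rms` on each incoming audio buffer and feed the smoothed value into a size, brightness or speed.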

An FFT looks at an audio signal and breaks it down into a collection of sine waves, called a spectrum. Adding them all together forms a curve approximately like the original signal.


Huh? Just what is an FFT?

Fourier showed you can add together a collection of sine waves to recreate a recorded sound (or other recorded data). Each sine wave has a different amplitude (height) and period (distance between peaks). Working out which sine waves make up a sampled signal is called the Discrete Fourier Transform, or DFT.

Image:Sine-waves-added.png

code: FFTExplained

FFT stands for Fast Fourier Transform - it's an algorithm for computing the DFT, fast enough that our computers can calculate it in a usefully short amount of time.
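To make the idea concrete, here is a deliberately slow, naive DFT in plain Python. It is not the FFT algorithm and not what Minim does internally; it only shows what the numbers in a spectrum mean. The test signal (two sine waves that land exactly on bins 3 and 10 of a 64-sample block) is an assumption chosen so the result is easy to read.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Naive DFT: amplitude of each frequency bin, for bins below half the block length."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2):
        total = 0j
        for i, s in enumerate(samples):
            # Correlate the signal with a wave that completes k cycles per block
            total += s * cmath.exp(-2j * math.pi * k * i / n)
        # Scale so a sine of amplitude A sitting exactly on a bin reads as A
        # (bin 0, the constant offset, would be doubled by this scaling)
        magnitudes.append(2 * abs(total) / n)
    return magnitudes

# Assumed test signal: bin 3 at amplitude 1.0 plus bin 10 at amplitude 0.5
N = 64
signal = [math.sin(2 * math.pi * 3 * i / N) + 0.5 * math.sin(2 * math.pi * 10 * i / N)
          for i in range(N)]
spectrum = dft_magnitudes(signal)
```

Bin `k` corresponds to a frequency of `k * sample_rate / N` Hz. The two nested loops are why this gets slow for large blocks; an FFT produces the same spectrum with far fewer operations.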


This is actually what happens in our ears! A sound physically vibrates the cilia (hair cells) at different frequencies, creating electrical impulses that are picked up by neurons in the brain. If you stuck a wire in your head attached to the neurons, and plugged the other end into an amplifier, you'd hear the sounds you were listening to. (This has been done with barn owls.)

Image:bigcilia2.jpg


Perceiving Sounds as Coming from Objects and Events

The idea is that when you strike an object, you create a bunch of separate vibrations in it that add together to form a single sound. The peaks you see in the spectral envelope are there because the object has a physical tendency to vibrate (resonate) in certain ways, but not in others.

Every material has a characteristic "fingerprint" of frequencies that it emits when struck, no matter what the length. This is shown in the spectral envelope. People's voices have individual spectral envelopes as well.

The relative distance between the peaks, the number of peaks, and their height are all important indicators of what produced the sound.

The relationships between the frequencies are important - frequencies that are whole-number multiples of a common base frequency (harmonics), heard together, are recognized as coming from a single source. This is subconscious behavior and impossible to ignore.
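A rough way to test the whole-number relationship on a list of spectral peaks is sketched below in plain Python. The function name, the 3% tolerance and the example peak values are all invented for illustration; real peak lists are noisier, and you usually do not know the base frequency in advance.

```python
def harmonic_numbers(fundamental_hz, peaks_hz, tolerance=0.03):
    """For each peak, return the harmonic number it matches, or None if it matches none."""
    result = []
    for peak in peaks_hz:
        ratio = peak / fundamental_hz
        nearest = round(ratio)
        # Allow a small relative error, since measured peaks are never exact
        if nearest >= 1 and abs(ratio - nearest) <= tolerance * nearest:
            result.append(nearest)
        else:
            result.append(None)
    return result

if __name__ == "__main__":
    # Hypothetical peaks: three harmonics of 220 Hz and one unrelated frequency
    print(harmonic_numbers(220, [220, 440, 661, 950]))
```

Peaks that come back with a harmonic number plausibly belong to one pitched source; the `None` entries are candidates for a second source or for noise.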

Image:Spectral-envelope-clarinet.png

Also - Pitched vs un-pitched sounds


Time matters too. The time between sounds, and how often and how regularly they occur (rhythm, beat), make a big difference in how we perceive what's happening.


FFTs are easy - identifying sound sources is very hard, and identifying beats and rhythms is amazingly complex. Not many animals can do this! (Parrots and some large birds can do it, but not always well.) It takes previous training and a deep understanding of the music and sounds being listened to.


What's better than an FFT?

Or, why an FFT gives you crap data.

Take Flight404's word for it: there are other ways of getting interesting visuals out of audio.

Why not use some sort of tap tempo control?
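A tap tempo needs very little code: record the time of each tap, average the gaps, and convert to beats per minute. The sketch below is plain Python with made-up function names; it assumes tap times arrive in seconds (in Processing they would come from something like a key press and a millisecond clock).

```python
def bpm_from_taps(tap_times):
    """Estimate tempo in beats per minute from tap timestamps given in seconds."""
    if len(tap_times) < 2:
        return None  # need at least two taps to measure an interval
    intervals = [b - a for a, b in zip(tap_times, tap_times[1:])]
    average = sum(intervals) / len(intervals)
    if average <= 0:
        return None  # taps out of order or all at the same instant
    return 60.0 / average

def beat_phase(now, last_tap, bpm):
    """Position within the current beat: 0.0 on the beat, approaching 1.0 just before the next."""
    beat_length = 60.0 / bpm
    return ((now - last_tap) % beat_length) / beat_length

if __name__ == "__main__":
    taps = [0.0, 0.5, 1.0, 1.5]
    tempo = bpm_from_taps(taps)
    print(tempo, beat_phase(1.75, taps[-1], tempo))
```

`beat_phase` gives a number that sweeps from 0 to 1 on every beat, which is handy for driving a pulse or an easing curve without any audio analysis at all.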


Timbre, rhythm, pitch height, chroma (circularity of sound - octaves)

Musical knowledge can help us here. The caveat is that some sounds aren't pitched (drums, cymbals) or don't follow specific musical conventions (such as the Western 12-tone scale).


Ok, so what can we get out of an FFT?

We can experiment with changes in the spectrum, and different, more realistic ways of scaling the data:


Math Diversion: Logarithmically vs Linearly

Image:Linear vs Log.png

Code: Linear_vs_Log

Code: FFTAnalysis
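Two common logarithmic rescalings of FFT output are sketched below in plain Python: converting amplitude to decibels, and grouping bins into bands that double in width (octaves) instead of using equal-width bins. The function names, the one-octave band width and the tiny example spectrum are assumptions for this sketch, not the contents of the workshop's Linear_vs_Log or FFTAnalysis code.

```python
import math

def to_decibels(amplitude, floor=1e-6):
    """Convert a linear amplitude (1.0 = full scale) to decibels."""
    # The floor avoids log10(0) on silent bins
    return 20 * math.log10(max(amplitude, floor))

def octave_band_edges(low_hz, high_hz):
    """Band edges that double each step, so each band spans one octave."""
    edges = [low_hz]
    while edges[-1] * 2 <= high_hz:
        edges.append(edges[-1] * 2)
    return edges

def band_averages(spectrum, bin_hz, edges):
    """Average the spectrum bins whose frequency falls inside each band."""
    averages = []
    for low, high in zip(edges, edges[1:]):
        bins = [m for k, m in enumerate(spectrum) if low <= k * bin_hz < high]
        averages.append(sum(bins) / len(bins) if bins else 0.0)
    return averages

if __name__ == "__main__":
    print(octave_band_edges(55, 880))
    print(band_averages([1, 1, 2, 2, 4, 4, 4, 4], 1, [1, 2, 4, 8]))
```

With linear bins most of the display is wasted on high frequencies where little happens musically; octave bands give bass and treble a similar share of the screen, which is closer to how we hear.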


The Gestalt of Sound

"Some things naturally belong together"


Gestalt Laws:

  • Closure
  • Proximity
  • Continuation (Smooth forms)
  • Symmetry (balance of form)
  • Similarity
  • Common Center (Circles joined together)


Practical Issues with Live Sound

  • White noise contains all frequencies of sound, and will royally f*** up your sound analysis. Make sure all sound levels are checked before a performance, and do not go too high (clip).
  • Use a good mic
  • Use a good, external sound device with a decent ground and a good input.
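Level checking can be partly automated. The sketch below, in plain Python, reports what fraction of a buffer sits at or near full scale, a rough sign of clipping. The function name and the 0.99 threshold are assumptions for this example, and samples are assumed to lie between -1 and 1.

```python
def clipped_fraction(samples, threshold=0.99):
    """Fraction of samples at or beyond the threshold (full scale is 1.0)."""
    if not samples:
        return 0.0
    clipped = sum(1 for s in samples if abs(s) >= threshold)
    return clipped / len(samples)

if __name__ == "__main__":
    # Hypothetical buffer: two of the four samples are pinned at full scale
    print(clipped_fraction([0.1, 1.0, -1.0, 0.5]))
```

If this number is regularly above zero during soundcheck, turn the input gain down before trusting any analysis built on top of it.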


Summing Up

  • filtering
  • using volume to animate
  • using frequencies to animate (FFT)
  • animating based on a time difference (a beat / BPM)
  • exponential vs. linear
    • animations
      • move by velocity (push object across the screen)
      • move along a curve (object moves along a set path)
      • ease-in, ease-out



Sources:

[1] Diemo Schwarz. Spectral Envelopes in Sound Analysis and Synthesis. Diplomarbeit, Universität Stuttgart, Fakultät Informatik, Germany, 1998.

Further reading/viewing:

Book: Auditory Scene Analysis by Albert S. Bregman

Book: Laws of Seeing by Wolfgang Metzger, Translated by Lothar Spillmann

http://www.hhmi.org/senses/c110.html

Book: This Is Your Brain on Music by Daniel J. Levitin

On-going human evolution for spoken language? http://languagelog.ldc.upenn.edu/nll/?p=555

YouTube: Frostie Dancing To Shake Your Tail Feather! Bird Loves Ray Charles! http://www.youtube.com/watch?v=0bt9xBuGWgw