‘I am sitting in a room different from the one you are in now’

During last week’s group meeting, Pradheep Shanmugalingam, one of the lab’s PhD students, mentioned that he was working on an experiment involving sine wave speech. This is speech which has been processed so that its essential features, the formants that allow us to distinguish speech sounds, are replicated by synthetic sine waves. The result is a musical R2-D2 warbling which on first hearing is hardly recognisable as speech.

The paradigm that Pradheep is using relies on this fact: unless people are told they are listening to speech, they don’t hear any speech content in the sounds. Once they are told, however, a process of ‘tuning in’ takes place: a recognition of the speech as speech, which also transfers to novel sine wave sentences.

You can try out a related process for yourself at this web page by Matt Davis at the MRC Cognition and Brain Sciences Unit in Cambridge. Listen to the sine wave speech first, then the unencoded message, and then return to the sine waves. When you hear them a second time, the speech ‘pops out’.
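For the curious, here is a minimal sketch of how sine wave speech can be resynthesised once you have formant tracks. It is illustrative only – the function name and input format are my own inventions, and real tracks would come from a formant tracker such as Praat rather than the toy values below:

```python
import numpy as np

def sine_wave_speech(formant_tracks, frame_rate=100, sr=16000):
    """Resynthesise an utterance as a sum of sine waves, one per formant.

    formant_tracks: list of (freqs, amps) pairs, one per formant, each an
    array sampled at frame_rate Hz (a hypothetical format for this sketch).
    """
    n_frames = len(formant_tracks[0][0])
    n_samples = int(n_frames * sr / frame_rate)
    t = np.arange(n_samples) / sr
    frame_t = np.arange(n_frames) / frame_rate
    out = np.zeros(n_samples)
    for freqs, amps in formant_tracks:
        f = np.interp(t, frame_t, freqs)        # frequency at audio rate
        a = np.interp(t, frame_t, amps)         # amplitude at audio rate
        phase = 2 * np.pi * np.cumsum(f) / sr   # integrate frequency to phase
        out += a * np.sin(phase)
    return out / np.max(np.abs(out))

# Toy example: three static 'formants' give a vowel-like chord; real speech
# would have tracks that glide over time.
tracks = [(np.full(100, f), np.full(100, a))
          for f, a in [(500, 1.0), (1500, 0.5), (2500, 0.25)]]
audio = sine_wave_speech(tracks)
```

Because everything except the three or four formant trajectories is thrown away – no pitch, no harmonics, no noise – the result carries the information that matters for identifying speech sounds while sounding nothing like a voice.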

All this reminded me, in a roundabout way, of Alvin Lucier’s I am Sitting in a Room. In this minimalist composition from 1969, Lucier set up a feedback loop: he recited a text – which is also a description of what he is about to do – in a room, recorded the result, then replayed the tape in the same room whilst recording this second performance, and so on through generation after generation. As the iterations continue, in Lucier’s words, “the resonant frequencies of the room reinforce themselves so that any semblance of my speech, with perhaps the exception of rhythm, is destroyed.”

What emerges instead, as Lucier puts it, are “the natural resonant frequencies of the room articulated by speech.” And yet, knowing this is speech, and with Lucier’s pacing and characteristic stammer resonating in our ears, the sense of it persists through its degradation, even as the sounds turn into deep bottle tones and high glass rubbings. Eventually, though, sense vanishes, and you’re left with just a whistle and throb that sounds like water in the pipes, as Lucier’s formants disappear into the room’s. There’s an original recording of the piece here on UbuWeb.
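The mechanics are simple to simulate. Under the simplifying assumption that each playback-and-re-recording pass behaves as a linear, time-invariant filter, every generation is one more convolution with the room’s impulse response, so the room’s resonant peaks compound with each pass. A rough sketch of my own, not a reconstruction of Lucier’s actual setup:

```python
import numpy as np
from scipy.signal import fftconvolve

def lucier_generations(speech, room_ir, n_passes=30):
    """Simulate the piece: each pass convolves the signal with the room's
    impulse response, reinforcing the room's resonant frequencies while
    everything else fades."""
    x = speech
    generations = [x]
    for _ in range(n_passes):
        x = fftconvolve(x, room_ir)[: len(speech)]
        x = x / (np.max(np.abs(x)) + 1e-12)  # renormalise each generation
        generations.append(x)
    return generations

# Toy demo: white noise 'speech' in a room with one strong resonance.
sr = 16000
t = np.arange(sr) / sr
room_ir = np.exp(-t * 8) * np.sin(2 * np.pi * 220 * t)  # decaying 220 Hz mode
speech = np.random.default_rng(1).standard_normal(sr)
last = lucier_generations(speech, room_ir)[-1]  # essentially a 220 Hz drone
```

After n passes the original spectrum has been multiplied by the room’s frequency response raised to the nth power, which is why even a slight resonant peak eventually swallows everything else – including Lucier’s formants.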

Variable Speechiness

Sophie Scott – my main point of contact at the ICN – mentioned the work of her colleague Stuart Rosen, so I looked him up. This paper by him and a number of others (including Sophie) is freely available online, and is really interesting. In broad terms – and I hope I’ve got this right – the experiment they describe was designed to test whether the areas of the left hemisphere which are specialised for language processing respond to specific acoustic features of speech, or rather to the linguistic information it carries.

What appealed to me was their invention of a set of stimuli which varied in two respects: in their ‘speechiness’, that is to say their speech-like acoustic features, and in their intelligibility as language.

To make these stimuli they took recordings of short sentences like ‘The clown had a funny face’ and ‘The wife helped her husband’ and reduced them to variations along two dimensions: spectrum and amplitude. You can make stimuli which are neither speech-like nor intelligible by holding either the spectrum or the amplitude constant; and you can make stimuli which remain speech-like but unintelligible by combining the spectra of one phrase with the amplitudes of another.
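A crude way to play with this decomposition is to take a short-time Fourier transform and separate each frame’s overall loudness from its spectral shape. To be clear, this is my own illustrative sketch, not the method the authors used to build their stimuli, and it uses noise as a stand-in for recorded sentences:

```python
import numpy as np
from scipy.signal import stft, istft

SR, NPERSEG = 16000, 512

def split(x):
    """Separate a signal into per-frame spectral shape and loudness."""
    _, _, Z = stft(x, fs=SR, nperseg=NPERSEG)
    mag, phase = np.abs(Z), np.angle(Z)
    amp = mag.sum(axis=0) + 1e-12    # loudness of each frame
    shape = mag / amp                # spectral shape, loudness removed
    return shape, amp, phase

def rebuild(shape, amp, phase):
    """Resynthesise audio from a spectral shape and an amplitude track."""
    _, x = istft(shape * amp * np.exp(1j * phase), fs=SR, nperseg=NPERSEG)
    return x

# Noise stand-ins of equal length; in practice, two recorded sentences.
rng = np.random.default_rng(0)
sentence_a, sentence_b = rng.standard_normal((2, SR * 2))

shape_a, amp_a, ph_a = split(sentence_a)
shape_b, amp_b, _ = split(sentence_b)

flat_amp = np.full_like(amp_a, amp_a.mean())      # constant amplitude
flat_shape = shape_a.mean(axis=1, keepdims=True)  # constant spectrum

smod_a0 = rebuild(shape_a, flat_amp, ph_a)  # spectrum varies, amplitude flat
s0_amod = rebuild(flat_shape, amp_a, ph_a)  # amplitude varies, spectrum flat
crossed = rebuild(shape_a, amp_b, ph_a)     # spectrum of A, amplitude of B
```

The last line is the condition listeners found most speech-like yet could make nothing of: the spectral movement of one sentence wearing the loudness contour of another.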

The subjective reports of what these stimuli sounded like are very suggestive. When both spectrum and amplitude are kept constant, the sound is like ‘wind in the trees’ or ‘electronic vowel sounds’. When the amplitude changes but the spectrum is held constant, the stimuli are ‘rhythmic’, ‘like a nursery rhyme’. When spectral variation is combined with a constant amplitude, the descriptions range from ‘like speech with the bits taken out’ to ‘like an alien’ or ‘a lunatic raving’. Finally, the most speech-like but non-intelligible condition, in which the spectra of one phrase run together with the amplitudes of another, was described as ‘very much like speech’, like someone ‘with a regional accent’ or like ‘aliens again’.

It may not be the point of this experiment, but I’m really interested by this gradation of speechiness, from wind and electronics through nursery rhymes and pure rhythm, up through aliens and madness to a regional accent – almost speech, but no gold watch! It’s interesting that animal noises don’t get mentioned, though the authors do note that they treated the stimuli to make them ‘sound less like bird calls’.

So what did the study find? The speech-like but unintelligible modulations were processed bilaterally, a finding at odds with the theory that the left temporal lobe is specialised to process all speech-related sounds. As the authors write, it seems that ‘crucial left temporal lobe systems involved in speech processing are not driven simply by low level acoustic features of the speech signal but require the presence of linguistic information to be activated.’

If you want to have a listen to some of the stimuli, a few WAV samples are here. This is the supplementary material for a subsequent paper by Carolyn McGettigan from the ICN; unfortunately the paper itself is only available to subscribers. The WAV files aren’t labelled, but you can work out the conditions from their names:

intSmodAmod = intelligible speech: both spectrum and amplitude modulate

SmodA0 = unintelligible: spectrum modulates but amplitude is held constant

S0Amod = unintelligible: spectrum is held constant but amplitude modulates

SmodAmod = unintelligible: both spectrum and amplitude modulate, but are taken from different sentences