Do we parse the background into separate streams in the cocktail party?

Frontiers in Human Neuroscience, Section: Speech and Language, 28 October 2022

DOI: 10.3389/fnhum.2022.952557

Orsolya Szalárdy1,2, Brigitta Tóth2, Dávid Farkas2, Gábor Orosz3, István Winkler2

1Institute of Behavioural Sciences, Faculty of Medicine, Semmelweis University, Budapest, Hungary

2Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Budapest, Hungary

3Unité de Recherche Pluridisciplinaire Sport Santé Société, Université d’Artois, Université de Lille, Université du Littoral Côte d’Opale, Liévin, France


In the cocktail party situation, people with normal hearing usually follow a single speaker among multiple concurrent ones. However, there is no agreement in the literature as to whether the background is segregated into multiple streams/speakers. The current study varied the number of concurrent speech streams and investigated target detection and memory for the contents of a target stream, as well as the processing of distractors. A male-voiced target stream was presented either alone (single-speech), together with one male-voiced distractor (one-distractor), or together with a male- and a female-voiced distractor (two-distractor). We assessed behavioral measures of target detection and content tracking performance, as well as event-related brain potentials (ERPs) related to target and distractor detection. We found that the N2 amplitude decreased whereas the P3 amplitude increased from the single-speech condition to the concurrent-speech conditions. Importantly, the behavioral effect of distractors differed between the one-distractor and two-distractor conditions. In addition, for the same (male) distractor speaker, the non-zero voltages in the N2 time window for distractor numerals and in the P3 time window for syntactic violations appearing in the non-target speech stream differed significantly between the one- and two-distractor conditions. These results support the notion that the two background speech streams are segregated, as they show that distractors and syntactic violations appearing in the non-target streams are processed even when two non-target speech streams are delivered together with the target stream.

Keywords: speech processing, background segregation, N2, P3, target detection