research

July 24, 2024 - 7 minutes

Individual differences in musical melody perception moderate the speech-to-song illusion in Mandarin Chinese listeners

Tamara Rathcke & Massimiliano Canzi (2026). Scientific Reports

Repeated exposure to a spoken phrase can give rise to the perception of the speech-to-song illusion (STS), whereby speech gains musical qualities and begins to sound like singing. STS is known to rely on acoustic cues and may depend on an individual’s ability to extract musical qualities (such as melody and rhythm) from speech acoustics.

So far, most research has examined listeners of non-tonal languages, with preliminary evidence indicating that tonal-language listeners experience STS differently, if at all. This study investigated STS in Mandarin Chinese listeners who rated song-likeness of Mandarin sentences before and after repetition and completed the Musical Ear Test. Test sentences were designed to promote the acoustic transmission of either melody or rhythm. Results demonstrated a modest STS effect in Mandarin listeners at the group level, which was independent of sentence acoustics. Individual abilities in rhythm perception had no impact on STS while, somewhat surprisingly, weaker melody perception abilities were found to facilitate STS.

This suggests that STS in Mandarin Chinese may be linked to a perceptual distortion of pitch. Overall, the findings indicate that STS mechanisms are shaped by the linguistic background of listeners and provide new evidence that language experience can influence music perception and cognition.

~

L2 learners take more time to catch the rhyme: An eye-tracking study on predictive processing

Marta Tagliani, Lucas Cruz, Michela Redolfi, Natalya Shirokorad, Massimiliano Canzi, Chiara Melloni & Maria Vender (2025) International Journal of Bilingualism

Aims and objectives: This study investigates the role of prediction in language comprehension for both native (L1) and non-native (L2) speakers of English, focusing on phonological and semantic cues. In addition, it examines whether higher proficiency in the L2 enhances predictive abilities.

Methodology: Using the Visual World Paradigm, we explore how Italian learners of English (L2) employ semantic and phonological cues during sentence parsing and compare these findings to native English speakers. Participants viewed images while hearing sentences in English, allowing us to analyze their eye movements and cue responses in real time.

Data and analysis: Eye-tracking data from 61 Italian participants at B1, B2, and C1 English proficiency levels, as well as 23 native English speakers, were collected. Visual and auditory stimuli prompted participants to focus on specific items, and their eye movements were recorded. We applied generalized additive mixed models (GAMMs) to assess how phonological and semantic cues and varying English proficiency levels influenced the proportion of looks at the target before and after sentence offset.

Findings: Results show that across all proficiency levels, participants fixated on the target faster when semantic cues were present in the lexical verb. However, B1 speakers showed a delayed response compared to more advanced groups. Native English speakers also demonstrated a heightened phonological effect in rhyme conditions compared to the L2 learners.

Originality: This study is the first to examine whether the presence of multiple cues can enhance L2 predictive processing in relation to the learner’s language proficiency.

Significance: The findings advance research on predictive processing in language comprehension. Understanding the differences between L1 and L2 processing and the role of cue integration in facilitating prediction is crucial for improving language learning outcomes.

~

Synchronised movement and individual rhythmic skill influence the perception of temporal structure in spoken language

Tamara Rathcke, Eline Smit, Rachel Yue & Massimiliano Canzi (2024)
Attention, Perception & Psychophysics, 1-17 | doi: 10.3758/s13414-024-02893-8

The subjective experience of time flow in speech deviates from the sound acoustics in substantial ways. The present study focuses on the perceptual tendency to regularize time intervals found in speech but not in other types of sounds with a similar temporal structure. We investigate to what extent individual beat perception ability is responsible for perceptual regularization and if the effect can be eliminated through the involvement of body movement during listening. Participants performed a musical beat perception task and compared spoken sentences to their drumbeat-based versions either after passive listening or after listening and moving along with the beat of the sentences. The results show that the interval regularization prevails in listeners with a low beat perception ability performing a passive listening task and is eliminated in an active listening task involving body movement. Body movement also helped to promote a veridical percept of temporal structure in speech at the group level. We suggest that body movement engages an internal timekeeping mechanism, promoting the fidelity of auditory encoding even in sounds of high temporal complexity and irregularity such as natural speech.

~

Understanding the role of broadcast media in sound change

Tamara Rathcke, Chiara Castellano & Massimiliano Canzi (2024)
In F. Kleber & T. Rathcke (eds.), Speech Dynamics: Synchronic Variation and Diachronic Change. De Gruyter Mouton

The idea that broadcast media can be a factor in sound change has been widely and controversially debated. This chapter outlines the main posits, issues and evidence surrounding the ongoing debates and offers a new empirical perspective on the subject matter. It hypothesizes that mass media may be the primary factor initiating and promoting sound change if there are limited opportunities for face-to-face contact, with media being the only or main source of exposure to sound innovation and dialectal variability. Such situations occur frequently during second language acquisition, which is the focus of the study discussed in the chapter. Eighteen German teenage learners of English were divided into two groups and asked to watch either a British or an American television series daily for the duration of two consecutive weeks. Comparisons of sound productions recorded before and after the two-week exposure period revealed significant changes in the participants’ frequency of /t/-flapping and rhoticity, in the direction predicted by the media accommodation account. In line with previous discussions, the observed influence of the media was partly moderated by the participants’ emotional involvement with the series they watched. A change toward the televised variety was observed primarily in high-engagement (but not in low-engagement) speakers. The chapter concludes with a discussion that aims to inspire innovative directions in future research of this much-debated topic that currently lacks pertinent empirical study.

~

Unmasking the truth: Impact of community masks on the perception of voiceless fricatives in English

Massimiliano Canzi & Tamara Rathcke (2023)
Proc. ICPhS 2023

Abstract: The current study aims at quantifying the effects of wearing a face mask on speech perception, by investigating performance of native English listeners in a phoneme monitoring task with monosyllabic words containing voiceless fricatives. Previous experimental work on the topic has mainly focussed on the effects of acoustic filtering caused by the use of face coverings, with mixed results and weak effects of mask wearing on speech perception. In this experiment, we explore the interplay of acoustic filtering with other potentially relevant factors such as the presence of visual cues, lexical frequency and listener-specific background. We provide evidence that suggests the impact of face coverings (esp. FFP-2 face mask) on speech perception is not directly moderated by the acoustic properties of masked speech. Rather, it is linked to an interplay of audio-visual integration, the absence of visual cues for (some) target fricatives, and the listener-specific sociolinguistic background.

Related: A. Tsaroucha, T. Rathcke & M. Canzi (2023). Effects of a face mask on the perception of English fricatives by native speakers of Greek. Proc. ICGL15

~

Same or different? Subject realization in the majority and heritage language of Polish-German bilingual children

Bernhard Brehmer, Aldona Sapata & Massimiliano Canzi (2023)
Linguistics Vanguard | doi: 10.1515/lingvan-2022-0061

Abstract: The paper examines the extent to which bilingual children select lexical noun phrases and null and overt pronouns as referring expressions in their majority language German and their heritage language Polish. Both languages are similar regarding the availability of lexical noun phrases but differ in terms of the distribution of null and overt pronominal forms. Our focus lies on discourse contexts with a subject antecedent in the preceding clause, which require only light processing for both speaker and hearer due to the high accessibility of the intended subject referent. Drawing on experimental data from a picture story retelling task (MAIN) to investigate the distribution of referring expressions in the two languages compared to age-matched monolingual control groups, our results reveal that bilingual children are sensitive to crosslinguistic differences in the syntactic and discourse-pragmatic constraints that regulate the distribution of null and overt subjects in Polish and German, depending on the mode of speech (narrative or dialogic). Furthermore, there are no significant differences between the bilingual and monolingual children, irrespective of language and age group. Thus, our study cannot confirm findings of previous studies concerning the tendency of bilingual children to be either overspecific or underspecific in subject reference production.