Simon Fraser University Speech Error Database

Objectives

  • To create two large databases, one for English speech errors, another for Cantonese speech errors.
  • To extend and refine existing methodologies for speech error collection and classification.
  • To develop a classification system that supports rigorous testing of psycholinguistic effects and investigation of detailed linguistic structures.
  • To support examination of phonetic errors, distinct from phonological errors, through classification and audio recordings.
  • To partner with language production researchers on research projects that enrich the database, and ultimately make the database a public resource (projected release date is summer of 2019).

The interface

Click on the picture to see the specific fields and an illustration of a sound addition error, fan → fran.

SFUSED screenshot

The Example Fields at the top of the screenshot document the speech error: the word with the “/” prefix in the example box is the error word, and the intended word is given below it in phonetic and orthographic representations, along with a variety of other facts about the error (is the word clipped? is it corrected by the speaker? etc.). The opposition between intended and error words continues below in the Word Fields, which document part of speech, the open or closed class status of the words, and the semantic and morphological relationships between intended and error words. Sound errors are further documented with facts about the supplanted intended sounds, intruder sounds, and source sounds: syllabic role, position in the word, the larger syllable, etc. In this example, the supplanted intended is “∅” because the error is a sound addition (similar to the rule format for insertions in classic generative phonology). These values make it easy to generate confusion matrices and investigate a variety of linguistic facts, such as whether the intruder sound comes from the same syllabic position as the source sound. The broader classification, i.e., sound error vs. word error, contextual vs. noncontextual, is given in the Major Class Fields on the left, and a variety of special class fields further describe these major error types. The Record Fields on the right give detailed information about the recording and the speakers, and the Markedness Measures below them track syllable-level markedness effects; for example, the error ‘fran’ for ‘fan’ produces a marked onset cluster. With over 80 distinct fields, SFUSED supports detailed searches of both linguistic and processing structures.
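As a rough illustration of how the supplanted intended and intruder values could feed a confusion matrix, here is a minimal Python sketch. The field names (`intended`, `intruder`) and the toy records are assumptions made for illustration only; they are not the actual SFUSED schema or data, apart from the fan → fran addition described above.

```python
from collections import Counter

# Hypothetical toy records. The keys "intended" and "intruder" are assumed
# names, not the real SFUSED field names, and the substitutions are invented.
toy_errors = [
    {"intended": "∅", "intruder": "r"},  # sound addition: fan -> fran
    {"intended": "p", "intruder": "b"},  # invented substitution
    {"intended": "p", "intruder": "b"},  # invented substitution
    {"intended": "s", "intruder": "ʃ"},  # invented substitution
]


def confusion_matrix(errors):
    """Count (intended, intruder) pairs and return them as a nested dict."""
    counts = Counter((e["intended"], e["intruder"]) for e in errors)
    matrix = {}
    for (intended, intruder), n in counts.items():
        matrix.setdefault(intended, {})[intruder] = n
    return matrix


matrix = confusion_matrix(toy_errors)
for intended, row in matrix.items():
    # Each row maps an intended sound to the counts of sounds that replaced it.
    print(intended, row)
```

Treating “∅” as an ordinary row label lets additions sit in the same matrix as substitutions, which mirrors the way the supplanted intended is recorded for the fan → fran example.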

Publications

Data collections:

  • Alderete, J. 2018. Simon Fraser University Speech Error Database – English (SFUSED English Beta). Burnaby, British Columbia, Canada. [Database of over 10,000 speech errors in English with speech samples]
  • Alderete, J. & Q. Chan. 2018. Simon Fraser University Speech Error Database – Cantonese (SFUSED Cantonese 1.0). Burnaby, British Columbia, Canada. [Database of over 2,500 speech errors in Cantonese with speech samples]

Journal articles:

  • Alderete, J. & M. Davies. 2018. Investigating perceptual biases, data reliability, and data discovery in a methodology for collecting speech errors from audio recordings. Language and Speech DOI: 10.1177/0023830918765012.
  • Alderete, J. & P. Tupper. 2018. Phonological regularity, perceptual biases, and the role of grammar in speech error analysis. WIREs Cognitive Science 9.e1466.

My 2018 article with Monica Davies describes in some detail the methods we have used in collecting speech errors, and documents a number of ways in which the results differ from other speech error collections that use on-the-spot observational techniques. We also describe a number of benefits of our methodology for data discovery, including estimating the frequency of speech errors and drilling down into the phonetic structure of speech errors. In Alderete & Tupper (2018), we examine the role of phonotactics in shaping speech errors and show that, largely as a result of our new methodology, the rate of errors that violate phonotactic constraints is much higher than assumed in past research. The article also reviews the impact of phonotactics in a variety of different frameworks and then considers the implications of these new facts for models of speech production.

Research team

Principal Investigator:

John Alderete (Simon Fraser University)

Collaborators:

Stefan Frisch (University of South Florida)

Alexei Kochetov (University of Toronto)

Paul Tupper (Simon Fraser University)

Henny Yeung (Simon Fraser University)

Research assistants:

Rebecca Cho

Monica Davies

Gloria Fan

Holly Wilbee

Jennifer Williams

Dave Warkentin

Olivia Nickel

Crystal Ng

Queenie Chan

Macarius Chan

Laura Dand

Amanda Klassen

Heikal Badrulhisham

Jane Li

Gavin Tam