Investigating perceptual biases, data reliability, and data discovery in a methodology for collecting speech errors from audio recordings

John Alderete, Monica Davies

Abstract:

This work describes a methodology for collecting speech errors from audio recordings and investigates how some of its assumptions affect data quality and composition. Speech errors of all types (sound, lexical, syntactic, etc.) were collected by eight data collectors from audio recordings of unscripted English speech. Analysis of these errors showed that (i) different listeners find different errors in the same audio recordings, but (ii) the frequencies of error patterns are similar across listeners; (iii) errors collected “online” using on-the-spot observational techniques are more likely to be affected by perceptual biases than “offline” errors collected from audio recordings; and (iv) datasets built from audio recordings can be explored and extended in a number of ways that traditional corpus studies cannot.

Keywords: speech errors, language production, methodology, data reliability, data quality, phonetic errors, perceptual biases

Full citation: Alderete, John and Monica Davies. 2016. Investigating perceptual biases, data reliability, and data discovery in a methodology for collecting speech errors from audio recordings. Manuscript, Simon Fraser University.