Hearing-impaired listeners' auditory, visual, and audio-visual word recognition speed: Predicting individual differences
Philip F. Seitz and Ken W. Grant
Army Audiology and Speech Center, Walter Reed Army Medical Center
[Poster paper presented at the NIH-NIDCD/VA Second Biennial Hearing Aid Research and Development Conference.]
OBJECTIVES
In older people with acquired sensorineural hearing loss, investigate factors
affecting recognition speed for auditory (A), visual (V), and audio-visual (AV)
spoken words. Address the following specific questions:
• Are there significant recognition speed differences among A, V, and AV words?
• Are individual differences in unimodal (A, V) and bimodal (AV) recognition speed predicted by audiological factors and age?
• Is there "AV benefit" with respect to recognition speed, and if so, what factors predict AV speed benefit for individuals?
BACKGROUND
The perceptual and cognitive processes involved in spoken word recognition
require time in which to operate. Abnormally slow word recognition might be a
factor in the speech understanding difficulties experienced by hearing-impaired
patients, and might limit the benefit they derive from hearing aids. Given
limitations of working memory capacity (Baddeley,
1986; Norman and Bobrow, 1975), the receiver of a
spoken sentence must process the signal fast enough to keep up with the
continuous flow of phonetic information, arriving at word percepts not much later
than when the next word starts. Otherwise, the processing operations and memory
requirements for earlier-heard input could interfere with those of later-heard
input. Thus, because sentence comprehension depends on the perceiver's ability
to retain earlier-heard information in memory while receiving the rest of the
utterance, abnormally slow word recognition could limit listeners' ability to
understand sentences. The difficulties with sentences produced at faster rates
experienced by older listeners (Working Group on Speech Understanding and
Aging, 1988; Gordon-Salant and Fitzgibbons, 1997) and
by listeners with hearing loss (Picheny et al.,
1985; Uchanski et al., 1996) might be
traceable in part to abnormally slow word recognition.
Visual speech cues typically afford large intelligibility improvements for people with hearing loss (Erber, 1975; Walden et al., 1974, 1975) and for people with normal hearing listening to degraded signals (Grant and Walden, 1996; MacLeod and Summerfield, 1987; Sumby and Pollack, 1954). Comprehension of spoken discourse also appears to require less attention and effort when presented audio-visually than auditorily (Reisberg et al., 1987). However, perceptual encoding speeds for A, V, and AV speech have not been compared previously. Based on observations of 1) faster choice reaction time to phonetically richer than to phonetically poorer speech signals (e.g., natural vs. synthetic speech [Pisoni and Greene, 1990]), and 2) faster choice reaction time to bimodal than to unimodal non-speech stimuli (e.g., Hughes et al., 1994), it would not be surprising to find faster perceptual encoding of AV than A spoken words. However, in a choice RT experiment using synthetic speech syllables, Massaro and Cohen (1983) found no speed "benefit" for AV versus A stimuli.
APPROACH
Task:
Sternberg "memory scanning": Subject is presented serially a
"memory set" of 1-4 spoken words, then is
presented a "probe" to which he/she makes a speeded YES/NO button
response to indicate whether the probe is a member of the memory set
(Sternberg, 1966, 1975). See Figure 1.
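To make the trial structure concrete, here is a minimal text-based mock-up of one memory-scanning trial. It is only a sketch: the actual experiment presented recorded spoken words in A, V, or AV form with a button-box response, not printed words and a keyboard, and the timing constants below are placeholders.

```python
# Minimal text-based mock-up of one Sternberg memory-scanning trial.
# Illustrative only: the study used recorded spoken words presented in
# A, V, or AV form and a YES/NO button box, not printed words and a
# keyboard; the timing constants here are placeholders.
import random
import time

WORDS = ["bread", "ear", "felt", "jump", "live",
         "owl", "pie", "star", "three", "wool"]

def run_trial(set_size):
    memory_set = random.sample(WORDS, set_size)
    for word in memory_set:          # serial presentation of the memory set
        print(word)
        time.sleep(1.0)              # placeholder inter-stimulus interval
    probe = random.choice(WORDS)     # probe may or may not be in the set
    t0 = time.perf_counter()
    answer = input(f"Probe: {probe} -- in the set? (y/n): ").strip().lower()
    rt = time.perf_counter() - t0    # speeded response time, in seconds
    correct = (answer == "y") == (probe in memory_set)
    return rt, correct

if __name__ == "__main__":
    rt, ok = run_trial(set_size=random.randint(1, 4))
    print(f"RT = {rt:.3f} s, correct = {ok}")
```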
Design:
All within-subjects.
Dependent Variables:
• Perceptual Encoding Speed, represented by the intercept of the memory set
size-reaction time (RT) function, as fit to a linear model;
• AV Benefit, represented by the difference between RTA and RTAV.
Independent Variables:
• Modality (A, V, AV) of spoken word stimuli;
• Subject variables: age, low-PTA (.5, 1, 2 kHz), high-PTA (3, 4 kHz), and clinically-measured word recognition;
• Derived measure: absolute encoding speed difference between V and A (|RTV - RTA|), called "V-A Synchrony"; these derived measures are illustrated in the code sketch below.
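As a concrete illustration of the dependent and derived measures, the short sketch below computes them from per-modality intercepts; all numbers are invented for illustration only.

```python
# Sketch of the dependent and derived measures, assuming the per-modality
# encoding speeds (intercepts of the set size-RT lines, in ms) are already
# estimated. All numbers are invented for illustration.
rt_a, rt_v, rt_av = 950.0, 1020.0, 870.0   # hypothetical intercepts (ms)

av_benefit   = rt_a - rt_av        # AV benefit: RTA - RTAV
va_synchrony = abs(rt_v - rt_a)    # V-A synchrony: |RTV - RTA|

print(av_benefit, va_synchrony)    # -> 80.0 70.0
```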
METHODS
Subjects:
N = 26; mean age = 66.2 y., std. dev. = 6.1;
mean 0.5, 1, 2 kHz average audiometric threshold = 37 dB HL, std. dev. = 11.6
(see Figure 2).
Stimuli:
Two tokens of each of 10 words from the CID W-22 list, spoken by one talker,
professionally video recorded on optical disc:
bread,
ear, felt, jump, live, owl, pie, star, three, wool
This set of words was selected so as to allow 100% accurate closed-set identification in A, V, and AV modalities.
Procedure:
• After passing a 20/30 vision screening test, subjects established their
most-comfortable listening levels (MCL) for monotic,
better-ear, linear-amplified presentation of the spoken words under an Etymotic ER-3A insert earphone. All subsequent auditory
testing was done at individual subjects' MCLs.
• In a screening test, subjects had to demonstrate perfect recognition of the 20 stimuli (10 words × 2 tokens) in A, V, and AV modalities, given advance knowledge of the words. Five subjects out of 33 recruited (15%) did not pass the V (lipreading) part of this screening.
• After practice blocks in each modality, subjects completed three 80-trial A, V, and AV test blocks in counterbalanced order over three 2-hour sessions, providing 240 responses per modality and 60 responses per memory set size within a modality.
• For each subject, 12 RT means were calculated (3 modalities × 4
memory set sizes). Incorrect responses were discarded, as were responses more
than 2.5 standard deviations greater than the mean for each of the 12 cells
(Ratcliff, 1993). Least-squares lines were fitted to the four data points
representing each subject's memory set size-RT function. Two subjects out of 28
were dropped because of excessive errors or excessively long RT in testing. See
Figure 3 for overall results.
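The cleaning and fitting steps can be summarized in code. The sketch below assumes a simple list-of-dicts trial layout (a made-up format, not the study's actual data files) and follows the procedure described above: discard errors, trim responses more than 2.5 SD above each cell mean, average, and fit a least-squares line over set sizes 1-4.

```python
# Sketch of the RT-cleaning and line-fitting steps for one subject.
# Assumes a made-up data layout: 'trials' is a list of dicts with keys
# 'modality' ("A"/"V"/"AV"), 'set_size' (1-4), 'rt' (ms), 'correct' (bool).
import numpy as np

def cell_mean_rt(trials, modality, set_size):
    """Mean RT for one of the 12 cells, after discarding incorrect
    responses and responses more than 2.5 SD above the cell mean
    (Ratcliff, 1993)."""
    rts = np.array([t["rt"] for t in trials
                    if t["modality"] == modality
                    and t["set_size"] == set_size
                    and t["correct"]])
    keep = rts <= rts.mean() + 2.5 * rts.std()
    return rts[keep].mean()

def encoding_speed(trials, modality):
    """Intercept of the least-squares line fitted to the four
    set size-RT points (the perceptual encoding speed measure)."""
    set_sizes = np.array([1, 2, 3, 4])
    means = np.array([cell_mean_rt(trials, modality, s) for s in set_sizes])
    slope, intercept = np.polyfit(set_sizes, means, 1)
    return intercept
```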
RESULTS AND DISCUSSION
Modality-Related Differences in Perceptual Encoding Speed:
Figure 4 shows that there were significant perceptual encoding speed
differences due to modality. The intercept of the linear model of a subject's
memory set size-RT function is assumed to represent the sum of encoding speed
and motor execution speed (Sternberg, 1975). Motor execution speed is assumed
to be constant within a subject, so it is relative differences among conditions
within a subject that indicate factor effects. Wilcoxon
Signed Ranks Tests (2-tailed) were performed on the intercepts of each modality
pairing and all comparisons were significant.
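A sketch of how such pairwise tests might be run with SciPy, using invented per-subject intercepts in place of the study's actual values (shown in Figure 4):

```python
# Sketch of the pairwise modality comparisons with SciPy, using invented
# per-subject intercepts in place of the study's actual values (Figure 4).
from itertools import combinations
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
n = 26
intercepts = {                       # hypothetical intercepts (ms), N = 26
    "A":  rng.normal(950, 80, n),
    "V":  rng.normal(1020, 90, n),
    "AV": rng.normal(880, 75, n),
}

for m1, m2 in combinations(intercepts, 2):
    stat, p = wilcoxon(intercepts[m1], intercepts[m2])  # 2-tailed paired test
    print(f"{m1} vs {m2}: W = {stat:.1f}, p = {p:.4f}")
```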
Encoding Speed Correlations Among Modalities:
Figures 5-7 show that there are strong predictive relations among subjects'
encoding speeds for words received in the three modalities. (In these and
following figures, results of 2-tailed Pearson and Kendall's tau-b correlation tests are noted.) Although part of
the pattern of individual differences seen here is due to constant motor
execution speed differences, the larger part is likely due to individuals'
characteristic speeds for perceptually encoding speech signals. This result
lends new support, by way of an RT measure, to C. Watson's (1996) and others'
claims that there is a central, modality-independent source of individual
differences in speech recognition.
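For illustration, the correlation tests noted in Figures 5-7 could be computed as follows; the data here are synthetic stand-ins for the subjects' per-modality encoding speeds.

```python
# Sketch of the correlation tests noted in Figures 5-7, on synthetic
# stand-ins for the subjects' per-modality encoding speeds.
import numpy as np
from scipy.stats import pearsonr, kendalltau

rng = np.random.default_rng(1)
rt_a = rng.normal(950, 80, 26)                 # hypothetical A intercepts
rt_v = 0.9 * rt_a + rng.normal(150, 40, 26)    # V intercepts, correlated

r, p_r = pearsonr(rt_a, rt_v)        # 2-tailed Pearson correlation
tau, p_t = kendalltau(rt_a, rt_v)    # Kendall's tau-b (SciPy default)
print(f"r = {r:.2f} (p = {p_r:.4f}); tau-b = {tau:.2f} (p = {p_t:.4f})")
```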
Audiological Factors and
Encoding Speed:
Table 1 shows that, among the subject variables, only age was
significantly correlated with encoding speed. Age had a moderate
predictive relation to V encoding speed, as Figure 8 shows. However, when V and
A cues are combined in AV perception, older perceivers no longer appear to be
slower. It is notable that the audiometric threshold measures did not predict
RT to auditory words, as might be expected based on earlier studies of
congenitally hearing-impaired listeners (Seitz and Rakerd,
1996).
Table 1. Correlations among subject variables and RT measures.

            Age     LO-PTA  HI-PTA  WRec    RTA     RTV     RTAV    |RTV-RTA|
Age         1.0
LO-PTA      -.31    1.0
HI-PTA       .19     .41*   1.0
WRec        -.12    -.37    -.40*   1.0
RTA          .28     .11     .12    -.20    1.0
RTV          .54**  -.14    -.02     .07     .72**  1.0
RTAV         .33    -.03     .06    -.06     .92**   .86**  1.0
|RTV-RTA|    .47**  -.32    -.16     .32    -.10     .63**   .19    1.0
RTA-RTAV     .09     .35     .15    -.37     .36    -.22    -.04    -.69**
Audio-Visual Encoding Speed Benefit and Its Predictor, "V-A
Synchrony":
It is clear from Figures 3 and 7 that there is an AV benefit with respect to
perceptual encoding speed for spoken words. A difficult and interesting
challenge is to explain individual differences in AV benefit. The present data
point to a construct that can be called V-A synchrony as a predictor of
AV benefit. V-A synchrony is simply the absolute difference between an
individual's V and A encoding speeds. Figure 9 shows
that the V-A synchrony construct accounts for a rather large portion of the
variability in AV benefit in the present data (r² = 0.49).
Apparently, in order to obtain AV encoding speed benefit, it is helpful for an
individual's A and V unimodal encoding speeds to be
similar. Based on analysis of data from 15 of the present study's subjects who
also participated in a study of AV benefit in consonant identification (Grant et
al., 1998), it might be the case that V-A encoding speed synchrony is an
effective predictor of individual differences in AV benefit in speech
identification tasks.
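A sketch of the Figure 9 analysis: regressing AV benefit (RTA - RTAV) on V-A synchrony (|RTV - RTA|). The data below are simulated with a built-in negative asynchrony-benefit relation, so the resulting r² will not match the reported 0.49 exactly.

```python
# Sketch of the Figure 9 analysis: regressing AV benefit (RTA - RTAV) on
# V-A synchrony (|RTV - RTA|). Data are simulated with a built-in negative
# asynchrony-benefit relation, so r^2 will not match the reported 0.49.
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(2)
rt_a = rng.normal(950, 80, 26)                  # hypothetical intercepts (ms)
rt_v = rng.normal(1020, 90, 26)
rt_av = rt_a - 60 + 0.4 * np.abs(rt_v - rt_a) + rng.normal(0, 25, 26)

va_synchrony = np.abs(rt_v - rt_a)              # predictor
av_benefit = rt_a - rt_av                       # outcome

fit = linregress(va_synchrony, av_benefit)
print(f"r^2 = {fit.rvalue**2:.2f}, slope = {fit.slope:.2f} ms/ms")
```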
CONCLUSIONS
• There were significant perceptual encoding speed differences related to
modality, suggesting potential modality-related advantages and difficulties in
recognition of fast speech and connected speech.
• There were strong inter-modal encoding speed correlations, lending support from a reaction time perspective to the idea of a central, modality-independent source of individual differences in speech perception.
• Older subjects tended to be slow at V encoding, but this did not affect their AV encoding speed. There were no other significant correlations among audiological and RT measures.
• There was a highly significant AV benefit with respect to perceptual encoding speed.
• The construct V-A synchrony predicted individual differences in degree of AV encoding speed benefit.
REFERENCES
Baddeley, A.D. (1986). Working Memory (Oxford: Clarendon Press).
Erber, N.P. (1975). "Auditory-visual perception of speech," J. Speech Hear. Disord. 40, 481-492.
Gordon-Salant, S., and Fitzgibbons, P.J. (1997). "Selected cognitive factors and speech recognition performance among young and elderly listeners," J. Speech Hear. Res. 40, 423-431.
Grant, K.W., and Walden, B.E. (1996). "Evaluating the articulation index for auditory-visual consonant recognition," J. Acoust. Soc. Am. 100, 2415-2424.
Grant, K.W., Walden, B.E., and Seitz, P.F. (1998). "Auditory-visual speech recognition by hearing-impaired subjects: Consonant recognition, sentence recognition, and auditory-visual integration," J. Acoust. Soc. Am. 103, 2677-2690.
Hughes, H.C., Reuter-Lorenz, P.A., Nozawa, G., and Fendrich, R. (1994). "Visual-auditory interactions in sensorimotor processing: Saccades versus manual responses," J. Exp. Psychol.: Human Percept. Perform. 20, 131-153.
Luce, R.D. (1986). Response Times: Their Role in Inferring Elementary Mental Organization (New York: Oxford University Press).
Massaro, D.W., and Cohen, M.M. (1983). "Categorical or continuous speech perception: A new test," Speech Commun. 2, 15-35.
MacLeod, A., and Summerfield, Q. (1987). "Quantifying the contribution of vision to speech perception in noise," British J. Audiol. 21, 131-141.
Norman, D.A., and Bobrow, D.G. (1975). "On data-limited and resource-limited processes," Cognitive Psychol. 7, 44-64.
Picheny, M.A., Durlach, N.I., and Braida, L.D. (1985). "Speaking clearly for the hard of hearing I: Intelligibility differences between clear and conversational speech," J. Speech Hear. Res. 28, 96-103.
Pisoni, D.B., and Greene, B.G. (1990). "The role of cognitive factors in the perception of synthetic speech," in Research on Speech Perception Progress Report No. 16 (Bloomington, IN: Indiana University, Speech Research Laboratory), pp. 193-214.
Ratcliff, R. (1993). "Methods for dealing with reaction time outliers," Psychol. Bull. 114, 510-532.
Reisberg, D., McLean, J., and Goldfield, A. (1987). "Easy to hear but hard to understand: A lip-reading advantage with intact auditory stimuli," in B. Dodd and R. Campbell (Eds.), Hearing by Eye: The Psychology of Lip-Reading (London: Lawrence Erlbaum Associates), pp. 97-113.
Seitz, P.F., and Rakerd, B. (1996). "Hearing impairment and same-different reaction time," J. Acoust. Soc. Am. 99, 2602.
Sternberg, S. (1966). "High-speed scanning in human memory," Science 153, 652-654.
Sternberg, S. (1975). "Memory scanning: New findings and current controversies," Q. J. Exp. Psychol. 27, 1-32.
Sumby, W.H., and Pollack, I. (1954). "Visual contribution to speech intelligibility in noise," J. Acoust. Soc. Am. 26, 212-215.
Uchanski, R.M., Choi, S.S., Braida, L.D., Reed, C.M., and Durlach, N.I. (1996). "Speaking clearly for the hard of hearing IV: Further studies on the role of speaking rate," J. Speech Hear. Res. 39, 494-509.
Working Group on Speech Understanding and Aging, Committee on Hearing, Bioacoustics and Biomechanics, National Research Council (1988). "Speech understanding and aging," J. Acoust. Soc. Am. 83, 859-895.
Walden, B.E., Prosek, R.A., and Worthington, D.W. (1974). "Predicting audiovisual consonant recognition performance of hearing-impaired adults," J. Speech Hear. Res. 17, 270-278.
Walden, B.E., Prosek, R.A., and Worthington, D.W. (1975). "Auditory and audiovisual feature transmission in hearing-impaired adults," J. Speech Hear. Res. 18, 272-280.
Watson, C.S., Qiu, W.W., Chamberlain, M.M., and Li, X. (1996). "Auditory and visual speech perception: Confirmation of a modality-independent source of individual differences in speech recognition," J. Acoust. Soc. Am. 100, 1153-1162.
ACKNOWLEDGMENT
Supported by research grant numbers R29 DC 01643 and R29 DC 00792 from the
National Institute on Deafness and Other Communication Disorders, National
Institutes of Health. This research was carried out as part of an
approved research protocol.
CONTACT INFORMATION
Army Audiology and Speech Center
Walter Reed Army Medical Center
Washington, DC 20307-5001
(202) 782-8596
grant@tidalwave.net