Computational Approaches to
Relating Consonant and Sentence Recognition Test Scores


Philip F. Seitz and Ken W. Grant
Army Audiology and Speech Center, Walter Reed Army Medical Center

[Poster paper presented at the 134th Meeting of the Acoustical Society of America, San Diego, CA, December 3, 1997: Seitz, P.F., and Grant, K.W. (1997). "Computational approaches to relating consonant and sentence recognition test scores," J. Acoust. Soc. Am. 102, 3137.]

Investigate the predictive relation between recognition scores for consonants in VCV nonsense syllables and words in sentences. Specifically, test whether explicit modeling of differences between the VCV and sentence materials strengthens the predictive relation.

Two computational experiments were run on data from 29 hearing-impaired listeners who performed auditory-visual (AV) speech recognition tests:

Experiment 1:
Weight nonsense syllable recognition scores for individual consonants so as to reflect their relative frequencies of occurrence in the sentence materials. If the predictive relationship between the scores on the different materials is due to subjects' ability to recognize segments, this weighting should increase the strength of the correlation between consonant and words-in-sentences scores.

Experiment 2:
Two principles were applied in this experiment:

• For predicting sentence recognition from consonant recognition, use all the information in subjects' consonant confusion matrices; that is, not only correct responses (on-diagonal in a matrix), but also the pattern of confusions (off-diagonal cell probabilities).

• Because word recognition scores predict sentence scores better than consonant scores do, use consonant confusion matrices first to predict word scores, then use predicted word scores to predict sentence scores, using Boothroyd and Nittrouer's (1988) mathematical treatment. Use a 35,000-word phonological lexicon to estimate subjects' word scores based on their consonant confusion patterns and the loss of lexical contrast implied by the patterns.

Several recent studies have found predictive relations among scores on phoneme, word, and sentence recognition tests (Boothroyd and Nittrouer, 1988; Demorest et al., 1997; Olsen et al., 1997), corroborating and extending Harvey Fletcher's original work (Allen, 1994; Fletcher and Steinberg, 1929). On one view, these relations exist because the same consonant and vowel segments (phonemes) occur in all of the materials. If this is the case, it is reasonable to expect predictive relations among test materials to be strongest when there is a good correspondence between the phonemes that occur, and their frequencies of occurrence, in the different materials. The present study began by putting this simple notion to the test.

Properties of the lexicon mediate the relationship between phoneme and word recognition. Because of lexical redundancy, listeners need to recognize an average of about 2.5 phonemes in order to recognize a CVC word. Individual differences in patterns of phoneme confusions might affect listeners' ability to take advantage of lexical redundancy. The present study introduced a method for deriving subject-specific lexical redundancy factors (Boothroyd and Nittrouer's j factor) in order to obtain more accurate predictions of individuals' sentence recognition performance based on phoneme recognition scores from nonsense syllable tests.
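The effect of lexical redundancy can be illustrated with a short numeric sketch (the values below are illustrative round numbers, not data from the study):

```python
# Boothroyd and Nittrouer's lexical redundancy relation: p_w = p_c ** j.
# Hypothetical example: a listener who recognizes 90% of phonemes, with
# j = 2.5 effective independent phonemes per word, is predicted to
# recognize about 77% of CVC words.
p_c = 0.90        # probability of recognizing a single phoneme
j = 2.5           # effective number of independent phonemes per word
p_w = p_c ** j    # predicted probability of recognizing a whole word
print(round(p_w, 3))  # -> 0.768
```

Because j is well below 3 (the number of phonemes in a CVC word), the listener recognizes words more often than independent recognition of all three phonemes would predict.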

The data for this study were obtained from 29 hearing-impaired listeners who performed two auditory-visual speech recognition tests (see Grant et al., 1998):

• VCV nonsense syllables, where C was one of the following 18 English consonants: /b p g k d t m n v f θ ð z s ʃ ʒ tʃ dʒ/.

• Key words in IEEE/Harvard sentences.

A set of simplifying assumptions was operative throughout this study:

• All vowels and non-tested consonants (/l r w y h/) were recognized with 100% accuracy.

• Subjects were equivalent with respect to their ability to use morpho-syntactic and semantic context in sentence recognition (Boothroyd and Nittrouer's k factor) as well as their working memory capacity and speed of processing.

In the VCV test, each of the 18 consonants was presented 40 times, so each consonant's frequency of occurrence was about 5.6%. Frequencies of occurrence of the 18 consonants were tabulated for the 250 IEEE/Harvard key words presented in the sentence test, and these were compared to the IEEE/Harvard materials as a whole and to the Brown Corpus to determine their typicality (Figure 1).
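The tabulation step can be sketched as follows; the key-word transcriptions here are made-up ASCII stand-ins ("dh" for /ð/), not the actual IEEE/Harvard materials:

```python
from collections import Counter

# Hypothetical phonemic transcriptions of key words, one phoneme list per word.
key_words = [
    ["g", "l", "u"],         # "glue"
    ["s", "m", "u", "dh"],   # "smooth"
    ["b", "ae", "k"],        # "back"
]
# Subset of the tested consonants, in the same ASCII transcription.
tested = {"b", "p", "g", "k", "d", "t", "m", "n", "v", "f", "dh", "z", "s"}

# Count only occurrences of tested consonants; vowels and non-tested
# consonants (here /l/ and the vowels) are ignored.
counts = Counter(ph for word in key_words for ph in word if ph in tested)
total = sum(counts.values())
freqs = {c: n / total for c, n in counts.items()}  # relative frequencies
print(freqs)
```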




Subjects' scores were recomputed as the sum of the products of each consonant's N correct and percentage occurrence in the 250 tested key words. These scores were scaled to a reasonable range for comparison with the subjects' original overall percent correct scores (Figure 2).
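A minimal sketch of the reweighting, assuming hypothetical per-consonant counts and frequencies (the actual study used 18 consonants at 40 presentations each):

```python
# Hypothetical per-consonant results: N correct out of 40 presentations,
# and each consonant's relative frequency in the tested key words.
n_correct = {"t": 38, "s": 30, "dh": 12}
occ_freq  = {"t": 0.50, "s": 0.35, "dh": 0.15}   # sums to 1.0

# Occurrence-weighted score: sum of (N correct x frequency of occurrence),
# rescaled to a percent-correct range for comparison with original scores.
weighted_pct = 100.0 * sum(n_correct[c] * occ_freq[c] for c in n_correct) / 40.0

# Original (unweighted) overall percent correct for comparison.
unweighted_pct = 100.0 * sum(n_correct.values()) / (40 * len(n_correct))
print(weighted_pct, unweighted_pct)
```

Note how the weighting pulls the overall score toward performance on the most frequent consonants (here /t/), while rarely occurring consonants contribute almost nothing.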




Rather than strengthening the predictive relation between VCV and sentence scores, the occurrence-weighting approach actually weakened the relation, as measured by Pearson's r and Kendall's τ-b (Figure 3).
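For reference, the two association measures can be computed in pure Python (a sketch for illustration; the study's statistics were presumably computed with standard statistical tools):

```python
from itertools import combinations
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def kendall_tau_b(x, y):
    """Kendall's tau-b: rank agreement with a correction for ties."""
    C = D = tx = ty = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        dx, dy = xi - xj, yi - yj
        if dx == 0:
            tx += 1          # pair tied on x
        if dy == 0:
            ty += 1          # pair tied on y
        if dx * dy > 0:
            C += 1           # concordant pair
        elif dx * dy < 0:
            D += 1           # discordant pair
    n0 = len(x) * (len(x) - 1) // 2
    return (C - D) / math.sqrt((n0 - tx) * (n0 - ty))

# Hypothetical predicted vs. actual scores for five subjects.
pred   = [62.0, 70.0, 75.0, 81.0, 90.0]
actual = [60.0, 73.0, 71.0, 85.0, 92.0]
print(pearson_r(pred, actual), kendall_tau_b(pred, actual))
```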




One possible reason for this result is that the weighting reduced some consonants' contributions to the overall score to negligible levels, effectively making the scores represent subjects' performance with a smaller set of consonants (Figure 4).




This result suggests that for predicting subjects' performance on sentences, it is more important for the VCV materials to measure phonetic cue extraction ability (best measured by performance with a larger set of phonemes) than to measure recognition of specific segments.

Computation of Scores, Step 1:
For each non-zero off-diagonal (error) cell in subjects' consonant confusion matrices, lexical confusions between test words and other (real) words potentially resulting from the consonant confusion were determined using a 35,000-word phonological lexicon (Seitz et al., 1995). Lexical confusions were tested by substituting the phonemes of the test word with each phoneme for which a confusion was observed in the VCV recognition test; possible deletion and insertion errors were not tested. Each off-diagonal cell then had a "penalty" value representing the number of lexical confusions times the cell's probability. Figure 5 shows schematically the logic and operation of the lexical confusion tests using a hypothetical example. Figure 6 shows a portion of the output generated for one subject.
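The penalty computation can be sketched as follows; the mini-lexicon, transcription scheme, and confusion probabilities are hypothetical stand-ins for PhLex and a subject's actual matrix:

```python
# Toy phonological lexicon (tuples of phoneme symbols); the study used the
# 35,000-word PhLex lexicon instead.
LEXICON = {
    ("b", "ae", "t"),   # "bat"
    ("p", "ae", "t"),   # "pat"
    ("m", "ae", "t"),   # "mat"
    ("b", "ae", "d"),   # "bad"
}

# One row of a hypothetical confusion matrix: P(response | stimulus = /b/).
row_b = {"b": 0.80, "p": 0.15, "m": 0.05}

def penalty(test_word, stimulus, row, lexicon):
    """Sum, over off-diagonal (error) cells, of the number of real-word
    lexical confusions produced by substitution, times the cell's
    probability. Deletions and insertions are not tested."""
    total = 0.0
    for response, p in row.items():
        if response == stimulus:   # on-diagonal cell: a correct response
            continue
        candidate = tuple(response if ph == stimulus else ph for ph in test_word)
        if candidate in lexicon and candidate != test_word:
            total += p             # one lexical confusion, weighted by P(error)
    return total

# "bat" with /b/->/p/ (prob .15) yields "pat"; /b/->/m/ (.05) yields "mat".
print(penalty(("b", "ae", "t"), "b", row_b, LEXICON))
```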


[Figure 5: stimulus vs. response]




The sum of penalties for all test words was used as a subject's lexical redundancy "score." As another kind of lexical redundancy score, each lexical confusion was weighted by the homophonous lexical item's Brown Corpus word frequency count (log10 transformed). Because of differences in patterns of consonant confusions, subjects' lexical redundancy scores produce somewhat different subject rankings than their main diagonal scores do, as seen in Figure 7.



Computation of Scores, Step 2:
Subjects' lexical redundancy scores were scaled to the range appropriate for Boothroyd and Nittrouer's j factor, which expresses the relationship between recognition of perceptual parts (segments) and wholes (words),

 j = log(pw) / log(pc)                     (1)

where pw is the probability of recognizing a word and pc is the probability of recognizing a consonant, and

pw = pc^j                                 (2)

when j is known. Also, 1 ≤ j ≤ n, where n is the number of segments in a word. For the IEEE/Harvard key words, 2 ≤ n ≤ 9, with mean n = 3.84; therefore, the lexical redundancy scores were scaled such that

1 ≤ j ≤ 3.84.                             (3)

Subjects' lexical redundancy scores now act as j, and their overall (on-diagonal) consonant score acts as pc, allowing us to obtain subjects' predicted word scores from (2). Having obtained subjects' predicted word scores, it is possible to obtain their predicted words-in-sentences scores (pS),

pS = 1 - (1 - pc^j)^k                     (4)

 where k is a factor representing morpho-syntactic and semantic context effects. Based on current experimental results with AV recognition of IEEE/Harvard sentences (Grant and Seitz, 1997), k was fixed in (4) at 1.77.
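Equations (2) and (4), together with the scaling in (3), combine into a short prediction routine. Linear min-max scaling into [1, 3.84] is an assumption about the scaling step, not a detail stated above, and the subject values are hypothetical:

```python
def scale_to_j(score, lo, hi, j_min=1.0, j_max=3.84):
    """Linearly rescale a lexical redundancy score into 1 <= j <= 3.84.
    (Min-max scaling is one plausible reading of the scaling step.)"""
    return j_min + (j_max - j_min) * (score - lo) / (hi - lo)

def predict_sentence_score(p_c, j, k=1.77):
    """Eq. (2): p_w = p_c**j, then Eq. (4): p_S = 1 - (1 - p_w)**k."""
    p_w = p_c ** j
    return 1.0 - (1.0 - p_w) ** k

# Hypothetical subject: 85% consonants correct, mid-range redundancy score.
j = scale_to_j(score=50.0, lo=0.0, hi=100.0)   # midpoint of the j range
p_s = predict_sentence_score(p_c=0.85, j=j)
print(round(j, 2), round(p_s, 3))
```

A larger j (less benefit from lexical redundancy) lowers the predicted word score, which in turn lowers the predicted sentence score through (4).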

The two approaches using subjects' lexical redundancy scores as j were compared to an approach using a fixed j of 2.46 (Boothroyd and Nittrouer, 1988) with respect to the strength of correlation between predicted and actual scores. All three approaches were compared to the original prediction of sentence scores using overall percent correct on VCVs, as shown in the table below and in Figures 1 and 8.

Predictor                Figure Ref.    r       tau-b
VCV Pct Correct          Fig. 1         .737    .446
Fixed j, pS              Fig. 8         .761    .530
Var. j (no WFW), pS      (none)         .784    .523
Var. j (with WFW), pS    Fig. 8         .806    .528

(WFW = weighting of lexical confusions by Brown Corpus word frequency.)




Tests for differences between dependent correlations (Bruning and Kintz, 1968) indicated that there were no significant differences among the four predictors (i.e., VCV percent correct and the three lexical confusion approaches).

• Weighting VCV scores by frequency of occurrence of the consonants in sentence materials does not strengthen the correlation between VCV and sentence scores; in fact, it weakens it, apparently because the weighting has the effect of reducing the amount of VCV confusion matrix information that can be used for predicting sentence scores.

• It is feasible to use off-diagonal phoneme confusion matrix cell probabilities to obtain subject-specific lexical confusion scores (j-factors) for predicting word scores. This approach increases the amount of VCV confusion matrix information used for predicting sentence scores. Compared to using VCV percent correct score (on-diagonal confusion matrix cells) for predicting sentence recognition performance, the lexical confusion computational approach offers a small but non-significant increase in the strength of the linear relationship between VCV and sentence test scores.

• The lexical confusion technique needs to be validated. It also needs to be tested on data sets with lower overall scores.

Allen, J.B. (1994). "How do humans process and recognize speech?," IEEE Trans. Speech Aud. Proc. 2, 567-577.

Bilger, R.C. (1984). "Speech recognition test development," in Speech Recognition by the Hearing Impaired (ASHA Reports 14), pp. 2-7.

Boothroyd, A., and Nittrouer, S. (1988). "Mathematical treatment of context effects in phoneme and word recognition," J. Acoust. Soc. Am. 84, 101-114.

Bruning, J.L., and Kintz, B.L. (1968). Computational Handbook of Statistics (Scott, Foresman and Company, Glenview, IL).

Fletcher, H., and Steinberg, J. (1929). "Articulation testing methods," Bell System Tech. J. 8, 806-854.

Grant, K.W., Walden, B.E., and Seitz, P.F. (1998). "Auditory-visual speech recognition by hearing-impaired subjects: Consonant recognition, sentence recognition, and auditory-visual integration," J. Acoust. Soc. Am. 103, 2677-2690.

Grant, K.W., and Seitz, P.F. (1997). "The recognition of isolated words and words in sentences: Individual variability in the use of sentence context," J. Acoust. Soc. Am. 102, 3132.

Nittrouer, S., and Boothroyd, A. (1990). "Context effects in phoneme and word recognition of young children and older adults," J. Acoust. Soc. Am. 87, 2705-2715.

Olsen, W.O., Van Tassell, D.J., and Speaks, C.E. (1997). "Phoneme and word recognition for words in isolation and in sentences," Ear Hear. 18, 175-188.

Seitz, P.F., Bernstein, L.E., and Auer, E.T., Jr. (1995). PhLex (Phonologically Transformable Lexicon), a 35,000-word pronouncing American English lexicon on structural principles, with accompanying phonological rules and word frequencies (Gallaudet Research Institute, Washington, DC).

Supported by research grant numbers R29 DC 01643 and R29 DC 00792 from the National Institute on Deafness and Other Communication Disorders, National Institutes of Health. This research was carried out as part of an approved Walter Reed Army Medical Center, Department of Clinical Investigation research study protocol, Work Unit No. 2548 entitled "Cognitive Consequences of Hearing Differences." The opinions and assertions contained herein are the private views of the authors and are not to be construed as official or reflecting the views of the Department of the Army or the Department of Defense.

Philip F. Seitz, Ph.D.
Army Audiology and Speech Center
Walter Reed Army Medical Center
Washington, DC 20307-5001
(202) 782-8579