Auditory-Visual Speech Recognition Laboratory
(301) 919-2957
Data Sets

Below are links to data sets containing consonant confusion matrices from normal-hearing and hearing-impaired subjects. The tests typically contain 18, 16, or 14 consonants presented in a vCv context (vowel /a/). A short Matlab sketch showing how these confusion matrices can be scored appears at the end of this page, after the file-reading script.

Data Set 1: Individual results from 40 hearing-impaired subjects, multiple productions of 18 consonants, presented at 0 dB S/N (continuous speech-shaped noise). Each consonant was presented 40 times in each receiving condition (720 responses per matrix). There are three matrices per subject, corresponding to the auditory, visual, and auditory-visual recognition conditions. The token labels for the 18 consonants are: b, p, g, k, d, t, m, n, v, f, tx (as in the word "that"), th, z, s, zh (as in the word "beige"), sh, ch, and j. Two papers describe all or part of this data set:

1) Grant, K.W., Walden, B.E., and Seitz, P.F. (1998). "Auditory-visual speech recognition by hearing-impaired subjects: Consonant recognition, sentence recognition, and auditory-visual integration," J. Acoust. Soc. Am., 103, 2677-2690.

2) Grant, K.W., and Seitz, P.F. (1998). "Measures of auditory-visual integration in nonsense syllables and sentences," J. Acoust. Soc. Am., 104, 2438-2450.

Data Set 2: Pooled results from 8 normal-hearing subjects, multiple productions of 18 consonants, presented at a variety of S/N ratios (continuous speech-shaped noise). Each consonant was presented 40 times per subject (5760 responses per matrix). There are two matrices per condition (auditory and auditory-visual). At the end of the file is a pooled visual-only matrix. The token labels are the same 18 listed above. These data, as well as consonant recognition of filtered speech, are described in:

Grant, K.W., and Walden, B.E. (1996). "Evaluating the articulation index for auditory-visual consonant recognition," J. Acoust. Soc. Am., 100, 2415-2424.

Data Set 3: Pooled results from 4 normal-hearing subjects, multiple productions of 18 consonants, presented in a variety of filtered-speech conditions. Each consonant was presented 40 times per subject in the auditory-alone conditions (2880 responses per matrix). Several of the auditory-visual conditions have fewer presentations per consonant because subjects scored 100% correct recognition for 5 consecutive test blocks. This occurred most often when the audio contained significant low-frequency energy (filter conditions 1, 2, 7, and 10), demonstrating the synergy between low-frequency audio and speechreading. There are two matrices per condition (auditory and auditory-visual). At the end of the file is a pooled visual-only matrix. The token labels are the same 18 listed above. The filter conditions are listed at the top of the file. These data are also described in the Grant and Walden (1996) paper cited above.

vCv Stimulus Set

Download the vCv stimulus set. The set contains 18 consonants in /a/-C-/a/ context. Each consonant was spoken 8 times, for a total of 144 tokens. The speaker was female. The files were digitized at a 20-kHz sampling rate with a 16-bit dynamic range. They are mono, headerless files. A sample Matlab script to read the files is listed below:
SampFreq = 20000;                 % sampling frequency in Hz
stimpath = 'c:\vcv\';             % directory containing the stimulus files
filename = 'aba.t01';             % one token: /aba/, first production
infile = [stimpath filename];
fid = fopen(infile,'r');          % open the raw (headerless) file
[samples, count] = fread(fid,'short');   % samples contains the 16-bit data array for plotting or playing
fclose(fid);
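
As a usage example, the samples vector read above can be played back directly, and the script extends naturally to reading all eight productions of one consonant. The loop below assumes the remaining files follow the naming pattern of the single example file name shown above (aba.t01 through aba.t08); that pattern is an assumption, so check it against the downloaded file names:

soundsc(samples, SampFreq);       % play the token just read, scaled to avoid clipping

token = 'aba';                    % consonant token to load (assumed naming: <token>.t01 ... <token>.t08)
for k = 1:8
    infile = sprintf('%s%s.t%02d', stimpath, token, k);
    fid = fopen(infile,'r');
    if fid == -1
        warning('Could not open %s', infile);   % skip any missing file
        continue;
    end
    samples = fread(fid,'short');
    fclose(fid);
    soundsc(samples, SampFreq);                 % play this production
    pause(length(samples)/SampFreq + 0.5);      % wait for playback plus a short gap
end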
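
Finally, the confusion-matrix sketch promised in the Data Sets section above. It is a minimal example of how such a matrix is typically scored, assuming one matrix has already been loaded into a Matlab variable C with stimuli in rows and responses in columns; the variable name and the row/column orientation are assumptions, not a description of the data files themselves, so verify the layout against the notes at the top of each file:

% C is an 18x18 count matrix: C(i,j) = number of times consonant i
% was presented and consonant j was reported. For one Data Set 1
% matrix the counts sum to 720 (18 consonants x 40 presentations).
totalResponses = sum(C(:));
pctCorrect = 100 * sum(diag(C)) / totalResponses;     % overall percent correct
rowProb = C ./ repmat(sum(C,2), 1, size(C,2));        % row-normalized confusion probabilities
fprintf('Overall percent correct: %.1f%%\n', pctCorrect);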