UW X-ray Microbeam Speech Production Database

Data organization

Each subject subdirectory contains files for the speech tasks performed by that subject. Examples of file names include

TA001.ACC TP001.ACC TP001.TCC TP001.XYD TP020_2.ACC TP020_2.TCC TP020_2.XYD

The prefix TA denotes an audio-only task while the prefix TP denotes tasks where both audio and x-ray pellet tracking data were recorded. The numeric code is the task number referenced in Appendix A of the X-ray Microbeam Speech Production Database Handbook. For example, TP011 denotes x-ray and audio data for the initial portion of the ``Grandfather Passage'', and this code is the same for all subjects.

The underscore number (e.g. _2) denotes a repeated trial of a particular task. Trials are repeated for reasons such as these: the subject did not adequately perform the task in the initial trial, or the data acquisition system failed to acquire the task in its entirety. In some cases a particular task has no trials at all and is missing altogether. Such ``holes'' in the data result from failure of the x-ray tracking system to acquire the pellets for that task. On account of x-ray exposure limitations, many such tasks are not repeated.

The .ACC extension denotes the speech audio recording, .TCC denotes the throat accelerometer, and .XYD denotes the x-ray pellet coordinates.
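The naming convention above can be sketched as a small Python parser. This is only an illustration: the names parse_task_file and TaskFile are hypothetical, and the sketch assumes that file names may appear in either case and that a name without an underscore number carries no explicit trial number (reported here as None).

```python
import re
from typing import NamedTuple, Optional

# Prefix (TA or TP), task number, optional _trial suffix, channel extension.
FILE_NAME = re.compile(r"^(TA|TP)(\d+)(?:_(\d+))?\.(ACC|TCC|XYD)$", re.IGNORECASE)

CHANNELS = {
    "ACC": "audio",
    "TCC": "throat accelerometer",
    "XYD": "pellet coordinates",
}


class TaskFile(NamedTuple):
    has_xray: bool        # True for TP (audio + x-ray), False for TA (audio only)
    task: int             # task number as listed in Appendix A of the Handbook
    trial: Optional[int]  # underscore number, or None when the name has none
    channel: str          # which signal the file holds


def parse_task_file(name: str) -> Optional[TaskFile]:
    """Split a task file name into its parts; return None for other files."""
    match = FILE_NAME.match(name)
    if match is None:
        return None
    prefix, task, trial, ext = match.groups()
    return TaskFile(
        has_xray=prefix.upper() == "TP",
        task=int(task),
        trial=int(trial) if trial else None,
        channel=CHANNELS[ext.upper()],
    )


if __name__ == "__main__":
    for name in ["TA001.ACC", "TP020_2.XYD", "PAL.DAT"]:
        print(name, parse_task_file(name))
```

Files such as PAL.DAT do not follow the task naming pattern, so the sketch returns None for them.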

Each subject subdirectory also contains the files

PAL.DAT PHA.DAT

containing the palate and pharynx traces for that subject. These traces are displayed by the XY software tool included on this disk.

Data formats

The physiology channels (.ACC and .TCC) have ASCII headers with the waveform samples in a compressed binary format. You can view the contents of the header (number of samples, interval between samples) with the TYPE command:

type d:\ubdb\jw11\tp011.acc

The XY and PLAY software programs included on this disk can access the compressed-format physiology channels directly off the CD-ROM without any intermediate decompression step. You may, however, want to play or analyze these data using other software tools. This disk includes conversion utilities for the RIFF (Microsoft .WAV) and SPHERE (TIMIT database) formats. Most Windows sound software recognizes RIFF. The Entropics Waves+ software package recognizes SPHERE files, as do other packages capable of working with the TIMIT speech database. Pascal language source code for reading the compressed files is also included on this disk in case you want to read them with your own software tools.

The data compression is based on a first difference followed by 12-coefficient LPC prediction to reduce the signal dynamic range. The LPC-derived log area ratio parameters together with the predicted speech waveform are encoded with Rice's algorithm, a type of variable-length coding. Because the variable-length code discards no information, the compression/expansion is exact, reproducing the numeric values of the original waveform samples without any approximation or distortion. This was an important criterion in the decision to apply data compression to audio data. The compressed format occupies approximately 40 percent of the original file space, and is key to making the Database available on a manageable number of CD-ROM platters. Each CD-ROM holds 5 hours of audio, accelerometer, and pellet tracking data.

The .XYD file contains pellet data in ASCII format; it is readable with any text editor. The data are sampled every 6.866 ms (approximately 146 Hz). We chose this sampling rate to facilitate the synchronization of pellet motion with the audio playback in the XY software tool included on this disk. The first sample is at time 0, the start of physiological data recording. Each line of the .XYD file is a separate sample.
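As an aside on the compressed .ACC and .TCC channels, the following Python sketch illustrates why a first difference followed by Rice coding can be reversed exactly. It is a generic textbook illustration and not the Database's actual file format: the LPC stage is omitted, the Rice parameter k, the signed-to-unsigned mapping, and the bit layout are all assumptions, and the function names are hypothetical. The Pascal source on this disk remains the reference for reading the real files.

```python
def zigzag(n):
    """Map a signed integer to a non-negative one: 0,-1,1,-2,2 -> 0,1,2,3,4."""
    return 2 * n if n >= 0 else -2 * n - 1


def unzigzag(u):
    """Inverse of zigzag."""
    return u // 2 if u % 2 == 0 else -((u + 1) // 2)


def rice_encode(values, k):
    """Rice-code signed integers: unary quotient, then k remainder bits."""
    bits = []
    for v in values:
        u = zigzag(v)
        q, r = u >> k, u & ((1 << k) - 1)
        bits.extend([1] * q + [0])  # quotient in unary, closed by a 0
        bits.extend((r >> i) & 1 for i in range(k - 1, -1, -1))
    return bits


def rice_decode(bits, k, count):
    """Recover `count` signed integers from a Rice-coded bit list."""
    values, pos = [], 0
    for _ in range(count):
        q = 0
        while bits[pos] == 1:
            q += 1
            pos += 1
        pos += 1  # skip the 0 that closes the unary part
        r = 0
        for _ in range(k):
            r = (r << 1) | bits[pos]
            pos += 1
        values.append(unzigzag((q << k) | r))
    return values


def first_difference(samples):
    """Replace each sample by its difference from the previous one."""
    return [s - p for s, p in zip(samples, [0] + list(samples[:-1]))]


def integrate(diffs):
    """Undo first_difference with a running sum."""
    out, total = [], 0
    for d in diffs:
        total += d
        out.append(total)
    return out


if __name__ == "__main__":
    samples = [100, 103, 101, 98, 98, 105]
    bits = rice_encode(first_difference(samples), k=2)
    restored = integrate(rice_decode(bits, k=2, count=len(samples)))
    print(restored == samples)
```

Small differences take few bits and large ones take more, but every value is recoverable, which is the property that makes the scheme exact.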

The columns of the .XYD file are the x and y coordinates of pellets in the order

ULx ULy LLx LLy T1x T1y T2x T2y T3x T3y T4x T4y MNIx MNIy MNMx MNMy

where UL, LL, T1, T2, T3, T4, MNI, and MNM denote the upper lip, lower lip, tongue positions 1-4 (1 is closest to tip, 4 farthest back in the mouth), mandible incisor (lower front tooth), and mandible molar (lower back tooth) pellets.

The coordinates are in differentially encoded integer microns (10^-3 mm). The first line gives the time-zero sample values while subsequent lines give the differences from the previous line. This coding cuts the file size roughly in half. This disk includes software to add a time stamp (microseconds from the start of data acquisition) in the first column, and to decode the xy coordinates for each of the pellets. This expanded .TXY file format is usable with any of a number of plotting, spreadsheet, and statistics software packages.
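The differential decoding described above amounts to a running sum down each column. A minimal Python sketch follows, assuming whitespace-separated integer columns and a time stamp of sample index times 6866 microseconds (from the 6.866 ms sampling interval). The function name decode_xyd is hypothetical, and the sketch does not address how mistracked pellets are represented inside a .XYD file, which is not described here; the conversion software on this disk should be preferred for real work.

```python
from itertools import accumulate

PELLETS = ["UL", "LL", "T1", "T2", "T3", "T4", "MNI", "MNM"]
COLUMNS = [pellet + axis for pellet in PELLETS for axis in "xy"]  # ULx, ULy, ...
SAMPLE_INTERVAL_US = 6866  # 6.866 ms between samples


def decode_xyd(lines):
    """Turn differentially encoded .XYD lines into absolute coordinates.

    Returns a list of (time_in_microseconds, {column_name: microns}) pairs.
    """
    rows = [[int(tok) for tok in line.split()] for line in lines if line.strip()]
    # The first row holds absolute values and later rows hold differences,
    # so a running sum over each column restores the absolute positions.
    columns = [list(accumulate(column)) for column in zip(*rows)]
    decoded = zip(*columns)
    return [
        (index * SAMPLE_INTERVAL_US, dict(zip(COLUMNS, row)))
        for index, row in enumerate(decoded)
    ]


if __name__ == "__main__":
    # Synthetic three-sample example, not taken from the Database.
    first = " ".join(str(1000 * i) for i in range(16))
    step_up = " ".join(["10"] * 16)
    step_down = " ".join(["-5"] * 16)
    for time_us, coords in decode_xyd([first, step_up, step_down]):
        print(time_us, coords["ULx"], coords["MNMy"])
```

For a file on disk, the lines of an open text file can be passed to decode_xyd directly.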

The x-ray tracking system can lose its lock on a pellet. Where this has occurred, we have placed a ``bad data value'' (a very large number) in the x and y columns for that pellet. The bad data value is 1000000 (10^6 microns or 1 metre) in the .TXY file format. This value is well outside the useful field of the microbeam tracking system. The XY software tool provided on this disk automatically recognizes the bad data value, and blanks out pellets for portions where they are not correctly tracked.
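If you read .TXY files with your own software, the bad data value needs the same treatment. The Python sketch below shows one way to blank out mistracked pellets. It assumes each .TXY line is whitespace-separated, with the time stamp in the first column followed by the 16 integer coordinates in the pellet order given earlier; the function name parse_txy_line is hypothetical.

```python
BAD_VALUE = 1_000_000  # 10^6 microns, the .TXY marker for a mistracked pellet
PELLETS = ["UL", "LL", "T1", "T2", "T3", "T4", "MNI", "MNM"]


def parse_txy_line(line):
    """Return (time_us, {pellet: (x, y) or None}) for one .TXY line.

    A pellet maps to None when its x or y column holds the bad data value.
    """
    fields = [int(tok) for tok in line.split()]
    time_us, coords = fields[0], fields[1:]
    positions = {}
    for i, pellet in enumerate(PELLETS):
        x, y = coords[2 * i], coords[2 * i + 1]
        positions[pellet] = None if BAD_VALUE in (x, y) else (x, y)
    return time_us, positions


if __name__ == "__main__":
    # Synthetic line: the T2 pellet (columns 7 and 8 after the time stamp) is lost.
    values = [6866] + [100 * i for i in range(16)]
    values[7] = values[8] = BAD_VALUE
    time_us, positions = parse_txy_line(" ".join(map(str, values)))
    print(time_us, positions["T1"], positions["T2"])
```

Mapping lost pellets to None rather than keeping the large number keeps them out of plots and statistics by accident.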

The files PAL.DAT and PHA.DAT contain x and y columns of a sequence of points tracing the hard palate and back pharyngeal wall outlines. These files are in ASCII, and the values are integer microns.
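Reading these outline files takes only a few lines. The Python sketch below converts the points to millimetres for plotting. It assumes one whitespace-separated x y pair per line and skips lines with fewer than two fields; the function name read_outline is hypothetical.

```python
def read_outline(lines):
    """Parse PAL.DAT or PHA.DAT style lines into a list of (x_mm, y_mm) points."""
    points = []
    for line in lines:
        tokens = line.split()
        if len(tokens) < 2:
            continue  # skip blank or incomplete lines
        x_microns, y_microns = int(tokens[0]), int(tokens[1])
        points.append((x_microns / 1000.0, y_microns / 1000.0))
    return points


if __name__ == "__main__":
    # Synthetic points, not taken from the Database.
    print(read_outline(["-10000 5000", "0 12000"]))
```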