Michael A. Grasso - Ph.D. Dissertation

Dissertation Title

Speech Input in Multimodal Environments:
Effects of Perceptual Structure on Speed, Accuracy, and Acceptance

Keywords

speech recognition, direct manipulation, human-computer interaction, medical informatics, pathology 

Download/View Dissertation

In Adobe Acrobat format (434 KB)
Annotated Bibliography
Histopathology Slides

Dissertation Committee and Laboratory

Dr. David S. Ebert, Co-chair
Dr. Timothy W. Finin, Co-chair
Dr. Tulay Adali
Dr. Charles K. Nicholas
Dr. Anthony F. Norcio
Graphics Animation and Visualization Laboratory
Laboratory for Advanced Information Technology

Abstract

A previously identified framework of complementary behavior holds that direct manipulation and speech interface modalities have reciprocal strengths and weaknesses. This suggests that user interface performance and acceptance may increase when a multimodal approach combining speech and direct manipulation is adopted. Based on this concept and the theory of perceptual structures, this work examined the hypothesis that the speed, accuracy, and acceptance of a multimodal speech and direct manipulation interface would increase when the modalities match the perceptual structure of the input attributes.

To test this hypothesis, a software prototype for collecting histopathology data was developed with two interfaces. The first interface used speech and direct manipulation in a way that did not match the perceptual structure of the attributes, while the second interface used speech and direct manipulation in a way that best matched the perceptual structure. A group of 20 clinical and veterinary pathologists evaluated the prototype in an experimental setting using a repeated-measures design. The independent variables were interface order and task order, and the dependent variables were task completion time, speech errors, mouse errors, diagnosis errors, and user acceptance.

The results of this experiment support the hypothesis that the perceptual structure of an input task is an important consideration when designing multimodal computer interfaces. Task completion time improved by 22.5%, speech errors were reduced by 36%, and user acceptance increased by 6.7% with the computer interface that best matched the perceptual structure of the input attributes. Mouse errors increased slightly and diagnosis errors decreased slightly, but these changes were not statistically significant. There was no relationship between user acceptance and task completion time, suggesting that speed is not the predominant factor in determining approval. User acceptance was related to speech recognition errors, suggesting that recognition accuracy is critical to user satisfaction. User acceptance was also shown to be related to domain errors, suggesting that the more domain expertise a person has, the more he or she will embrace the computer interface.