Title of Dissertation: | Speech Input in Multimodal Environments: Effects of Perceptual Structure on Speed, Accuracy, and Acceptance |
Name of Candidate: | Michael A. Grasso, Doctor of Philosophy, 1997 |
Dissertation and Abstract Approved: | Dr. Timothy W. Finin, Professor, Computer Science and Electrical Engineering |
Dissertation and Abstract Approved: | Dr. David S. Ebert, Assistant Professor, Computer Science and Electrical Engineering |
Date Approved: | May 12, 1997 |
Degree and Date to be Conferred: Ph.D., 1997
Professional Publications:
Michael A. Grasso, David Ebert, Tim Finin. The Effect of Perceptual Structure on Multimodal Speech Recognition Interfaces. ACM Transactions on Computer-Human Interaction, under review.
Michael A. Grasso, David Ebert, Tim Finin. Acceptance of a Speech Interface for Biomedical Data Collection. 1997 AMIA Annual Fall Symposium, under review.
Michael A. Grasso and Tim Finin. Task Integration in Multimodal Speech Recognition Environments. Crossroads, 3(3):19-22, Spring 1997.
Michael A. Grasso. Speech Input in Multimodal Environments: A Proposal to Study the Effects of Reference Visibility, Reference Number, and Task Integration. Technical Report TR CS-96-09, University of Maryland Baltimore County, Department of Computer Science and Electrical Engineering, 1996.
Michael A. Grasso. Automated Speech Recognition in Medical Applications. M.D. Computing, 12(1):16-23, 1995.
Michael A. Grasso and Clare T. Grasso. Feasibility Study of Voice-Driven Data Collection in Animal Drug Toxicology Studies. Computers in Biology and Medicine, 24(4):289-294, 1994.
Professional Positions Held:
1988 - Present | President/Senior Computer Scientist. Segue Biomedical Computing, Laurel, Maryland. |
1992 - Present | Instructor of Computer Science, Part-Time. University of Maryland Baltimore County, Department of Continuing Education. |
1991 - 1993 | Instructor of Computer Science, Part-Time. Howard Community College, Columbia, Maryland. |
1987 - 1988 | Senior Programmer/Analyst. Program Resources, Inc., Annapolis, Maryland. |
1984 - 1985 | Microbiology Technician and Computer Programmer. Johns Hopkins University, Baltimore, Maryland. |
1981 - 1984 | Medical Technologist and Microbiology Technician. Part-time positions and internships held while completing undergraduate education. |
Title of Dissertation: | Speech Input in Multimodal Environments: Effects of Perceptual Structure on Speed, Accuracy, and Acceptance |
Michael A. Grasso, Doctor of Philosophy, 1997
Dissertation Directed By: | Dr. Timothy W. Finin, Professor, Computer Science and Electrical Engineering; Dr. David S. Ebert, Assistant Professor |
A framework of complementary behavior has been identified which maintains that direct manipulation and speech interface modalities have reciprocal strengths and weaknesses. This suggests that user interface performance and acceptance may increase by adopting a multimodal approach that combines speech and direct manipulation. Based on this concept and the theory of perceptual structures, this work examined the hypothesis that the speed, accuracy, and acceptance of a multimodal speech and direct manipulation interface would increase when the modalities match the perceptual structure of the input attributes.
To test this hypothesis, a software prototype for collecting histopathology data was developed with two interfaces. The first interface used speech and direct manipulation in a way that did not match the perceptual structure of the attributes, while the second interface used speech and direct manipulation in a way that best matched the perceptual structure. A group of 20 clinical and veterinary pathologists evaluated the prototype in an experimental setting using a repeated-measures design. The independent variables were interface order and task order, and the dependent variables were task completion time, speech errors, mouse errors, diagnosis errors, and user acceptance.
The results of this experiment support the hypothesis that the perceptual structure of an input task is an important consideration when designing multimodal computer interfaces. Task completion time improved by 22.5%, speech errors were reduced by 36%, and user acceptance increased by 6.7% with the computer interface that best matched the perceptual structure of the input attributes. Mouse errors increased slightly and diagnosis errors decreased slightly, but neither change was statistically significant. There was no relationship between user acceptance and time, suggesting that speed is not the predominant factor in determining approval. User acceptance was related to speech recognition errors, suggesting that recognition accuracy is critical to user satisfaction. User acceptance was also shown to be related to domain errors, suggesting that the more domain expertise a person has, the more he or she will embrace the computer interface.
Michael A. Grasso
Dissertation submitted to the faculty of the Graduate School
of the University of Maryland in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
1997
(c) Copyright Michael A. Grasso 1997
Several people provided technical assistance. Judy Fetters from the National Center for Toxicological Research (NCTR) consulted with me on the software prototype and in the selection of tissue slides. Alan Warbritton from NCTR scanned the slides for the prototype. Lowell Groninger, Greg Trafton, and Clare Grasso helped with the experiment design and statistical analysis. The support staff at Speech Systems, Inc. helped with grammar development and answered countless technical questions.
Finally, thanks to those who graciously participated in this study from the University of Maryland Medical Center, the Baltimore Veterans Affairs Medical Center, the Johns Hopkins Medical Institutions, and the Food and Drug Administration. Special thanks to Dr. John Strandberg, Dr. Michael Lipsky, and Dr. Jules Berman for helping to identify participants.