APPROVAL SHEET

Title of Dissertation: Speech Input in Multimodal Environments: 
Effects of Perceptual Structure on Speed, Accuracy, 
and Acceptance 
Name of Candidate: Michael A. Grasso
Doctor of Philosophy, 1997
Dissertation and Abstract Approved: Dr. Timothy W. Finin
Professor
Computer Science and Electrical Engineering 
Dissertation and Abstract Approved: Dr. David S. Ebert
Assistant Professor
Computer Science and Electrical Engineering 
Date Approved: May 12, 1997

 


Curriculum Vitae

Name: Michael A. Grasso

Degree and Date to be Conferred: Ph.D., 1997

Professional Publications:

Michael A. Grasso, David Ebert, Tim Finin. The Effect of Perceptual Structure on Multimodal Speech Recognition Interfaces. ACM Transactions on Computer-Human Interaction, under review.

Michael A. Grasso, David Ebert, Tim Finin. Acceptance of a Speech Interface for Biomedical Data Collection. 1997 AMIA Annual Fall Symposium, under review.

Michael A. Grasso and Tim Finin. Task Integration in Multimodal Speech Recognition Environments. Crossroads, 3(3):19-22, Spring 1997.

Michael A. Grasso. Speech Input in Multimodal Environments: A Proposal to Study the Effects of Reference Visibility, Reference Number, and Task Integration. Technical Report TR CS-96-09, University of Maryland Baltimore County, Department of Computer Science and Electrical Engineering, 1996.

Michael A. Grasso. Automated Speech Recognition in Medical Applications. M.D. Computing, 12(1):16-23, 1995.

Michael A. Grasso and Clare T. Grasso. Feasibility Study of Voice-Driven Data Collection in Animal Drug Toxicology Studies. Computers in Biology and Medicine, 24(4):289-294, 1994.

Professional Positions Held:
 

1988 - Present  President/Senior Computer Scientist.
Segue Biomedical Computing, Laurel, Maryland. 
1992 - Present  Instructor of Computer Science, Part-Time.
University of Maryland Baltimore County, Department of Continuing Education.
1991 - 1993  Instructor of Computer Science, Part-Time.
Howard Community College, Columbia, Maryland.
1987 - 1988  Senior Programmer/Analyst.
Program Resources, Inc., Annapolis, Maryland. 
1984 - 1985  Microbiology Technician and Computer Programmer.
Johns Hopkins University, Baltimore, Maryland. 
1981 - 1984  Medical Technologist and Microbiology Technician.
Part-time positions and internships held while completing undergraduate education.

 


Abstract

Title of Dissertation: Speech Input in Multimodal Environments:
Effects of Perceptual Structure on Speed, Accuracy,
and Acceptance
Michael A. Grasso, Doctor of Philosophy, 1997
Dissertation Directed By:  Dr. Timothy W. Finin, Professor
Computer Science and Electrical Engineering 

Dr. David S. Ebert, Assistant Professor
Computer Science and Electrical Engineering

A framework of complementary behavior has been identified which maintains that direct manipulation and speech interface modalities have reciprocal strengths and weaknesses. This suggests that user interface performance and acceptance may increase by adopting a multimodal approach that combines speech and direct manipulation. Based on this concept and the theory of perceptual structures, this work examined the hypothesis that the speed, accuracy, and acceptance of a multimodal speech and direct manipulation interface would increase when the modalities match the perceptual structure of the input attributes.

A software prototype to collect histopathology data was developed with two interfaces to test this hypothesis. The first interface used speech and direct manipulation in a way that did not match the perceptual structure of the attributes, while the second interface used speech and direct manipulation in a way that best matched the perceptual structure. A group of 20 clinical and veterinary pathologists evaluated the prototype in an experimental setting using a repeated measures design. The independent variables were interface order and task order, and the dependent variables were task completion time, speech errors, mouse errors, diagnosis errors, and user acceptance.

The results of this experiment support the hypothesis that the perceptual structure of an input task is an important consideration when designing multimodal computer interfaces. Task completion time improved by 22.5%, speech errors were reduced by 36%, and user acceptance increased by 6.7% with the computer interface that best matched the perceptual structure of the input attributes. Mouse errors increased slightly and diagnosis errors decreased slightly, but these changes were not statistically significant. There was no relationship between user acceptance and time, suggesting that speed is not the predominant factor in determining approval. User acceptance was related to speech recognition errors, suggesting that recognition accuracy is critical to user satisfaction. User acceptance was also shown to be related to domain errors, suggesting that the more domain expertise a person has, the more he or she will embrace the computer interface.

 


Speech Input in Multimodal Environments:

Effects of Perceptual Structure

on Speed, Accuracy, and Acceptance

by

Michael A. Grasso

Dissertation submitted to the faculty of the Graduate School
of the University of Maryland in partial fulfillment
of the requirements of the degree of
Doctor of Philosophy
1997

(c) Copyright Michael A. Grasso 1997

 


Dedication

To my parents, Silvio and Angela Grasso, who taught me to dream, and to my wife, Clare, for believing.

 


Acknowledgment

The completion of this dissertation was made possible through the support and cooperation of many individuals. Thanks to my advisors, Timothy W. Finin and David S. Ebert, who provided thoughtful guidance and encouragement through what seemed to be a never-ending process. Thanks also to Tulay Adali, Charles K. Nicholas, and Anthony F. Norcio, the other members of my committee, for helping me to understand the significance of this research with respect to medical informatics, software engineering, and human-computer interaction, respectively.

Several people provided technical assistance. Judy Fetters from the National Center for Toxicological Research (NCTR) consulted with me on the software prototype and in the selection of tissue slides. Alan Warbritton from NCTR scanned the slides for the prototype. Lowell Groninger, Greg Trafton, and Clare Grasso helped with the experiment design and statistical analysis. The support staff at Speech Systems, Inc. helped with grammar development and answered countless technical questions.

Finally, thanks to those who graciously participated in this study from the University of Maryland Medical Center, the Baltimore Veterans Affairs Medical Center, the Johns Hopkins Medical Institutions, and the Food and Drug Administration. Special thanks to Dr. John Strandberg, Dr. Michael Lipsky, and Dr. Jules Berman for helping to identify participants.

 


Table of Contents

1. Introduction
1.1 Speech Recognition Systems
1.1.1 Historical Perspective
1.1.2 Speaker Dependence
1.1.3 Continuity of Speech
1.1.4 Vocabulary Size
1.1.5 Human Factors of Speech Interfaces
1.2 Direct Manipulation
1.3 The Problem
1.4 Significance of this Study
1.5 Research Questions
2. Literature Survey
2.1 Multimodal Speech Recognition Interfaces
2.1.1 Multimodal Access to the World-Wide Web
2.1.2 Integrated Multimodal Interface
2.1.3 Multimodal Window Navigation
2.2 Reference Attributes
2.2.1 Reference Visibility
2.2.2 Vocabulary Size
2.3 Multimodal Input Tasks
2.3.1 Theory of Perceptual Structures
2.3.2 Integrality of Input Devices
2.3.3 Integrating Input Modalities
2.4 Motivations of Speech in Medical Informatics
2.4.1 Template-Based Reporting
2.4.2 Natural Language Processing
2.4.3 Speech in Multimodal Environments
2.4.4 Hands-Busy Data Collection
2.5 Data Collection in Animal Toxicology Studies
2.6 Preliminary Work
2.6.1 Materials
2.6.2 Methods
2.6.3 Results and Discussion
2.6.4 Conclusion
3. Methodology
3.1 Independent Variables
3.2 Dependent Variables
3.3 Subjects
3.4 Procedure
3.5 Materials
3.6 Statistical Analysis
3.7 Schedule and Deliverables
4. Experimental Results
4.1 Task Completion Times
4.2 Errors
4.3 Acceptability
4.4 Correlation
5. Discussion and Conclusion
5.1 Findings
5.2 Relationships
5.2.1 Baseline Interface versus Perceptually Structured Interface
5.2.2 Relationships to Task Completion Time
5.2.3 Relationships with Acceptability Index
5.3 Summary
5.4 Future Research Directions
5.5 Conclusion
6. Appendices
6.1 Sample Memorandum to Request for Volunteers
6.2 Pre-Experiment Questionnaire
6.3 Post-Experiment Questionnaire
6.4 Pathology Nomenclature
6.5 Perceptually Structured Interface Vocabulary
6.6 Baseline Interface Vocabulary
6.7 Perceptually Structured Interface Transcript
6.8 Baseline Interface Transcript
6.9 Task Completion Time Scores
6.10 Speech Errors
6.11 Mouse Errors
6.12 Diagnosis Errors
6.13 Acceptability Scores
7. References

 

 

 


List of Tables

Table 1: Complementary Strengths of Direct Manipulation and Speech
Table 2: Proposed Applications for Direct Manipulation and Speech
Table 3: Reference Attributes and Interface Tasks
Table 4: Integral and Separable Input Attributes
Table 5: Ratio of Written to Total Input
Table 6: Contrastive Pattern of Modality Use
Table 7: Predicted Modalities for Computer-Human Interface Improvements
Table 8: Possible Interface Combinations for the Software Prototype
Table 9: Adjective Pairs used in the User Acceptance Survey
Table 10: Subject Demographics
Table 11: Subject Groupings for the Experiment
Table 12: Tissue Slide Diagnoses
Table 13: Experimental Procedure
Table 14: Research Schedule
Table 15: Deliverables
Table 16: Times for the Baseline and Perceptually Structured Interfaces
Table 17: ANOVA for Baseline and Perceptually Structured Interfaces
Table 18: Single Factor ANOVA for Baseline Groups
Table 19: Single Factor ANOVA for Perceptually Structured Groups
Table 20: Single Factor ANOVA for Slide Group 1 Groups
Table 21: Single Factor ANOVA for Slide Group 2 Groups
Table 22: Baseline and Perceptually Structured Error Rates
Table 23: Two-Factor ANOVA for AI
Table 24: Pearson Correlation Coefficients for Dependent Variables

 


List Of Figures

Figure 1: Synergistic versus Integrated Interface Tasks
Figure 2: Sample Data Entry Screen
Figure 3: Comparison of Mean Task Completion Times
Figure 4: Comparison of Mean Errors
Figure 5: Comparison of Acceptability Index by Question
Figure 6: No Correlation Between Time and Acceptability Index
Figure 7: Correlation Between Average AI and Total Speech Errors
Figure 8: Correlation Between Average AI and Total Diagnosis Errors
