Measuring the User Experience:

Collecting, Analyzing, and Presenting Usability Metrics

by Tom Tullis and Bill Albert

Table of Contents

1. Introduction

1.1 Organization of this Book

1.2 What Is Usability?

1.3 Why Does Usability Matter?

1.4 What Are Usability Metrics?

1.5 The Value of Usability Metrics

1.6 Ten Common Myths about Usability Metrics

Myth #1. Metrics take too much time to collect
Myth #2. Usability metrics cost too much money
Myth #3. Usability metrics are not useful when focusing on small improvements
Myth #4. Usability metrics don't help us understand causes
Myth #5. Usability data are too noisy
Myth #6. You can just trust your gut
Myth #7. Metrics don't apply to new products
Myth #8. No metrics exist for the type of issues we are dealing with
Myth #9. Metrics are not understood or appreciated by management
Myth #10. It's difficult to collect reliable data with a small sample size

2. Background

2.1 Designing a Usability Study

2.1.1 Selecting Participants
2.1.2 Sample Size
2.1.3 Within-Subjects or Between-Subjects Study
2.1.4 Counterbalancing
2.1.5 Independent and Dependent Variables

2.2 Types of Data

2.2.1 Nominal Data
2.2.2 Ordinal Data
2.2.3 Interval Data
2.2.4 Ratio Data
2.2.5 Aggregate and Disaggregate Data

2.3 Metrics and Data

2.4 Descriptive Statistics

2.4.1 Measures of Central Tendency
2.4.2 Measures of Variability
2.4.3 Standard Error
2.4.4 Confidence Intervals

2.5 Comparing Means

2.5.1 Independent Samples
2.5.2 Paired Samples
2.5.3 Comparing More than Two Samples

2.6 Relationships between Variables

2.6.1 Correlations

2.7 Nonparametric Tests

2.7.1 Chi-square Test

2.8 Presenting Your Data Graphically

2.8.1 Column or Bar Graphs
2.8.2 Line Graphs
2.8.3 Scatterplots
2.8.4 Pie Charts
2.8.5 Stacked Bar Graphs

2.9 Summary

3. Planning a Usability Study

3.1 Study Goals

3.1.1 Formative Usability
3.1.2 Summative Usability

3.2 User Goals

3.2.1 Performance
3.2.2 Satisfaction

3.3 Choosing the Right Metrics: Ten Types of Usability Studies

3.3.1 Completing a Transaction
3.3.2 Comparing Products
3.3.3 Evaluating Frequent Use of the Same Product
3.3.4 Evaluating Navigation and/or Information Architecture
3.3.5 Increasing Awareness
3.3.6 Problem Discovery
3.3.7 Maximizing Usability for a Critical Product
3.3.8 Creating an Overall Positive User Experience
3.3.9 Evaluating the Impact of Subtle Changes
3.3.10 Comparing Designs

3.4 Other Study Details

3.4.1 Budgets and Timelines
3.4.2 Evaluation Methods
3.4.3 Participants
3.4.4 Data Collection
3.4.5 Data Cleanup

3.5 Summary

4. Performance Metrics

4.1 Task Success

4.1.1 Collecting Any Type of Success Metric
4.1.2 Binary Success
4.1.3 Levels of Success
4.1.4 Issues in Measuring Success

4.2 Time-on-Task

4.2.1 Importance of Measuring Time-on-Task
4.2.2 How to Collect and Measure Time-on-Task
4.2.3 Analyzing and Presenting Time-on-Task Data
4.2.4 Issues to Consider When Using Time Data

4.3 Errors

4.3.1 When to Measure Errors
4.3.2 What Constitutes an Error?
4.3.3 Collecting and Measuring Errors
4.3.4 Analyzing and Presenting Errors
4.3.5 Issues to Consider When Using Error Metrics

4.4 Efficiency

4.4.1 Collecting and Measuring Efficiency
4.4.2 Analyzing and Presenting Efficiency Data
4.4.3 Efficiency as a Combination of Task Success and Time

4.5 Learnability

4.5.1 Collecting and Measuring Learnability Data
4.5.2 Analyzing and Presenting Learnability Data
4.5.3 Issues to Consider When Measuring Learnability

4.6 Summary

5. Issues-Based Metrics

5.1 Identifying Usability Issues

5.2 What Is a Usability Issue?

5.2.1 Which Issues Are Real versus False?

5.3 How to Identify an Issue

5.3.1 In-Person Studies
5.3.2 Automated Studies
5.3.3 When Issues Begin and End
5.3.4 How Granular Should Issues Be?
5.3.5 Multiple Observers

5.4 Severity Ratings

5.4.1 Severity Ratings Based on the User Experience
5.4.2 Severity Ratings Based on a Combination of Factors
5.4.3 Using a Severity Rating System
5.4.4 Some Caveats on Severity Ratings

5.5 Analyzing and Reporting Metrics for Usability Issues

5.5.1 Frequency of Unique Issues
5.5.2 Frequency of Issues per Participant
5.5.3 Frequency of Participants
5.5.4 Issues by Category
5.5.5 Issues by Task
5.5.6 Reporting Positive Issues

5.6 Consistency in Identifying Usability Issues

5.7 Bias in Identifying Usability Issues

5.8 Number of Participants

5.8.1 Five Participants Is Enough
5.8.2 Five Participants Is Not Enough
5.8.3 Our Recommendation

5.9 Summary

6. Self-Reported Metrics

6.1 Importance of Self-Reported Data

6.2 Collecting Self-Reported Data

6.2.1 Likert Scales
6.2.2 Semantic Differential Scales
6.2.3 When to Collect Self-Reported Data
6.2.4 How to Collect Self-Reported Data
6.2.5 Biases in Collecting Self-Reported Data
6.2.6 General Guidelines for Rating Scales
6.2.7 Analyzing Self-Reported Data

6.3 Post-task Ratings

6.3.1 Ease of Use
6.3.2 After-Scenario Questionnaire
6.3.3 Expectation Measure
6.3.4 Usability Magnitude Estimation
6.3.5 A Comparison of Post-task Self-Reported Metrics

6.4 Post-session Ratings

6.4.1 Aggregate Individual Task Ratings
6.4.2 System Usability Scale (SUS)
6.4.3 Computer System Usability Questionnaire (CSUQ)
6.4.4 Questionnaire for User Interface Satisfaction (QUIS)
6.4.5 Usefulness, Satisfaction, and Ease of Use Questionnaire
6.4.6 Product Reaction Cards
6.4.7 A Comparison of Post-session Self-Reported Metrics

6.5 Examples of Using SUS to Compare Designs

6.5.1 A Comparison of "Senior-Friendly" Websites
6.5.2 A Comparison of Windows ME and Windows XP
6.5.3 A Comparison of Paper Ballots

6.6 Online Services

6.6.1 Website Analysis and Measurement Inventory (WAMMI)
6.6.2 American Customer Satisfaction Index (ACSI)
6.6.3 OpinionLab
6.6.4 Issues with Live-Site Surveys

6.7 Other Types of Self-Reported Metrics

6.7.1 Assessing Specific Attributes
6.7.2 Assessing Specific Elements
6.7.3 Open-Ended Questions
6.7.4 Awareness and Comprehension
6.7.5 Awareness and Usefulness Gaps

6.8 Summary

7. Behavioral and Physiological Metrics

7.1 Observing and Coding Overt Behaviors

7.1.1 Verbal Behaviors
7.1.2 Nonverbal Behaviors

7.2 Behaviors Requiring Equipment to Capture

7.2.1 Facial Expressions
7.2.2 Eye-Tracking
7.2.3 Pupillary Response
7.2.4 Skin Conductance and Heart Rate
7.2.5 Other Measures

7.3 Summary

8. Combined and Comparative Metrics

8.1 Single Usability Scores

8.1.1 Combining Metrics Based on Target Goals
8.1.2 Combining Metrics Based on Percentages
8.1.3 Combining Metrics Based on Z-Scores
8.1.4 Using SUM: Single Usability Metric

8.2 Usability Scorecards

8.3 Comparison to Goals and Expert Performance

8.3.1 Comparison to Goals
8.3.2 Comparison to Expert Performance

8.4 Summary

9. Special Topics

9.1 Live Website Data

9.1.1 Server Logs
9.1.2 Click-Through Rates
9.1.3 Drop-off Rates
9.1.4 A/B Studies

9.2 Card-Sorting Data

9.2.1 Analyses of Open Card-Sort Data
9.2.2 Analyses of Closed Card-Sort Data

9.3 Accessibility Data

9.4 Return on Investment (ROI) Data

9.5 Six Sigma

9.6 Summary

10. Case Studies

10.1 Redesigning a Website Cheaply and Quickly By Hoa Loranger

10.1.1 Phase 1: Testing Competitor Websites
10.1.2 Phase 2: Testing Three Different Design Concepts
10.1.3 Phase 3: Testing a Single Design
10.1.4 Conclusion
10.1.5 Biography

10.2 Usability Evaluation of a Speech Recognition IVR By James R. Lewis

10.2.1 Method
10.2.2 Results: Task-Level Measurements
10.2.3 PSSUQ
10.2.4 Participant Comments
10.2.5 Usability Problems
10.2.6 Adequacy of Sample Size
10.2.7 Recommendations Based on Participant Behaviors and Comments
10.2.8 Discussion
10.2.9 Biography
10.2.10 References

10.3 Redesign of the CDC.gov Website By Robert Bailey, Cari Wolfson, and Janice Nall

10.3.1 Usability Testing Levels
10.3.2 Baseline Test
10.3.3 Task Scenarios
10.3.4 Qualitative Findings
10.3.5 Wireframing and "First Click" Testing
10.3.6 Final Prototype Testing (Prelaunch Test)
10.3.7 Conclusions
10.3.8 Biographies
10.3.9 References

10.4 Usability Benchmarking Case Study: Mobile Music and Video By Scott Weiss and Chris Whitby

10.4.1 Project Goals and Methods
10.4.2 Qualitative and Quantitative Data
10.4.3 Research Domain
10.4.4 Comparative Analysis
10.4.5 Study Operations: Number of Respondents
10.4.6 Respondent Recruiting
10.4.7 Data Collection
10.4.8 Time to Complete
10.4.9 Success or Failure
10.4.10 Number of Attempts
10.4.11 Perception Metrics
10.4.12 Qualitative Findings
10.4.13 Quantitative Findings
10.4.14 Summary Findings and SUM Metrics
10.4.15 Data Manipulation and Visualization
10.4.16 Discussion
10.4.17 Benchmark Changes and Future Work
10.4.18 Biographies
10.4.19 References

10.5 Measuring the Effects of Drug Label Design and Similarity on Pharmacists' Performance By Agnieszka (Aga) Bojko

10.5.1 Participants
10.5.2 Apparatus
10.5.3 Stimuli
10.5.4 Procedure
10.5.5 Analysis
10.5.6 Results and Discussion
10.5.7 Biography
10.5.8 References

10.6 Making Metrics Matter By Todd Zazelenchuk

10.6.1 OneStart: Indiana University's Enterprise Portal Project
10.6.2 Designing and Conducting the Study
10.6.3 Analyzing and Interpreting the Results
10.6.4 Sharing the Findings and Recommendations
10.6.5 Reflecting on the Impact
10.6.6 Conclusion
10.6.7 Acknowledgments
10.6.8 Biography
10.6.9 References

11. Moving Forward

11.1 Sell Usability and the Power of Metrics

11.2 Start Small and Work Your Way Up the Metrics Ladder

11.3 Make Sure You Have the Time and Money to Do the Job Right

11.4 Plan Early and Often

11.5 Benchmark Your Products

11.6 Explore Your Data

11.7 Speak the Language of Business

11.8 Show Your Confidence

11.9 Don't Misuse Metrics

11.10 Simplify Your Presentation

References

Index