🧑‍🔬 Question Builder Research Summary
STM002 | Updated April ’23
The Questions team is currently redesigning the foundation and architecture of questions within Top Hat.
This work aims to improve the questions experience for our users and our engineering team, and will eventually allow us to unlock STEM fields previously unavailable to us.
This project has 3 big (and equal) value propositions:
🚀 Dev Velocity
A new scalable and flexible questions architecture is needed to support our business. Without investing in this, we cannot improve the experience within acceptable implementation timelines.
✨ Usability
We want to be the easiest place to write and experience questions in higher-ed. Our experience today is inconsistent and low-quality, lagging behind many of our peers.
🧱 STEM Foundations
Features to support flexible STEM constructions are needed to author winning national titles. We do not have these building blocks today, and "bolt-on" attempts in the current system fail to meet our needs.
Our main motivation behind this research was to de-risk our assumptions about usability and learnability in the builder
Priority 3 - Ease of Access
Templates are the main way to use the builder.
Objective: If the tests show that users do in fact choose templates the majority of the time, we can feel more confident in continuing to invest in templates and the template menu.
Priority 2 - Learnability (Templates)
Templates are a bridge from the old system to the new one.
Objective: By testing the extent to which templates explain or imply the concepts within the new building experience, we will identify potential gaps in learnability.
Priority 1 - Builder Usability
The flexible builder model is intuitive and easily learnable.
Objective: If the task success rate is high, we will de-risk key assumptions we’ve made about how users want to build questions. If the task success rate is low, we will revisit the areas causing difficulty to improve the design.
Instantaneous adoption & understanding was not our goal. We hypothesized there would be a learning curve which we considered acceptable & expected.
Our aim through this research was to understand that curve and use our learnings to optimize the onboarding process.
We spoke to 6 professors from diverse fields of study to answer the question: Is the builder fundamentally usable and learnable?
Religious Studies
Business Law
Theology
Math & Data Science
Biology
Civil Engineering
Methodology
The study was conducted as a series of remote moderated user tests.
Participants were asked to complete a series of question-creation tasks in the new builder.
Diversifying fields of study allowed us to avoid over-representing any one discipline.
50/50 experience split - only half of the participants had experience creating questions in Top Hat. This helped us understand whether any biases carried over from our existing question-creation system.
Our new rich-text-editor-style builder is flexible by design, which posed unique challenges during testing.
[Flow diagram: after the user clicks “Advanced”, they may select the correct template, an incorrect template, or a blank template; from there they can copy & paste, type, or use prefabricated content to fill in the premise, answer/option, and hint/explanation fields.]
Rich Text Editors have countless acceptable paths to the same result. Some tradeoffs were required to ensure tests produced usable results.
Strategically Minimize # of Tasks
Testing a live feature with infinite acceptable paths required careful documentation of limitations and assumptions
The question builder is a WIP feature in a live and constantly changing environment, which meant bugs during testing were inevitable.
Pausing work while testing was not feasible, so to mitigate participant errors and mistakes we:
🔍 Metrics & Findings
We measured usability, learnability, and ease of access using three key metrics: the first was task success rate
Task | Success Rate Lower Bound | Success Rate Upper Bound |
Task 1 | 30% | 91% |
Task 2 | 64% | 100% |
Task 3 | 64% | 100% |
Benchmark | ≥ 50% | ≥ 78% |
Success rate was determined by three factors
We hypothesized that success rate would help determine:
1. Usability (rate for task 1)
2. Learnability (increased success on tasks 2 & 3)
While the task 1 success rate was low, all participants improved significantly on subsequent tasks.
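For context, lower and upper bounds like those in the table above are often computed with an adjusted-Wald (Agresti-Coull) interval, which behaves well for very small samples. The sketch below assumes that method and a hypothetical 4-of-6 success count; the report does not state the exact method or raw counts used.

```python
# Sketch only: adjusted-Wald (Agresti-Coull) 95% interval for task success.
# The interval method and raw counts behind the table above are assumptions;
# the 4-of-6 example is hypothetical.
from math import sqrt

def adjusted_wald(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Return (lower, upper) bounds on the true success rate, clipped to [0, 1]."""
    n_adj = trials + z ** 2                    # adjusted trial count
    p_adj = (successes + z ** 2 / 2) / n_adj   # adjusted point estimate
    half_width = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)

# Hypothetical example: 4 of 6 participants complete a task successfully.
low, high = adjusted_wald(4, 6)
print(f"{low:.0%} - {high:.0%}")  # roughly 30% - 91% for this example
```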
Successful completion of tasks was not sufficient to determine usability or learnability, so we coupled it with Time on Task
Time on Task was used to determine whether the user could create a question in a reasonable time and whether they became faster over time.
Task | Time Lower Bound (mm:ss) | Time Upper Bound (mm:ss) |
Task 1 | 01:00 | 05:00 |
Task 2 | 00:40 | 02:00 |
Task 3 | 00:30 | 02:10 |
Benchmark | ≤ 02:50 | |
The benchmark for a reasonable time to create a question was determined using Keystroke-Level Modeling (KLM).
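As a rough illustration of how a KLM benchmark is assembled: the predicted expert time is the sum of standard operator times over an assumed action sequence. The sketch below uses the classic Card, Moran & Newell operator averages and a hypothetical authoring sequence; the actual sequence behind the 02:50 figure is not documented here.

```python
# Sketch only: a Keystroke-Level Model (KLM) estimate for authoring one question.
# The operator sequence behind the 02:50 benchmark is an assumption; the times
# below are the standard Card/Moran/Newell averages.
KLM_SECONDS = {
    "K": 0.28,  # keystroke (average typist)
    "P": 1.10,  # point with the mouse
    "B": 0.10,  # mouse button press or release
    "H": 0.40,  # home hands between mouse and keyboard
    "M": 1.35,  # mental preparation
}

def klm_estimate(ops: str) -> float:
    """Total predicted expert time, in seconds, for a string of KLM operators."""
    return sum(KLM_SECONDS[op] for op in ops)

# Hypothetical sequence: open the template menu, pick a template, type a
# 60-character premise, then click into and type four 20-character options.
ops = "MPBB" + "MPBB" + "HM" + "K" * 60 + ("MPBB" + "H" + "K" * 20) * 4
total = klm_estimate(ops)
print(f"{int(total // 60):02d}:{int(total % 60):02d}")  # predicted expert time (mm:ss)
```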
We then used the Single Ease Question (SEQ) to gauge participants’ feelings about the builder’s ease of use
Each participant was asked to rate how easy each task was on a scale from 1 (extremely difficult) to 7 (extremely easy).
We hypothesized that there would be an increase in ease of use from task 1 to task 2 (templates), with a slight decline in task 3 (blank builder).
When asked to explain their SEQ selection, 5 of 6 users said that once they had learned the system, task 2 was simple.
Task | SEQ Lower Bound | SEQ Upper Bound |
Task 1 | 4.5 | 8.2 |
Task 2 | 4.9 | 7.7 |
Task 3 | 4.2 | 7.2 |
Benchmark | ≥ 3.8 | ≥ 5.5 |
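Bounds around a mean rating for n = 6 are commonly reported as a 95% t-interval, which can extend past the 7-point scale maximum (as the task 1 upper bound does above). The sketch below assumes that approach with hypothetical ratings; the actual per-participant scores and interval method are not given.

```python
# Sketch only: a 95% t-interval around the mean SEQ rating for n = 6 participants.
# The ratings below are hypothetical; t_crit = 2.571 is the two-sided 95%
# critical value for 5 degrees of freedom.
from math import sqrt
from statistics import mean, stdev

def seq_interval(ratings: list[float], t_crit: float = 2.571) -> tuple[float, float]:
    """Return (lower, upper) bounds around the mean rating."""
    m, s = mean(ratings), stdev(ratings)
    half_width = t_crit * s / sqrt(len(ratings))
    return m - half_width, m + half_width

# Hypothetical task ratings on the 7-point SEQ scale.
low, high = seq_interval([7, 7, 7, 6, 5, 4])
print(f"{low:.1f} - {high:.1f}")  # the upper bound can exceed the 7-point maximum
```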
Two strong signals, each displayed by 50-100% of participants, came through in our testing
✅ Setting Correctness
🔍 Looking for Feedback
While some signals weren’t as strong or actionable, there were some interesting additional insights from the tests
Different User Styles
Hints
Other Small Fixes
Participants had vastly different styles and comfort levels when it came to experimenting with new systems. We noticed two user types:
“Experimenters” dove in headfirst and felt comfortable making mistakes and recovering.
“Cautious Avoiders” exhibited anxiety about making mistakes, instinctively deleting their work and starting again instead of using undo.
🏗 Improvements
Team Apollo is already actively working on solutions for the issues surfaced during user testing
The team has already begun to implement some of these changes, including the return of the correctness checkbox for MCQs.
We’ve also explored some early ideas to surface contextual menus more clearly. Our hypothesis is that providing more tactile feedback will encourage users to click in and find the menus.
The team is exploring ways to introduce the new builder and guide users through onboarding.
Resources