1 of 16

🧑‍🔬 Question Builder Research Summary

STM002 | Updated April ’23

2 of 16

The Questions team is currently redesigning the foundation and architecture of questions within Top Hat

This work aims to improve the questions experience for both our users and our engineering team, and will eventually allow us to unlock STEM fields previously unavailable to us.

3 of 16

This project has 3 big (and equal) value propositions:

🚀 Dev Velocity

A new scalable and flexible questions architecture is needed to support our business.

Without investing in this, we cannot improve the experience with acceptable implementation timelines.

Usability

We want to be the easiest place to write and experience questions in higher-ed.

Our experience today is inconsistent and low-quality, lagging behind many of our peers.

🧱 STEM Foundations

Features to support flexible STEM constructions are needed to author winning national titles.

We do not have these building blocks today and “Bolt-On” attempts in the current system fail to meet needs.

4 of 16

Our main motivation for this research was to de-risk our assumptions about usability and learnability in the builder

Priority 1 - Builder Usability

The flexible builder model is intuitive and easily learnable.

Objective: If the task success rate is high we will de-risk key assumptions we’ve made about how users want to build questions. If the task success rate is low we will revisit the areas causing difficulty to improve the design.

Priority 2 - Learnability (Templates)

Templates are a bridge from the old system to the new one.

Objective: By testing the extent to which templates explain or imply the concepts within the new building experience, we will identify potential gaps in learnability.

Priority 3 - Ease of Access

Templates are the main way to use the builder.

Objective: If the tests show that users do in fact choose templates the majority of the time, we can feel more confident in continuing to invest in templates and the template menu.

Instantaneous adoption & understanding was not our goal. We hypothesized there would be a learning curve, which we considered acceptable & expected.

Our aim through this research was to understand that curve and use our learnings to optimize the onboarding process.

5 of 16

We spoke to 6 professors from diverse fields of study to answer the question: Is the builder fundamentally usable and learnable?

Religious Studies

Business Law

Theology

Math & Data Science

Biology

Civil Engineering

Methodology

The study was conducted as a series of remote moderated user tests.

  • 6 Participants
  • 45 min sessions

Participants were asked to

  • Create a multiple choice question using prefabricated content
  • Create a numeric question using prefabricated content
  • Re-create the numeric question starting from a blank canvas

Diversifying fields of study allowed us to avoid over-representing any single discipline.

50/50 experience - only half of the participants had experience creating questions in Top Hat. This helped us understand whether any biases carried over from our existing question creation system.

6 of 16

Our new rich-text-editor-style builder is flexible by design, which posed unique challenges during testing

[Flow diagram illustrating the many acceptable paths through the builder: a participant may click “Advanced”, select an incorrect, correct, or blank template, and then either copy & paste or type content into the Premise, Answer/Option, and Hint/Explanation fields.]

Rich text editors have countless acceptable paths to the same result, so some tradeoffs were required to ensure the tests produced usable results.

Prefabricated Content

  • Necessary to streamline & control the test
  • Unnatural user experience
  • User blindly follows content structure
  • Difficult to see natural preferences (e.g. question structuring, keyboard navigation)

Strategically Minimize # of Tasks

  • Kept tests focused despite complexity
  • Impossible to test all possibilities (blind spots)
  • Task configuration led to bias (e.g. one question type repeated twice)

7 of 16

Testing a live feature with infinite acceptable paths required careful documentation of limitations and assumptions

The question builder is a WIP feature in a live and constantly changing environment, which meant bugs during testing were inevitable.

Pausing work while testing was not feasible, so to mitigate participant errors and mistakes we:

  • Documented known issues & incomplete or missing features and explicitly excluded those items from our metrics
  • Prioritized our JIRA backlog, focusing on high-risk tickets that would cause task failure

8 of 16

🔍

Metrics & Findings

9 of 16

We measured usability, learnability, and ease of access using three key metrics; the first was task success rate

Task         Success Rate (Lower Bound)    Success Rate (Higher Bound)
Task 1       30%                           91%
Task 2       64%                           100%
Task 3       64%                           100%
Benchmark    ≥ 50%                         ≥ 78%

Success rate was determined by three factors:

  • Task Completion
  • Error Rate
  • 0 Errors on Critical Subtasks

We hypothesized that success rate would help determine:

1. Usability (rate for task 1)

2. Learnability (increased success on tasks 2 & 3)

While the task 1 success rate was low, all participants improved significantly on subsequent tasks.
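
For readers curious how bounds like these arise from only six participants, here is a minimal sketch assuming pass/fail scoring against the three factors above and an adjusted Wald interval, a common choice for small-sample task success rates; the per-participant outcomes shown are hypothetical.

```python
import math

# Hypothetical per-participant outcomes for one task (True = passed all three
# factors: completed the task, stayed within the error threshold, and made
# zero errors on critical subtasks). The counts are illustrative only.
results = [True, True, False, True, False, True]  # 4 of 6 passed

def adjusted_wald(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Adjusted Wald interval, a common choice for small-sample success rates."""
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)

low, high = adjusted_wald(sum(results), len(results))
print(f"Observed {sum(results)}/{len(results)} -> CI {low:.0%} to {high:.0%}")
```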

10 of 16

Successful completion of tasks was not sufficient to determine usability or learnability, so we coupled it with Time on Task

Time on Task was used to determine whether the user could create a question in reasonable time & if they became faster over time.

Task         Time (Lower Bound)    Time (Higher Bound)
Task 1       01:00                 05:00
Task 2       00:40                 02:00
Task 3       00:30                 02:10
Benchmark    ≤ 02:50

The benchmark for a reasonable time to create a question was determined using Keystroke-Level Modeling (KLM)
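
For context, Keystroke-Level Modeling estimates expected task time by summing standard operator times (keystrokes, pointing, homing between keyboard and mouse, mental preparation). The sketch below illustrates the technique with a hypothetical operator sequence; it is not the actual model used to set the 02:50 benchmark.

```python
# Standard KLM operator times in seconds (Card, Moran & Newell). The operator
# sequence below is a hypothetical illustration, not the actual model used
# to derive the 02:50 benchmark.
OPERATOR_SECONDS = {
    "K": 0.20,  # keystroke (average skilled typist)
    "P": 1.10,  # point at a target with the mouse
    "B": 0.10,  # press or release a mouse button
    "H": 0.40,  # home hands between keyboard and mouse
    "M": 1.35,  # mental preparation
}

def klm_estimate(sequence: str) -> float:
    """Sum the operator times for a sequence such as 'MPBB...'."""
    return sum(OPERATOR_SECONDS[op] for op in sequence)

# Example: think, point at a template and click it, home to the keyboard,
# think again, then type a 40-character premise.
example = "MPBB" + "H" + "M" + "K" * 40
print(f"Estimated time: {klm_estimate(example):.1f} s")  # 12.4 s
```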

11 of 16

We then used a Single Ease Question (SEQ) to gauge participants’ feelings about the builder’s ease of use

Each participant was asked to rate how easy each task was on a scale from 1 (extremely difficult) to 7 (extremely easy).

We hypothesized that there would be an increase in ease of use from task 1 to task 2 (templates), with a slight decline in task 3 (blank builder).

When asked to explain their SEQ selection, 5/6 users said that once they learned the system, task 2 was simple.

Task         SEQ (Lower Bound)    SEQ (Higher Bound)
Task 1       4.5                  8.2
Task 2       4.9                  7.7
Task 3       4.2                  7.2
Benchmark    ≥ 3.8                ≥ 5.5
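
One note on reading these bounds: with only six participants, confidence intervals around a 7-point rating are wide, and an interval computed on the mean (a typical approach, though we are assuming that is what was done here) can extend past the scale ceiling, which would explain Task 1’s upper bound of 8.2. A minimal sketch with hypothetical ratings:

```python
import math
import statistics

# Hypothetical SEQ ratings for one task on the 1-7 scale (illustrative only).
ratings = [7, 6, 7, 5, 7, 6]

def t_interval(values, t_crit=2.571):
    """Mean +/- t * s / sqrt(n); t_crit = 2.571 is the 95% value for n=6 (df=5)."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean - t_crit * sem, mean + t_crit * sem

low, high = t_interval(ratings)
print(f"SEQ CI: {low:.1f} to {high:.1f}")  # the upper bound can exceed 7
```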

12 of 16

Two strong signals, displayed by 50%–100% of participants, came through in our testing

Setting Correctness

  • Users had difficulty finding the toggle, or did not understand it when they found it (5/6)
  • Users wanted us to bring back the correctness checkbox for MCQ (4/6)
  • Users wanted us to “just show” correctness toggle or felt it was hard to get to (3/6)

🔍 Looking for Feedback

  • Users were looking for more feedback and guidance to learn the builder (5/6)

13 of 16

While some signals weren’t as strong or actionable, the tests surfaced some interesting additional insights

Different User Styles

Participants had vastly different styles and comfort levels when it came to experimenting with new systems. We noticed two user types:

“Experimenters” dove in head first and felt comfortable making mistakes and recovering.

“Cautious Avoiders” exhibited anxiety about making mistakes, and instinctively deleted their work and started again instead of using undo.

Hints

  • 4/6 participants expressed interest in using hints in their own courses
  • Participants were interested in features like manual hint requests, multiple hints, and using hints to gamify course content
  • None of the participants in the study currently use Top Hat hints; one user was unaware the feature already exists

Other Small Fixes

  • Clarify how & when hints are shown (1/6)
  • Clear indication of what you’re deleting [numeric] (1/6)
  • Setting points in MCQ per node or block (2/6)
  • Hierarchy/organization in template menu caused confusion (2/6)
  • Code formatting in questions (1/6)

14 of 16

🏗

Improvements

15 of 16

Team Apollo is already actively working on solutions for the issues surfaced during user testing

The team has already begun to implement some of these changes, including the return of the correctness checkbox for MCQs.

We’ve also explored some early ideas to surface contextual menus more clearly. Our hypothesis is that providing more tactile feedback will encourage users to click in and find the menus.

The team is exploring ways to introduce the new builder and guide users through onboarding.

16 of 16

Resources
