1 of 16

🧑‍🔬 Question Builder Research Summary

STM002 | Updated April ’23

2 of 16

The Questions team is currently redesigning the foundation and architecture of questions within Top Hat

This work aims to improve the questions experience for both our users and our engineering team, and will eventually allow us to unlock STEM fields previously unavailable to us.

3 of 16

This project has 3 big (and equal) value propositions:

🚀 Dev Velocity

A new scalable and flexible questions architecture is needed to support our business.

Without investing in this, we cannot improve the experience with acceptable implementation timelines.

Usability

We want to be the easiest place to write and experience questions in higher-ed.

Our experience today is inconsistent and low-quality, lagging behind many of our peers.

🧱 STEM Foundations

Features to support flexible STEM constructions are needed to author winning national titles.

We do not have these building blocks today and “Bolt-On” attempts in the current system fail to meet needs.

4 of 16

Our main motivation for this research was to de-risk our assumptions about usability and learnability in the builder

Priority 1 - Builder Usability

The flexible builder model is intuitive and easily learnable.

Objective: If the task success rate is high we will de-risk key assumptions we’ve made about how users want to build questions. If the task success rate is low we will revisit the areas causing difficulty to improve the design.

Priority 2 - Learnability (Templates)

Templates are a bridge from the old system to the new one.

Objective: By testing the extent to which templates explain or imply the concepts within the new building experience, we will identify potential gaps in learnability.

Priority 3 - Ease of Access

Templates are the main way to use the builder.

Objective: If the tests show that users do in fact choose templates the majority of the time, we can feel more confident in continuing to invest in templates and the template menu.

Instantaneous adoption & understanding was not our goal. We hypothesized there would be a learning curve, which we considered acceptable & expected.

Our aim through this research was to understand that curve and use our learnings to optimize the onboarding process.

5 of 16

We spoke to 6 professors from diverse fields of study to answer the question: Is the builder fundamentally usable and learnable?

Religious Studies

Business Law

Theology

Math & Data Science

Biology

Civil Engineering

Methodology

The study was conducted as a series of remote moderated user tests.

  • 6 Participants
  • 45 min sessions

Participants were asked to

  • Create a multiple choice question using prefabricated content
  • Create a numeric question using prefabricated content
  • Re-create the numeric question starting from a blank canvas

Diversifying fields of study allowed us to avoid over-representing any single discipline.

50/50 experience - only half of the participants had experience creating questions in Top Hat. This helped us understand whether any biases carried over from our existing question creation system.

6 of 16

Our new rich-text-editor-style builder is flexible by design, which posed unique challenges during testing

[Flow diagram illustrating the many acceptable paths through the builder: a participant may click “Advanced”, select an incorrect, correct, or blank template, and then either copy & paste or type content into the Premise, Answer/Option, and Hint/Explanation fields.]

Rich text editors have countless acceptable paths to the same result, so some tradeoffs were required to ensure the tests produced usable results.

Prefabricated Content

  • Necessary to streamline & control the test
  • Unnatural user experience
  • User blindly follows content structure
  • Difficult to see natural preferences (e.g. question structuring, keyboard navigation)

Strategically Minimize # of Tasks

  • Kept tests focused despite complexity
  • Impossible to test all possibilities (blind spots)
  • Task configuration led to bias (e.g. one question type repeated twice)

7 of 16

Testing a live feature with infinite acceptable paths required careful documentation of limitations and assumptions

The question builder is a WIP feature in a live and constantly changing environment, which meant bugs during testing were inevitable.

Pausing work while testing was not feasible, so to mitigate participant errors and mistakes we:

  • Documented known issues & incomplete or missing features and explicitly excluded those items from our metrics
  • Prioritized our JIRA backlog, focusing on high-risk tickets that would cause task failure

8 of 16

🔍

Metrics & Findings

9 of 16

We measured usability, learnability, and ease of access using three key metrics; the first was task success rate

Task         Success Rate (Lower Bound)    Success Rate (Higher Bound)
Task 1       30%                           91%
Task 2       64%                           100%
Task 3       64%                           100%
Benchmark    ≥ 50%                         ≥ 78%

Success rate was determined by three factors:

  • Task Completion
  • Error Rate
  • 0 Errors on Critical Subtasks

We hypothesized that success rate would help determine:

1. Usability (rate for task 1)

2. Learnability (increased success on tasks 2 & 3)

While the task 1 success rate was low, all participants improved significantly on subsequent tasks.
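
For readers curious how bounds like these arise from only six participants, here is a minimal sketch assuming pass/fail scoring against the three factors above and an adjusted Wald interval, a common choice for small-sample task success rates; the per-participant outcomes shown are hypothetical.

```python
import math

# Hypothetical per-participant outcomes for one task (True = passed all three
# factors: completed the task, stayed within the error threshold, and made
# zero errors on critical subtasks). The counts are illustrative only.
results = [True, True, False, True, False, True]  # 4 of 6 passed

def adjusted_wald(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Adjusted Wald interval, a common choice for small-sample success rates."""
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)

low, high = adjusted_wald(sum(results), len(results))
print(f"Observed {sum(results)}/{len(results)} -> CI {low:.0%} to {high:.0%}")
```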

10 of 16

Successful completion of tasks was not sufficient to determine usability or learnability, so we coupled it with Time on Task

Time on Task was used to determine whether the user could create a question in reasonable time & if they became faster over time.

Task         Time (Lower Bound)    Time (Higher Bound)
Task 1       01:00                 05:00
Task 2       00:40                 02:00
Task 3       00:30                 02:10
Benchmark    ≤ 02:50

The benchmark for a reasonable time to create a question was determined using Keystroke-Level Modeling (KLM)
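
For context, Keystroke-Level Modeling estimates expected task time by summing standard operator times (keystrokes, pointing, homing between keyboard and mouse, mental preparation). The sketch below illustrates the technique with a hypothetical operator sequence; it is not the actual model used to set the 02:50 benchmark.

```python
# Standard KLM operator times in seconds (Card, Moran & Newell). The operator
# sequence below is a hypothetical illustration, not the actual model used
# to derive the 02:50 benchmark.
OPERATOR_SECONDS = {
    "K": 0.20,  # keystroke (average skilled typist)
    "P": 1.10,  # point at a target with the mouse
    "B": 0.10,  # press or release a mouse button
    "H": 0.40,  # home hands between keyboard and mouse
    "M": 1.35,  # mental preparation
}

def klm_estimate(sequence: str) -> float:
    """Sum the operator times for a sequence such as 'MPBB...'."""
    return sum(OPERATOR_SECONDS[op] for op in sequence)

# Example: think, point at a template and click it, home to the keyboard,
# think again, then type a 40-character premise.
example = "MPBB" + "H" + "M" + "K" * 40
print(f"Estimated time: {klm_estimate(example):.1f} s")  # 12.4 s
```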

11 of 16

We then used a Single Ease Question (SEQ) to gauge participants’ feelings about the builder’s ease of use

Each participant was asked to rate how easy each task was on a scale from 1 (extremely difficult) to 7 (extremely easy).

We hypothesized that there would be an increase in ease of use from task 1 to task 2 (templates), with a slight decline in task 3 (blank builder).

When asked to explain their SEQ selection, 5/6 users said that once they learned the system, task 2 was simple.

Task         SEQ (Lower Bound)    SEQ (Higher Bound)
Task 1       4.5                  8.2
Task 2       4.9                  7.7
Task 3       4.2                  7.2
Benchmark    ≥ 3.8                ≥ 5.5
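
One note on reading these bounds: with only six participants, confidence intervals around a 7-point rating are wide, and an interval computed on the mean (a typical approach, though we are assuming that is what was done here) can extend past the scale ceiling, which would explain Task 1’s upper bound of 8.2. A minimal sketch with hypothetical ratings:

```python
import math
import statistics

# Hypothetical SEQ ratings for one task on the 1-7 scale (illustrative only).
ratings = [7, 6, 7, 5, 7, 6]

def t_interval(values, t_crit=2.571):
    """Mean +/- t * s / sqrt(n); t_crit = 2.571 is the 95% value for n=6 (df=5)."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean - t_crit * sem, mean + t_crit * sem

low, high = t_interval(ratings)
print(f"SEQ CI: {low:.1f} to {high:.1f}")  # the upper bound can exceed 7
```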

12 of 16

Two strong signals, displayed by 50%–100% of participants, came through in our testing

Setting Correctness

  • Users had difficulty finding the toggle, or did not understand it when they found it (5/6)
  • Users wanted us to bring back the correctness checkbox for MCQ (4/6)
  • Users wanted us to “just show” correctness toggle or felt it was hard to get to (3/6)

🔍 Looking for Feedback

  • Users were looking for more feedback and guidance to learn the builder (5/6)

13 of 16

While some signals weren’t as strong or actionable, the tests surfaced some interesting additional insights

Different User Styles

Participants had vastly different styles and comfort levels when it came to experimenting with new systems. We noticed two user types:

“Experimenters” dove in head first and felt comfortable making mistakes and recovering.

“Cautious Avoiders” exhibited anxiety about making mistakes, and instinctively deleted their work and started again instead of using undo.

Hints

  • 4/6 participants expressed interest in using hints in their own courses
  • Participants were interested in features like manual hint requests, multiple hints, and using hints to gamify course content
  • None of the participants in the study currently use Top Hat hints; one user was unaware the feature already exists

Other Small Fixes

  • Clarify how & when hints are shown (1/6)
  • Clear indication of what you’re deleting [numeric] (1/6)
  • Setting points in MCQ per node or block (2/6)
  • Hierarchy/organization in template menu caused confusion (2/6)
  • Code formatting in questions (1/6)

14 of 16

🏗

Improvements

15 of 16

Team Apollo is already actively working on solutions for the issues surfaced during user testing

The team has already begun to implement some of these changes, including the return of the correctness checkbox for MCQs.

We’ve also explored some early ideas to surface contextual menus more clearly. Our hypothesis is that providing more tactile feedback will encourage users to click in and find the menus.

The team is exploring ways to introduce the new builder and guide users through onboarding.

16 of 16

Resources
