1 of 34

Evaluating the WCAG3 Proposal

Data, metrics, and resources for researchers and specialists.

2 of 34

Research & Metrics

3 of 34

Research

Metrics and plan for evaluating WCAG3

In 2011, the W3C WAI Research and Development Working Group (RDWG) held an online symposium on web accessibility metrics. The metrics in this slide deck are based on the 2012 Research Report on Web Accessibility Metrics.

We welcome additional research and recommendations. We need more expert input.

4 of 34

Metrics

  • Validity - does the score reflect the accessibility of the product?
  • Reliability - is the score reproducible and consistent?
  • Sensitivity - does a change in the score reflect the change in the accessibility?
  • Adequacy - does a small change in accessibility create a small scoring change? (Are the formulas adjusted correctly?)
  • Complexity - does it take excessive time to test?

5 of 34

Resources for testing special purposes

6 of 34

Example Test Sites

Validity Testing Escape Room site - this is a copy of a website that we deliberately modified to include specific flaws that we could test against.

Validity Testing Deque’s “Gefälscht” site - this is a site with errors maintained by Deque.

7 of 34

Example Test Login / Registration Form

  • We also created a series of basic web pages that could be used as a login / registration user path.
  • None of the forms on this page submit any data.
  • Each page lists its inaccessible content and code.
  • These pages can be used to test:
    • Text Alternatives;
    • Clear Words;
    • Structured Content;
    • Visual Contrast Of Text.

8 of 34

Example of a test App

Medicare iOS app

This is an example of testing a mobile app.


9 of 34

GOV UK’s Accessibility Testing Pages

10 of 34

First Public Working Draft content

The First Public Working Draft has proposed guidelines for:

11 of 34

Testing Tools

Greater detail on the tools used to test WCAG 3 appears on the following slides, but for ease of access, they are listed here:

  • Silver Writer tool to test Clear Words;
  • APCA Contrast Calculator to test Visual Contrast Of Text.

12 of 34

New prototype testing tools

The following slides were developed for the First Public Working Draft. They are outdated. They are included for archival interest.

13 of 34

Testing Clear Words

To test Clear Words, you can use the Silver Writer tool.

  • This is a prototype tool to demonstrate the feasibility of scoring language.
  • We expect that accessibility tool makers will refine and improve this idea.
  • The Silver Writer tool is a fork of XKCD’s Simple Writer, which was originally created by the author to help him write his Thing Explainer book; as a result, its corpus is not representative of commonly used English words.
  • The app was re-created and open sourced, and we in turn forked it to see if it could be used to assess WCAG 3 Clear Words scoring.

14 of 34

Silver Writer / Clear Words Tool

  • Because the original word list was very restrictive and only relevant to the author’s book, we added two extra lists that contained the top 3000 and 5000 words from Google’s Trillion Word Corpus.
  • Note: the Google lists have not been edited to remove not-safe-for-work content, which is present.
  • Note: the tool is a little buggy, but it works well enough to demonstrate scoring clear language is possible.

15 of 34

Using The Clear Words Tool

To use the Clear Words tool:

  • Type, or copy and paste, text into the “Enter Words Here” text input.
  • “Less Simple” words will be listed in the “Less Simple Words” section that will appear at the bottom of the page.
  • Duplicate words will be removed from the “Less Simple” list, but plurals are currently still counted as different words. There is currently no guidance on whether the plural of a word should be included in scoring calculations. We expect to have more guidance on this in a future draft.
  • The different language lists can be toggled using the Change Word List select element.
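The steps above can be sketched in code. This is a minimal, hypothetical approximation of the check the Clear Words tool performs (the function name and word list are illustrative, not the tool's actual implementation): flag words that are not on a simple-word list, removing exact duplicates but, like the current tool, counting plurals as separate words.

```python
def less_simple_words(text, simple_words):
    """Return the unique words in `text` that are not on the simple-word list."""
    flagged = []
    for word in text.lower().split():
        word = word.strip(".,;:!?\"'()")  # drop surrounding punctuation
        if word and word not in simple_words and word not in flagged:
            flagged.append(word)  # duplicates removed; "mat" and "mats" stay distinct
    return flagged

print(less_simple_words("The cat sat on the mats. The cat sat.",
                        {"the", "cat", "sat", "on", "mat", "a"}))  # → ['mats']
```

Note how "mats" is flagged even though "mat" is on the list, matching the plural-handling caveat above.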

16 of 34

Testing Visual Contrast

  • Visual Contrast is a substantial change from WCAG 2.
  • Research can be found in the Silver Wiki.
  • There is voluminous information behind the change, including a table of changes from WCAG 2 to WCAG 3.
  • To test the new contrast algorithm, use the APCA Contrast Calculator tool, which is developed and maintained by Andy Somers, a contributor to WCAG 3. See Andy’s GitHub repo for additional information.

17 of 34

Spreadsheet

18 of 34

How to use the scoring spreadsheet

  • The following slides will walk through how to use the scoring spreadsheet to score a site.
  • You’ll need a copy of the blank spreadsheet (note: you will need a copy of Excel for this).

19 of 34

Start with the Tests sheet

  • The Tests sheet has columns for Method, Guideline, Outcome, Type Of Test, and assessment results for up to three views on a path.
  • There are currently two types of test result: Percentage and Rating Scale. We’ll look at one example of each.

20 of 34

Testing Text Alternatives For Images Of Text (HTML)

  • Start by reading the Outcome: Text Alternative Available document.
  • On that page, you’ll find a link to the Method: Images Of Text document, which contains detailed information, in the Tests tab, on how to test any images of text in the views on your paths.
  • The Method page tells you what the measurement calculation is. In this case it’s: Percentage (number passed ÷ total number of img elements for “Images of Text”).
  • If, after testing your first view, you had 5 images of text, 4 of which passed, and had no critical errors, you would have: (4 ÷ 5) × 100 = 80%, which is the value you enter into the “View 1 Score” cell.
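The percentage calculation above can be sketched as follows. The handling of a view with no images of text is an assumption for illustration; the slide's 5-image example is from the Method page.

```python
def view_score(passed, total):
    """Percentage score for a view: (number passed ÷ total img elements) × 100."""
    if total == 0:
        return 100.0  # assumption: nothing to test means nothing failed
    return passed / total * 100

print(view_score(4, 5))  # the slide's example: 80.0
```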

21 of 34

Testing Clear Words

  • Start by reading the Outcome: Common Clear Words document.
  • On that page, you’ll find a link to the Method: Clear Words document which contains, in the Tests tab, detailed information on how to test the words in views on your paths.
  • Assess the words on your path using the Silver Writer tool (see other slides in this deck for more information on that tool).
  • Once you have assessed all of the text in your view, you can use the rating information in the Method page to assign a rating, for example: “Uses undefined technical or jargon words (Rating 0)”.
  • The rating value is the value you enter into the “View 1 Score” cell.

22 of 34

The Outcomes Sheet

  • Once you have tested your views, you can use the Outcomes tab to look at the scoring.
  • The values range from 0 (“very poor”) to 4 (“excellent”).
  • If you have a critical failure in your path then the score for that is automatically 0.
  • More details on scoring outcomes can be found in the WCAG 3 Scoring Outcomes content.
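The scoring rule above can be sketched in code. The critical-failure rule (any critical failure scores the outcome 0) is from the draft; the percentage-to-rating thresholds below are illustrative assumptions chosen to match the example Outcomes sheet, not values from the specification.

```python
def outcome_score(percentage, critical_failures, thresholds=(95, 80, 70, 60)):
    """Map a view percentage to a 0–4 rating; critical failures force a 0."""
    if critical_failures > 0:
        return 0  # a critical failure in the path automatically scores 0
    for rating, floor in zip((4, 3, 2, 1), thresholds):
        if percentage >= floor:
            return rating
    return 0

print(outcome_score(84, critical_failures=2))  # → 0, despite an 84% pass rate
```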

23 of 34

Example Outcomes Sheet

Outcome                                  | Total Passed | Total Critical Failures | Score
Text alternative available               | 84%          | 2                       | 0
Common Clear Words                       | 0%           | 0                       | 0
Translates speech and non-speech audio   | 4.0          | 0                       | 4
Headings with levels organize content    | 100%         | 0                       | 4
Use visually distinct headings           | 83%          | 0                       | 3
Convey hierarchy with semantic structure | 51%          | 0                       | 0

  • This Outcomes Sheet example shows the results of testing the different outcomes.
  • Note that although Text Alternative Available had an 84% pass rate, its two critical failures meant that the outcome scored zero.

24 of 34

The Final Score sheet

  • Once all of the outcomes have been scored, the ratings are averaged to produce a total score and a score for each functional category they support.
  • Conformance to Bronze Level must have no critical errors, a score of at least 3.5 in each functional category, and a total score of at least 3.5.
  • More details on this can be found in the WCAG 3 Overall Scores content.
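The Bronze conformance check above can be sketched as follows; the rule itself (no critical errors, at least 3.5 in each functional category, and a total of at least 3.5) is from the draft, while the sample scores are taken from the example Final Score sheet.

```python
def meets_bronze(category_scores, total, critical_failures):
    """Check the Bronze conformance conditions for a scored path."""
    return (critical_failures == 0
            and total >= 3.5
            and all(score >= 3.5 for score in category_scores.values()))

scores = {"Vision and Visual": 1.3, "Hearing and Auditory": 4.0}
print(meets_bronze(scores, total=1.8, critical_failures=2))  # → False
```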

25 of 34

Example Final Score Sheet

Functional Categories               | Score
Vision and Visual                   | 1.3
Hearing and Auditory                | 4.0
Sensory Intersections               | 2.0
Mobility                            | 2.0
Motor                               | 2.0
Physical and Sensory Intersections  | 2.0
Speech                              | 0.0
Attention                           | 1.5
Language and Literacy               | 2.0
Learning                            | 2.8
Memory                              | 1.5
Executive                           | 3.0
Mental Health                       |
Cognitive and Sensory Intersections | 1.8
Total                               | 1.8
Critical Failures                   | 2

  • This Final Score Sheet example shows the scores broken down by functional categories, the total score, and the total number of critical failures.
  • With a score of 1.8 and critical failures, this path would not achieve a Bronze rating.

26 of 34

Feedback

27 of 34

How do we apply testing to the WCAG3 metrics?

By testing with known flaws, we can evaluate the metrics of WCAG3 itself.

Some examples of testing we can do are:

  • Validity: Test the known flaws. Change the known flaws (fix some) and then test again to see if the score changes appropriately based on an expert evaluation of the accessibility.
  • Reliability: Do different testers make the same scoring decisions and achieve the same overall score?

28 of 34

Testing the Reliability Metric

Reliability: Do different testers make the same scoring decisions and achieve the same overall score?

  • Have multiple people test the same digital products with different tools (varying browser plugin tools, accessibility testing apps, OS features, or assistive technology) and check whether the different tools produce similar scores.
  • Have multiple people test the same large website and see how the interpretation of the Representative sampling changes the score.
  • Have multiple people test the same digital products and see how the changing selections of path and off-path change the score.

29 of 34

Testing the Sensitivity Metric

Test with a set of pages with known accessibility problems. There would be different versions of the set with different levels of accessibility to see how the score changes.

For example, we would have one set where all the problems for screen readers were fixed, and another where all the low-vision and color-contrast problems were fixed. This would help show whether we have successfully balanced the scoring so that all disabilities are treated fairly.

30 of 34

Testing the Adequacy Metric

Have two versions of the same digital product. One would have small changes in accessibility from the other.

We would test both and look to see that the change in score is proportionate to the change in accessibility.

31 of 34

Testing the Complexity Metric

Ask experts to test a site against WCAG2 and compare how long that took with testing against WCAG3. Since WCAG3 currently has only 5 guidelines, test WCAG2 only for:

32 of 34

Feedback on Metrics

  • Validity - does the score reflect the accessibility of the product?
  • Reliability - is the score reproducible and consistent?
  • Sensitivity - does a change in the score reflect the change in the accessibility?
  • Adequacy - does a small change in accessibility create a small scoring change? (Are the formulas adjusted correctly?)
  • Complexity - does it take excessive time to test?

33 of 34

Please share your data

34 of 34

Questions?

Thanks for your interest.

Francis Storr
francis.storr@intel.com
Github (for W3C work): https://github.com/fstrr

Jeanne Spellman
jspellman@spellmanconsulting.com
Twitter & Github: @jspellman