Measurement, Testing, & Scoring
Conformance - An Introduction
1
Goal
Goal: To review measurement, testing, and scoring in the First Public Working Draft (FPWD), along with the issues raised on that draft
Context: We will use alt text as the example to discuss the challenges today, but the broader conversation is about measuring
Terminology
Relevant Requirements
Atomic Tests vs. Holistic Tests (FPWD)
Atomic Tests
Atomic tests evaluate content, often at an object level, for accessibility. Atomic tests include the existing tests that support A, AA, and AAA success criteria in WCAG 2.X. They also include tests that may require additional context or expertise beyond tests that fit within the WCAG 2.X structure. In WCAG 3.0, atomic tests are used to test both processes and views. Test results are then aggregated across the selected views. Critical errors within selected processes are also totaled. Successful results of the atomic tests are used to reach a Bronze rating.
Atomic tests may be automated or manual. Automated evaluation can be completed without human assistance. These tests allow a larger scope to be tested, but automated evaluation alone cannot determine accessibility. The number of accessibility tests that can be automated is increasing, but manual testing is still required to evaluate most methods at this time.
Holistic Tests
Holistic tests include assistive technology testing, user-centered design methods, and both user and expert usability testing. Holistic testing applies to the entire declared scope and often uses the declared processes to guide the tests selected.
Atomic, Automated Tests for Functional Images (FPWD)
Procedure for HTML
Expected Results: Checks #2 and #3 are true.
Procedure for Technology Agnostic
Expected Results: Checks #2 and #3, or #2 and #4, or #2 and #5 are true.
Unit Tested: All Images
Measurement: Percentage (# passed/total # of img elements for all images)
Atomic, Manual Tests for Functional Images (FPWD)
Procedure for HTML
Expected Results: Checks #1 and #2 are true (see notes)
Procedure for Technology Agnostic
Expected Results: Checks #2 and #3, or #2 and #4, or #2 and #5 are true.
Unit Tested: This test is measured by the number of the following elements only for “Functional Images” in the HTML document
Measurement: Percentage (# passed/total # of img elements for “Functional Images”)
Scoring "Text alternative available" (FPWD)
Rating | Criteria |
Rating 0 | Less than 60% of all images have appropriate text alternatives OR there is a critical error in the process |
Rating 1 | 60%-69% of all images have appropriate text alternatives AND no critical errors in the process |
Rating 2 | 70%-79% of all images have appropriate text alternatives AND no critical errors in the process |
Rating 3 | 80%-94% of all images have appropriate text alternatives AND no critical errors in the process |
Rating 4 | 95%-100% of all images have appropriate text alternatives AND no critical errors in the process |
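To make the rating table concrete, here is a minimal sketch of how a pass percentage and a critical-error count could be mapped to a rating. The function name, its parameters, and the handling of two edge cases are assumptions for illustration, not part of the FPWD: percentages that fall between bands (for example 69.5%) are assigned to the lower band, and an empty image set is rejected because the table does not define a rating for it.

```python
def rate_text_alternatives(passed: int, total: int, critical_errors: int = 0) -> int:
    """Return a 0-4 rating from the table's thresholds (hypothetical helper)."""
    if total <= 0:
        # Assumption: the table does not say how to rate a scope with no images.
        raise ValueError("total must be a positive number of images")
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")

    # Any critical error in the process forces Rating 0, whatever the percentage.
    if critical_errors > 0:
        return 0

    percent = 100 * passed / total

    # Assumption: each band starts at its lower bound, so values between
    # published bands (e.g. 69.5%) fall into the lower rating.
    if percent >= 95:
        return 4
    if percent >= 80:
        return 3
    if percent >= 70:
        return 2
    if percent >= 60:
        return 1
    return 0


# 47 of 50 images pass (94%) with no critical errors: Rating 3.
print(rate_text_alternatives(47, 50))
# The same results with one critical error in the process: Rating 0.
print(rate_text_alternatives(47, 50, critical_errors=1))
```

The sketch also shows why counting matters in the issues that follow: the rating depends entirely on how `passed` and `total` are counted, and a single critical error overrides the percentage.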
Issues: Concern about Difficulty of Counting
Issues: Types of Measures
Issues: Need for Clarity
Overall Complexity/Need for Simplicity
Suggestions for Rating Scales
Joint Silver/ACT Work to Date
Example - Hamburger Menu
Quantitative Measure: Does the image have alt text?
Qualitative Measure: Does the alt text serve the equivalent purpose of the image?
Example - Dogs
Quantitative Measure: Does the image have alt text?
Qualitative Measure: Does the alt text serve the equivalent purpose of the image?
Example: Aggregate
Next Steps: Proposals for measuring/testing “A text alternative that serves the equivalent purpose.”