1 of 12

Metrics for Testing WCAG3

Data metrics and resources

2 of 12

Research

Metrics and plan for evaluating WCAG3

In 2011, the W3C WAI Research and Development Working Group (RDWG) held an online symposium on Web Accessibility Metrics. The metrics in this slide deck are based on the resulting Research Report on Web Accessibility Metrics, published in 2012.

We welcome additional research and recommendations. We need more expert input.

3 of 12

Metrics

  • Validity - does the score reflect the accessibility of the product?
  • Reliability - is the score reproducible and consistent?
  • Sensitivity - does a change in the score reflect the change in the accessibility?
  • Adequacy - does a small change in accessibility create a small scoring change? (Are the formulas adjusted correctly?)
  • Complexity - does it take excessive time to test?

4 of 12

Validity

This is probably the most important metric to study in any conformance proposal. Does the score that a test website or digital product achieves under a proposed scoring system actually reflect its real accessibility? Possible tests include:

  1. Creating web pages with known accessibility problems and testing them against each scoring system to see if the final score reflects the real accessibility of the site based on the expertise of the testers.
  2. Testing known products against each scoring system to see if the final score reflects the real accessibility of the site based on the expertise of the testers.
  3. Using external accessibility experts to independently evaluate digital products (web sites, mobile, PDFs, apps, digital publishing, etc.) using the scoring system and get their feedback on the validity of the scoring system. This would be after publication of the First Public Working Draft (FPWD).
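For approaches like these, the proposed scores could be compared against independent expert ratings using a rank correlation. A minimal pure-Python sketch with entirely hypothetical numbers, using Spearman's rank correlation and assuming no tied scores:

```python
def rank(xs):
    # Assign ranks 1..n by ascending value (assumes no ties)
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0] * len(xs)
    for position, i in enumerate(order):
        ranks[i] = position + 1
    return ranks

def spearman(a, b):
    # Spearman rank correlation: 1.0 means the scoring system
    # orders the products exactly as the experts do
    ra, rb = rank(a), rank(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n * n - 1))

proposed_scores = [82, 65, 91, 40, 73]  # hypothetical scoring-system output
expert_ratings  = [80, 60, 95, 35, 70]  # hypothetical expert judgments
print(spearman(proposed_scores, expert_ratings))
```

A high correlation would not prove validity on its own, but a low one would flag scoring systems that disagree with expert judgment.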

5 of 12

Reliability

  • This attribute concerns the reproducibility and consistency of scores, i.e. the extent to which they are the same when evaluations of the same resources are carried out in different contexts (different tools, different people, different goals, different times). Testing this would be particularly useful to ensure that similar results are achieved by different testers, and to see whether different testers select the same path or off-path decisions. Representative sampling tests also fit in this category.

  • Test using external accessibility experts to independently evaluate digital products (web sites, mobile, PDFs, apps, digital publishing, etc.) using the scoring system and get their feedback on the reliability of the scoring system. This would be after publication of the First Public Working Draft (FPWD).
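Agreement between independent testers could be quantified with an inter-rater statistic such as Cohen's kappa, which discounts agreement expected by chance. A minimal sketch with hypothetical pass/fail verdicts from two testers evaluating the same six outcomes:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    # Observed agreement, corrected for chance agreement
    # (assumes expected agreement < 1)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# hypothetical verdicts from two independent testers
tester_1 = ["pass", "fail", "pass", "pass", "fail", "pass"]
tester_2 = ["pass", "fail", "fail", "pass", "fail", "pass"]
print(round(cohens_kappa(tester_1, tester_2), 2))  # → 0.67
```

Kappa near 1 would indicate the scoring system produces consistent verdicts across testers; values near 0 would indicate agreement no better than chance.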

6 of 12

Sensitivity

This metric is useful for determining if the conformance proposal captures the impact of the severity of accessibility barriers on the final score and if different disabilities are treated equally by the proposal.

Test with a set of pages with known accessibility problems. There would be different versions of the set, each remediated to a different level of accessibility, to see how the score changes. For example, one version would have all the screen reader problems fixed, and another all the low vision and color contrast problems fixed. This would help show whether we have successfully balanced the scoring so that all disabilities are treated fairly.
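The comparison across remediated versions could be summarised as a per-category score delta. A sketch with entirely hypothetical scores and category names (none of these numbers come from real testing):

```python
# hypothetical score for the unremediated baseline set
baseline = 55

# hypothetical scores after fixing all problems in one category
remediated = {
    "screen reader fixes": 78,
    "low vision / contrast fixes": 76,
    "clear language fixes": 60,
}

for category, score in remediated.items():
    delta = score - baseline
    print(f"{category}: +{delta}")
# Roughly equal deltas would suggest disabilities are weighted fairly;
# a much smaller delta would flag an under-weighted category.
```

In this illustration the small delta for clear language fixes would indicate that barriers affecting that group move the score less than others, i.e. an imbalance worth correcting.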

7 of 12

Adequacy

Adequacy describes whether the formulas used to process and score the testing results are tuned correctly, so that small changes in accessibility produce proportionately small changes in the score rather than disproportionately large ones.

  1. Have two versions of the same digital product. One would have small changes in accessibility from the other. We would test both and look to see that the change in score is proportionate to the change in accessibility.
  2. Have multiple people test the same large website and see how the interpretation of the Representative sampling changes the score.
  3. Have multiple people test the same digital products and see how the changing selections of path and off-path change the score.

8 of 12

Complexity

Complexity refers to the resources required to carry out the conformance testing, such as crawler time or time for human-judgment testing. This metric would help answer the question of how much time WCAG3 takes to test compared to WCAG 2.x.

  • Ask for external help as part of the FPWD. We can ask experts who know how long a site took to test with WCAG 2 to test the same site with WCAG3 and compare the times. We will need to compensate for their unfamiliarity with WCAG3 and the lack of tooling, but that can be worked out.

9 of 12

Example Test Sites

Validity Testing Escape Room site - this is a copy of a website that we deliberately modified to have specific flaws that we could test against.

Validity Testing Deque’s “Gefälscht” site - this is a site with intentional errors, maintained by Deque.

10 of 12

Example Test Login / Registration Form

  • We also created a series of basic web pages that could be used as a login / registration user path.
  • None of the forms on this page submit any data.
  • Each page lists its inaccessible content and code.
  • These pages can be used to test:
    • Text Alternatives;
    • Clear Words;
    • Structured Content;
    • Visual Contrast Of Text.

11 of 12

Example Test App

Medicare iOS app

This is an example of testing a mobile app.

12 of 12

GOV.UK’s Accessibility Testing Pages