Aggregation model for test results
Introduction
For automatic and/or expert evaluation it is important that the individual assessments can be aggregated into an end result that is easy to understand and has a clear interpretation. Test results can be aggregated at different levels, such as the checkpoint or test level.
One metric that may also interest policy makers is the accessibility barrier probability. Regulation (EC) No 808/2004 concerning Community statistics on the information society explicitly lists barriers to the use of ICT, the Internet and other electronic networks, e-commerce and e-business processes as one of the characteristics to be provided. This section describes a model for calculating the accessibility barrier probability for single Web pages and Web sites.[1]
Approach
Web accessibility evaluation can be viewed as a three-stage process. Figure 2 and Figure 3 summarise the Web accessibility evaluation process. The notation used in the figures and throughout this section is introduced in Definitions and mathematical background.
In the first stage, W3C's Evaluation and Report Language (EARL) is used as a standardised format for collecting and conveying the test results that accessibility assessment tools produce against any given standard. The model also supports weighting the test reports by their error probability, so that tests with lower confidence contribute less to the overall result.
The second stage aggregates the individual test results into one comprehensive figure for a Web page. The calculation is based on a statistical model, the User Centric Accessibility Barrier model (UCAB), introduced in Mathematical background: The UCAB model. The underlying assumption is that the accessibility barriers within a Web page accumulate.
To present the results to the public, e.g. people who experience barriers (such as people with disabilities) or people who use the data for planning and development (such as policy makers and stakeholders), the third stage of Web accessibility evaluation needs to provide a means of interpreting the results. This includes statistical analyses of the findings and the presentation of average values for a Web site or for groups of Web sites, e.g. by geographical region or business sector. This stage can be viewed as the “business logic” for estimating the accessibility barriers. The reporting of findings is covered in Reporting of test results and Scorecard report.
Definitions and mathematical background
In this section we explain the main concepts involved in modelling Web accessibility barriers. We then show how they can be translated into a statistical model and introduce the UWEM User Centric Accessibility Barrier Model (UCAB).
Definitions
- Web page: A resource on the Web as defined in section 4.
- Barrier: An accessibility barrier is modelled as a product failure caused by an incompatibility between the needs of a disabled user and the product functionality. The incompatibility is caused by the Web page, i.e. it is not the user's fault.
- Barrier type: A barrier type is related to a test procedure (as described in Tests for conformance evaluation) and provides a unique interpretation of its result. For example, a non-text element can constitute a barrier in several ways; one of them is described by barrier type 1.1_HTML_01 ("alt attribute is missing").
  Barrier types are an integral part of barrier modelling because they have two important properties:
  - Barrier types can be measured objectively with reproducible results.
  - The results of the measurements can be aggregated.
  Additionally, it is assumed that checking the same element more than once always gives the same result, so the same barrier type is reported every time and the same barrier probability applies every time the element is tested.
- Accessibility: Within the scope of this section, accessibility is defined as the absence of barriers within the Sampled Resource List.
Notation
The following notation is used to refer to the quantities that are involved in the calculations.
The results from each test in stage one are given by a report $R(p,b)$, where $p$ is a page and $b$ a barrier type. $R(p,b) = 1$ means that the test for barrier type $b$ failed, whereas $R(p,b) = 0$ means that the test for barrier type $b$ passed. For expert evaluation these results will usually be given in a tabular test report.
Automatic evaluation has the capacity to assess all elements within the page $p$. In this case the result can also be given as the ratio of the number of failed tests, $n_{\text{fail}}(p,b)$, to the number of all relevant elements for barrier type $b$, $n(p,b)$:
$$R(p,b) = \frac{n_{\text{fail}}(p,b)}{n(p,b)}$$
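If the confidence level of a test procedure is known, the reports can be adapted to reflect this. Let $\alpha_b$ denote the probability that the test for $b$ yields a false positive (a false "pass" result: the test passes although the barrier is present) and $\beta_b$ the probability that the test for $b$ yields a false negative (a false "fail" result: the test fails although no barrier is present).[2] The following report includes confidence weighting:
$$R_w(p,b) = \begin{cases} 1 - \beta_b & \text{if } R(p,b) = 1 \\ \alpha_b & \text{if } R(p,b) = 0 \end{cases}$$
If ratio reports are used, the calculation changes to
$$R_w(p,b) = (1 - \beta_b)\,R(p,b) + \alpha_b\,\bigl(1 - R(p,b)\bigr)$$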
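The two report forms and the confidence weighting can be sketched in a few lines of Python. This is a minimal sketch, assuming reports are plain floats in [0, 1]; the function names `ratio_report` and `weighted_report` are hypothetical and not part of UWEM or EARL. The parameter `alpha` stands for $\alpha_b$ (probability of a false "pass") and `beta` for $\beta_b$ (probability of a false "fail").

```python
def ratio_report(element_failed: list[bool]) -> float:
    """Ratio report R(p, b): failed tests divided by relevant elements.

    element_failed holds one entry per relevant element of barrier type b
    on page p (True = the test failed for that element).
    """
    if not element_failed:
        raise ValueError("barrier type has no relevant elements on this page")
    return sum(element_failed) / len(element_failed)


def weighted_report(r: float, alpha: float, beta: float) -> float:
    """Confidence-weighted report R_w(p, b) = (1 - beta) * R + alpha * (1 - R).

    alpha: probability of a false "pass" (barrier present, test passed)
    beta:  probability of a false "fail" (test failed, no barrier present)
    For binary reports (r = 0 or 1) this reduces to alpha or 1 - beta.
    """
    for name, value in (("r", r), ("alpha", alpha), ("beta", beta)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return (1.0 - beta) * r + alpha * (1.0 - r)


# Usage: 1 of 4 relevant elements failed; with alpha = beta = 0 the
# weighted report equals the raw ratio.
r = ratio_report([True, False, False, False])
print(r, weighted_report(r, alpha=0.0, beta=0.0))  # 0.25 0.25
```

Because the ratio form is linear in $R(p,b)$, it agrees with the case distinction at $R(p,b) = 0$ and $R(p,b) = 1$, which is why a single function can serve both binary and ratio reports.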
The barrier probability of barrier type $b$ is denoted by $F_b$.[3] The barrier probabilities constitute a fixed set of parameters. A small value of $F_b$ indicates that a disabled user who encounters a barrier of type $b$ is unlikely to experience an accessibility problem. For a start, all parameters $F_b$ are set to the same value. The parameters are being validated and will be tuned within EIAO with regard to automatic evaluation for future versions of UWEM.
The result of the aggregation is interpreted as the accessibility barrier probability: $F_p$ is the probability that the Web page $p$ constitutes an accessibility barrier for a disabled user.
Example (Confidence weighting)
The goal is to estimate the proportion of images that do not have appropriate alternative text. The inspected Web page contains ten image elements with alternative text. The (stage one) evaluation reports that the alternative text is not appropriate for three of the images.
The barrier ratio is $R(p,b) = 3/10 = 0.3$.
Suppose that the probability of false negatives (i.e. false "fail" results), $\beta_b$, is small because it is relatively easy to recognise suspicious image descriptions such as file names or placeholder text. On the other hand, the probability of false positives (i.e. false "pass" results), $\alpha_b$, is higher because sometimes the whole page context needs to be taken into account to determine whether the description is appropriate.
The estimate is adapted accordingly:
$$R_w(p,b) = (1 - \beta_b) \cdot 0.3 + \alpha_b \cdot 0.7$$
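The parameter values for this example are not given here, so the following sketch plugs in purely illustrative ones, $\beta_b = 0.05$ and $\alpha_b = 0.2$. They are assumptions chosen only to reflect "false fails are rarer than false passes", not UWEM values.

```python
# Illustrative confidence weighting for the image example.
# ALPHA and BETA are hypothetical values, not taken from UWEM.
ALPHA = 0.20  # assumed probability of a false "pass" (needs page context)
BETA = 0.05   # assumed probability of a false "fail" (suspicious text is easy to spot)

images_with_alt = 10
flagged_inappropriate = 3

r = flagged_inappropriate / images_with_alt  # raw barrier ratio R(p, b)
r_w = (1 - BETA) * r + ALPHA * (1 - r)       # confidence-weighted R_w(p, b)

print(f"R = {r:.3f}, R_w = {r_w:.3f}")  # R = 0.300, R_w = 0.425
```

With these assumed values the weighted estimate (0.425) lies above the raw ratio (0.3): false passes are assumed to be four times as likely as false fails, so some of the images that passed are expected to hide barriers.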
Mathematical background: The UCAB model
The main purpose of aggregating the reports from stage one is to model the experience of a disabled user trying to access a Web site. The goal is to determine the probability that the user cannot complete a task because of the accessibility barriers they encounter. Depending on their severity, the barriers might either stop the disabled user right away or prevent the completion of the task because too much time and effort are required. To address this goal we introduce the User Centric Accessibility Barrier Model (UCAB).
There are two main statistical assumptions underlying the UCAB model:
- Independence of barrier occurrences: Each test passes or fails independently of all other tests, i.e. the reports, viewed as random variables, are mutually independent.
- Barriers within a Web page accumulate: Each barrier that a disabled user encounters within a Web page reduces the overall accessibility, i.e. increases the accessibility barrier probability $F_p$. This is modelled as the probability that the user encounters any barrier within the Web page.
Let $A$ and $B$ be two independent events. Then the probability that $A$ or $B$ occurs is given by:
$$P(A \cup B) = 1 - P(\bar{A})\,P(\bar{B})$$
where $P$ denotes the probability and $\bar{A}$ is the complementary event of $A$.
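The two-event rule follows from De Morgan's law and the fact that the complements of independent events are themselves independent. By induction it extends to any finite number of mutually independent events, which is the form that the page-level aggregation needs:

```latex
\begin{aligned}
P(A \cup B) &= 1 - P\bigl(\overline{A \cup B}\bigr)
             = 1 - P(\bar{A} \cap \bar{B})
             = 1 - P(\bar{A})\,P(\bar{B}) \\
P\Bigl(\bigcup_{i=1}^{n} A_i\Bigr) &= 1 - \prod_{i=1}^{n} P(\bar{A}_i)
             = 1 - \prod_{i=1}^{n} \bigl(1 - P(A_i)\bigr)
\end{aligned}
```

Reading $A_i$ as "the user is stopped by a barrier of type $b_i$", the second line has the same shape as a product over barrier types.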
Stage 2: Accessibility Barrier Probability Fp
An accessibility barrier is modelled as a product failure caused by an incompatibility between a disabled user's needs and the product functionality. It is assumed that a failure mode reported by a test procedure introduces an accessibility barrier with some known probability $F_b$.
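Application of the UCAB model yields the following formula (notation as described in Notation):
$$F_p = 1 - \prod_b \bigl(1 - F_b\,R(p,b)\bigr)$$
where $R(p,b)$ may be replaced by the confidence-weighted report $R_w(p,b)$.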
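A minimal Python sketch of the page-level aggregation, assuming $F_p = 1 - \prod_b (1 - F_b R(p,b))$ with reports and barrier probabilities given as dictionaries keyed by barrier-type identifier. The function name `page_barrier_probability` is hypothetical; the reports may be binary, ratio or confidence-weighted values in [0, 1].

```python
import math
from collections.abc import Mapping


def page_barrier_probability(
    reports: Mapping[str, float], barrier_prob: Mapping[str, float]
) -> float:
    """UCAB page aggregation: F_p = 1 - prod_b (1 - F_b * R(p, b)).

    reports:      barrier type b -> report R(p, b) in [0, 1]
    barrier_prob: barrier type b -> barrier probability F_b in [0, 1]
    """
    for b, r in reports.items():
        if not 0.0 <= r <= 1.0 or not 0.0 <= barrier_prob[b] <= 1.0:
            raise ValueError(f"values for barrier type {b!r} must lie in [0, 1]")
    # Probability that none of the reported barrier types stops the user,
    # assuming barrier occurrences are independent.
    p_no_barrier = math.prod(1.0 - barrier_prob[b] * r for b, r in reports.items())
    return 1.0 - p_no_barrier
```

Passing reports ($R(p,b) = 0$) contribute a factor of 1 and leave $F_p$ unchanged, while every additional failing barrier type can only increase $F_p$, which matches the assumption that barriers accumulate.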
Example (Fp for single Web page)
The (stage one) evaluation of a Web page yielded reports that were generated by three different tests:
- $b_0$ = 1.1_HTML_01
- $b_1$ = 2.2_HTML_02
- $b_2$ = 9.3_HTML_03
In detail, the reports might look like this:
- $R(p,b_0) > 0$ (images without alt attribute)
- $R(p,b_1) = 0$ (the colour contrast is ok)
- $R(p,b_2) = 1$ (event handler ondblclick has been used)
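Assuming barrier probabilities $F_{b_0}$, $F_{b_1}$ and $F_{b_2}$ for the three failure modes, the accessibility barrier probability of the Web page calculated from the UCAB model is:
$$F_p = 1 - \bigl(1 - F_{b_0} R(p,b_0)\bigr)\bigl(1 - F_{b_1} \cdot 0\bigr)\bigl(1 - F_{b_2} \cdot 1\bigr) = 1 - \bigl(1 - F_{b_0} R(p,b_0)\bigr)\bigl(1 - F_{b_2}\bigr)$$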
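To make the example concrete, the sketch below fills in hypothetical numbers that are not part of the original example: a uniform barrier probability $F_b = 0.05$ for all three barrier types and a ratio report $R(p,b_0) = 0.4$ (say, 2 of 5 images lack an alt attribute). The reports for $b_1$ (passed) and $b_2$ (failed) follow from the text.

```python
import math

# Hypothetical parameters for illustration only (not UWEM values).
F_B = {"1.1_HTML_01": 0.05, "2.2_HTML_02": 0.05, "9.3_HTML_03": 0.05}

reports = {
    "1.1_HTML_01": 2 / 5,  # assumed: 2 of 5 images without alt attribute
    "2.2_HTML_02": 0.0,    # colour contrast is ok (test passed)
    "9.3_HTML_03": 1.0,    # ondblclick event handler used (test failed)
}

# UCAB: F_p = 1 - prod_b (1 - F_b * R(p, b))
f_p = 1.0 - math.prod(1.0 - F_B[b] * r for b, r in reports.items())
print(f"F_p = {f_p:.3f}")  # F_p = 0.069
```

Only the failing barrier types contribute: the passing contrast test multiplies the product by 1, so here $F_p = 1 - 0.98 \cdot 0.95$.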
Combining results from different testing procedures
It is possible to combine the results from evaluations performed with different tools or by different experts if the following conditions apply:
- No double reports: No barrier may be included more than once in the aggregation. To ensure this, the tests must be divided among the evaluations, e.g. by assigning them different sets of Web pages or by splitting them into automatic and expert evaluation.
- Same sample: The reports have to cover the same data, i.e. the same version and selection of Web resources.
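Both conditions can be checked mechanically before aggregation. A minimal sketch, assuming each evaluation delivers its reports as a mapping from (page, barrier type) to $R(p,b)$ together with an identifier of the sample it covers; the function name `combine_reports` and the sample-identifier convention are hypothetical. Treating a repeated (page, barrier type) pair as a double report is a simplification of "no barrier included more than once".

```python
from collections.abc import Iterable, Mapping

Key = tuple[str, str]  # (page URL, barrier type)


def combine_reports(
    sample_id: str, sources: Iterable[tuple[str, Mapping[Key, float]]]
) -> dict[Key, float]:
    """Merge reports from several tools or experts into one report set.

    sources yields (sample id, reports) pairs, one per evaluation.
    Raises ValueError if a source covers a different sample or if the
    same (page, barrier type) pair is reported by more than one source.
    """
    combined: dict[Key, float] = {}
    for source_sample, reports in sources:
        if source_sample != sample_id:  # "same sample" condition
            raise ValueError(f"source covers sample {source_sample!r}, expected {sample_id!r}")
        for key, r in reports.items():
            if key in combined:  # "no double reports" condition
                raise ValueError(f"double report for {key}")
            combined[key] = r
    return combined


# Usage: automatic and expert evaluation split by barrier type.
automatic = {("/index.html", "1.1_HTML_01"): 0.4}
expert = {("/index.html", "2.2_HTML_02"): 0.0}
merged = combine_reports("sample-A", [("sample-A", automatic), ("sample-A", expert)])
print(merged)
```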
Stage 3: Accessibility Barrier Probability Fs
The Web page accessibility barrier probabilities naturally lend themselves to aggregation, so that the average barrier probability and its variance for a Web site can be calculated from the Web pages sampled from that site. Similarly, aggregation can be performed further over several Web sites, regions or countries.
The barrier probability $F_s$ for a Web site $s$ is calculated as the mean of the barrier probabilities of the Web pages that have been sampled from the Web site:
$$F_s = \frac{1}{N_s} \sum_{p \in S_s} F_p$$
where $N_s$ is the number of Web pages sampled from Web site $s$, $S_s$ is the set of those pages, and $F_p$ is the barrier probability for Web page $p$, calculated as described in Stage 2: Accessibility Barrier Probability Fp.
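The site mean, together with a standard deviation of the $F_p$ values, can be sketched with Python's standard `statistics` module. Whether the standard deviation divides the sum of squares by $N_s$ or by $N_s - 1$ is an assumption here, so the sketch exposes both through a hypothetical `sample` flag: `statistics.pstdev` implements the former (population form) and `statistics.stdev` the latter (sample form). The function names are hypothetical.

```python
import statistics
from collections.abc import Sequence


def site_barrier_probability(page_probs: Sequence[float]) -> float:
    """F_s: arithmetic mean of the barrier probabilities F_p of the sampled pages."""
    if not page_probs:
        raise ValueError("no pages sampled for this Web site")
    return statistics.fmean(page_probs)


def site_standard_deviation(page_probs: Sequence[float], sample: bool = False) -> float:
    """Spread of the F_p values around F_s.

    sample=False divides by N_s (population standard deviation);
    sample=True divides by N_s - 1 and needs at least two pages.
    """
    mean = site_barrier_probability(page_probs)
    if sample:
        return statistics.stdev(page_probs, xbar=mean)
    return statistics.pstdev(page_probs, mu=mean)
```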
The standard deviation is given by
Example (Fs for single Web site)
Two pages, $p_1$ and $p_2$, have been sampled from Web site $s$. Their barrier probabilities $F_{p_1}$ and $F_{p_2}$ have been calculated in stage 2.
Then the average barrier probability of $s$ is
$$F_s = \frac{F_{p_1} + F_{p_2}}{2}$$
The standard deviation is
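For two sampled pages with $F_s = (F_{p_1} + F_{p_2})/2$, each page deviates from the mean by half the difference of the two values, so the standard deviation reduces to a multiple of $|F_{p_1} - F_{p_2}|$. Since the denominator of the sum of squares is an assumption here, both the population form (dividing by $N_s = 2$) and the sample form (dividing by $N_s - 1 = 1$) are shown:

```latex
\begin{aligned}
F_{p_1} - F_s &= \tfrac{1}{2}\bigl(F_{p_1} - F_{p_2}\bigr), \qquad
F_{p_2} - F_s = -\tfrac{1}{2}\bigl(F_{p_1} - F_{p_2}\bigr) \\
\sum_{i=1}^{2} \bigl(F_{p_i} - F_s\bigr)^2 &= \tfrac{1}{2}\bigl(F_{p_1} - F_{p_2}\bigr)^2 \\
\sigma_s^{\text{pop}} &= \sqrt{\tfrac{1}{2} \cdot \tfrac{1}{2}\bigl(F_{p_1} - F_{p_2}\bigr)^2}
  = \tfrac{1}{2}\,\bigl|F_{p_1} - F_{p_2}\bigr|, \qquad
\sigma_s^{\text{sample}} = \sqrt{\tfrac{1}{2}\bigl(F_{p_1} - F_{p_2}\bigr)^2}
  = \tfrac{1}{\sqrt{2}}\,\bigl|F_{p_1} - F_{p_2}\bigr|
\end{aligned}
```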
Limitations of the UCAB model
This model clearly assumes an idealised Web, and some of its assumptions may not hold for a real Web site. The evaluation phase of UWEM will be used to verify whether the model is usable and to improve it where necessary.
Aspects not covered by the model
- Information redundancy: Information can be presented in several ways on the Web, and in some cases there may be enough redundant information that what looks like a barrier is not perceived as one by the disabled user (e.g. a title can compensate for a missing alt attribute in some specific scenarios). The model does not consider information redundancy.
- User behaviour: User behaviour is not modelled yet. It is conceivable that the time it takes a user to successfully perform a given task could be modelled as a random variable. The integral of its probability density function (i.e. its cumulative distribution function) could then be used to calculate the probability that a disabled user is able to finish a task within a given time. It could also be used to model when most users would give up due to frustration.
- Complexity of Web page: All Web pages are treated equally; their complexity (e.g. size or the number of test locations that have been identified) is not taken into account. However, it would seem unnatural for a large Web page to have a higher barrier probability than a smaller page with the same barrier density.
  Our current approach is justifiable as long as we assume that the sampling algorithm collects samples that are uniformly distributed and comparable.
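The user-behaviour idea among the aspects above can be written down as follows, purely as a sketch: if $T \ge 0$ denotes the random time a disabled user needs to complete the task and $g$ its probability density, then the probability of finishing within a time budget $t$ is the cumulative distribution function, and its complement is the probability of still not having finished at $t$ (one possible proxy for giving up). No particular distribution for $T$ is implied.

```latex
P(T \le t) = \int_{0}^{t} g(\tau)\,d\tau, \qquad
P(T > t) = 1 - \int_{0}^{t} g(\tau)\,d\tau
```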
Underlying assumptions
- Ideal results from stage 1: The statistical model assumes that the results from stage 1 are correct, i.e. that no false reports are included in the calculation. To improve the robustness of the model, it might be useful to introduce some kind of error analysis when implementing it.
- Independence of barrier occurrences: The different elements of a Web site are not totally independent, so a model that includes these dependencies may be more precise. In practice, ignoring them introduces a bias into this model.
  If several barriers of the same type are present within a Web page, it should not be assumed that they contribute to the result in the same way as barriers of different types.
- Independence of barrier types: Some barrier types are probably related to each other (for example, the tests for checkpoints 13.3 and 13.4). If several tests are applied to the same element, the applicability constraints are designed to ensure that each problem is counted only once. In this way most of the dependencies between the barrier types can be avoided.
- Constant barrier probabilities Fb: The methodology assumes that barriers can be categorised into a fixed number of barrier types. Each time a barrier of type $b$ occurs within a Web page it has the same severity, i.e. it contributes the same barrier probability $F_b$ to the calculation.
[1] This probabilistic model for accessibility barriers is work in progress. It will be evaluated and, if necessary, refined during the evaluation phases of UWEM. Aggregation of accessibility barriers will probably be based on these general ideas, but the exact model is subject to change during the evaluation phase.
[2] This version of UWEM does not specify how the parameters $\alpha_b$ and $\beta_b$ should be selected. Instead, the default values $\alpha_b = \beta_b = 0$ are used. Note that with this parameter selection the results are the same as without confidence weighting.
[3] The accessibility barrier probability $F_b$ for a barrier type $b$ can be estimated via user testing with a representative set of users. It may also be estimated to some extent with expert testing or semi-automatic testing; however, these detect real barriers less precisely than user testing does. In this simple first approach the barrier probability is set to a fixed value. In later versions this parameter could be used to provide a finer grading of the severity of the barrier types, e.g. by introducing different values for different disability groups.

