Scope and Sampling of resources

Definitions

Within UWEM, conformance claims need to refer to a list of resources evaluated within the scope of the web site(s). This section provides background definitions for the concepts used.

Resource:
a network data object identified by a URI [RFC3986]. This definition is adapted from the definition of resource in [RFC2616]. This concept is included to cover non-HTTP resources, such as those accessed via the FTP protocol. This type of resource must be expressed via an instance of the earl:WebContent class (see Appendix C and [EARL10-Schema] for further details).
HTTP Resource:
a network data object identified by a single HTTP request. This type of resource will be expressed via the RDF Schema [RDFS] of the [HTTP-RDF] W3C Note. The distinction between the two resource types is due to the underlying complexity of the HTTP protocol [RFC2616], where content negotiation can lead to different versions of a resource (e.g., language versions selected via the Accept-Language HTTP header).
Resources list:
Conformance claims in UWEM are related to a given resources list, which is expressed as a sequence of resources (of any type). Appendix C describes the RDF syntax used to express a resources list. According to the needs of different applications of UWEM, this resources list may be specified by a variety of different participants in the evaluation process - such as a site owner, a site operator, an inspection organisation, etc. This document only explains how such a list should be unambiguously expressed.

Procedure to express the scope

For the purposes of the UWEM, a Web site is defined as an arbitrary collection of hyperlinked Web resources, each identified according to the procedure described in section 4.1.

The purpose of UWEM is to guarantee replicability of results. The unambiguous expression of tested resources is therefore of key importance for the aggregation and comparison of results. Consequently, blanket statements of the type "http://example.org/ is conformant to UWEM 1.0 Level 1" will not be accepted as UWEM 1.0 conformance claims. Such a statement implies that a set of "seed" resources has been crawled exhaustively, subject to certain pre-determined limits or constraints. However, given the wide variety of existing crawlers and the different technologies they use, it is not possible to verify the reliability of such statements. Furthermore, the different RFCs related to domain names leave room for interpretation regarding the concept of a subdomain and its resolution.

Therefore, for UWEM 1.0 conformance claims, the scope of a Web site MUST be expressed in the form of a list of resources (see: Appendix C).

Procedure to generate evaluation samples

In general it will not be practical to test all site resources against all evaluation criteria. Accordingly, after determining and disclosing a list of resources to be evaluated and the targeted conformance level, we propose to identify certain subsets or "samples".

The resources to sample should include the Core Resource List supplemented with a selection of arbitrary resources. We call this the Sampled Resource List.

The Core Resource List

The Core Resource List is a set of generic resources which are likely to be present in most Web sites, and which are core to the use and accessibility evaluation of a site. The Core Resource List therefore represents a set of resources which should be included in any accessibility evaluation of the site. In general, the Core Resource List cannot be identified automatically; selecting it requires human judgement. In the case of completely automatic testing, such as in an observatory, the Core Resource List may be determined via heuristic methods. The Core Resource List should consist of as many of the following resources as are applicable:

Of course, any single resource may belong to more than one of the categories above: the requirement is simply that the Core Resource List as a whole should, as far as possible, collectively address all the applicable sampling objectives. Any given resource should appear only once in the Core Resource List.

The Sampled Resource List

A Sampled Resource List is a set that can be generated by automatic recursive crawling from a set of "seed" resources. A Sampled Resource List would typically be used in the context of evaluations carried out over large numbers of sites (against automatic criteria only), where it is not feasible or necessary to evaluate the complete set of web pages for each site (see note 1). If a sampled approach is used, then the sampled result must be representative and unbiased, which means that it must be a random subset of the complete set of resources. The Sampled Resource List for large scale automatic evaluation should therefore either be produced by a uniform random, or near-uniform random, sampling algorithm [HENZINGER00], or be a random set of samples drawn from the complete set of web pages, provided that the complete set of web pages is available (see note 2). The UWEM aggregation method in section 6 (Aggregation model for test results) operates on web page level, so each sample unit should correspond to the set of web resources that together form a rendered web page.
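The second option above (a random set of samples from a complete, known set of web pages) can be illustrated with a minimal sketch. It assumes the complete page list is already available as a list of URI strings; the function name `sample_pages` and its parameters are hypothetical and not part of UWEM. Crawl-based near-uniform sampling in the sense of [HENZINGER00] is not shown.

```python
import random


def sample_pages(all_pages, sample_size, seed=None):
    """Draw a uniform random sample, without replacement, from a page list.

    all_pages   -- iterable of page URIs (assumed to be the complete set)
    sample_size -- number of pages requested
    seed        -- optional seed so that a sample can be reproduced
    """
    # Remove duplicates so that every page has the same selection probability;
    # sorting makes the result depend only on the seed, not on input order.
    unique_pages = sorted(set(all_pages))
    rng = random.Random(seed)
    # Never request more pages than exist.
    k = min(sample_size, len(unique_pages))
    return rng.sample(unique_pages, k)


if __name__ == "__main__":
    pages = ["http://example.org/page%d" % i for i in range(1000)]
    for uri in sample_pages(pages, 5, seed=42):
        print(uri)
```

Recording the seed together with the page list is one way to let a third party reproduce the same sample, which fits the replicability goal of the methodology.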

The error margin of the 95% confidence interval of the mean value of the aggregated samples for a web site, using the UWEM aggregation method in section 6, should be clearly stated in the test results. It is up to the tool vendor whether to present the error margin for each web site, or to perform sampling until a given error margin is reached (see note 3), so that the maximum error margin is presented once (see note 4).

The error margin m of a confidence interval is defined to be the value added or subtracted from the sample mean which determines the length of the interval:

$$ m = z \, \frac{\sigma}{\sqrt{n}} $$

where z = 1.96 for a 95% confidence interval, σ is the standard deviation of the aggregated samples for the web site using the aggregation method in section 6 (Aggregation model for test results), and n is the number of samples.
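As an illustration of the error margin definition, the following minimal sketch computes m from a list of per-page aggregated scores. The function name `error_margin` and the example scores are hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption, since the text does not say which estimator is intended.

```python
import math
import statistics

Z_95 = 1.96  # z value for a 95% confidence interval, as given in the text


def error_margin(scores, z=Z_95):
    """Return the error margin m = z * sigma / sqrt(n) for a list of scores."""
    n = len(scores)
    if n < 2:
        raise ValueError("at least two samples are needed to estimate sigma")
    # Assumption: sigma is estimated with the sample standard deviation.
    sigma = statistics.stdev(scores)
    return z * sigma / math.sqrt(n)


if __name__ == "__main__":
    page_scores = [0.2, 0.4, 0.6, 0.8]  # made-up aggregated page scores
    mean = statistics.mean(page_scores)
    m = error_margin(page_scores)
    print("mean = %.3f, 95%% interval = [%.3f, %.3f]" % (mean, mean - m, mean + m))
```

For the four example scores the sample standard deviation is about 0.258, so the error margin is roughly 1.96 × 0.258 / 2 ≈ 0.253.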

Note that both the sampling algorithm used and any further restrictions limiting or biasing the result, including, but not limited to, the set of restrictions below, should be explicitly disclosed in any evaluation report:

Manual sample size

As an alternative to the method described above, and especially suited for expert testing, we allow a manual selection of a minimum number of resources. This minimum number of resources in the Sampled Resource List depends on the estimated web site size.

The minimum sample size is 30 unique resources (if available), plus 2 additional unique resources per 1000 resources in the web site, up to a maximum of 50 resources in the Sampled Resource List. These numbers are arbitrary. More detailed recommendations for sample sizes will be added in a later version of the UWEM and will be largely based on the results from experiments within the EIAO and BenToWeb projects.
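The manual sample size rule can be sketched as follows. This is one possible reading, not a normative definition: it assumes that "per 1000" refers to each complete block of 1000 resources in the estimated site size, and that "if available" means the sample cannot exceed the number of resources in the site. The function name `minimum_sample_size` is hypothetical.

```python
def minimum_sample_size(estimated_site_size):
    """Return the minimum number of resources in the Sampled Resource List.

    Assumed reading of the rule: 30 resources, plus 2 per complete 1000
    resources in the site, capped at 50 and at the site size itself.
    """
    if estimated_site_size < 0:
        raise ValueError("site size cannot be negative")
    target = min(50, 30 + 2 * (estimated_site_size // 1000))
    # "If available": a site with fewer resources is sampled completely.
    return min(target, estimated_site_size)


if __name__ == "__main__":
    for size in (20, 500, 5000, 10000, 100000):
        print(size, minimum_sample_size(size))
```

Under these assumptions a site of 5000 resources needs at least 40 sampled resources, and the cap of 50 is reached at 10000 resources.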

  1. Note that if a small web site is evaluated entirely, then the mean value of the aggregated samples can be calculated exactly. For larger web sites, it is tolerable to sample a random sub-set of the web pages, as long as the error margin of the 95% confidence interval is disclosed.
  2. As long as the algorithm used selects a random, unbiased set of web pages, the sample is valid and should provide the same results within the calculated error margin for a 95% confidence interval.
  3. Sampling until a given error margin is achieved relies on the fact that an increase in sample size decreases the length of the confidence interval without reducing the level of confidence. This is because the standard deviation of the sample mean decreases as n increases.
  4. Note that it is also possible to determine a minimum number of samples that will provide results within a given error margin, even in the worst case, with a variance of 0.5. However, this will require more samples than strictly necessary for all web sites that are better (less variance) than the worst case. With this in mind, we believe it is better to require that the error margin of the result be stated than to require a fixed number of samples.
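The trade-off described in note 4 can be made concrete by solving the error margin formula m = z σ / √n for n, which gives n = (z σ / m)², rounded up. The sketch below is illustrative only: the function name `required_sample_size` is hypothetical, the spread is passed in as a standard deviation `sigma`, and the values 0.5 and 0.05 in the usage example are assumptions chosen for illustration.

```python
import math


def required_sample_size(sigma, margin, z=1.96):
    """Return the smallest n for which z * sigma / sqrt(n) <= margin."""
    if sigma < 0 or margin <= 0:
        raise ValueError("sigma must be >= 0 and margin must be > 0")
    # Rearranged from m = z * sigma / sqrt(n): n = (z * sigma / m) ** 2.
    return math.ceil((z * sigma / margin) ** 2)


if __name__ == "__main__":
    # Assumed worst-case spread versus a site with less spread.
    print(required_sample_size(sigma=0.5, margin=0.05))   # 385
    print(required_sample_size(sigma=0.25, margin=0.05))  # 97
```

Halving the standard deviation cuts the required number of samples to roughly a quarter, which is why a fixed worst-case sample count over-samples most sites.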