Visual Field Assessment and Disability Evaluation (2025)

Suggested Citation: "4 Evaluating New Perimetry Techniques." National Academies of Sciences, Engineering, and Medicine. 2025. Visual Field Assessment and Disability Evaluation. Washington, DC: The National Academies Press. doi: 10.17226/29124.

4

Evaluating New Perimetry Techniques

As noted previously, perimetry techniques consist of hardware, stimuli, testing patterns, and algorithms in combination; no one component can be assessed without the others being specified. In evaluating a perimetry technique, the key consideration is not just how measurements are obtained but whether the technique demonstrates acceptable performance for the relevant task. This chapter first presents considerations for the design of studies for assessing the accuracy of diagnostic tests, then provides an in-depth discussion of performance considerations for assessing new perimetry techniques. The chapter concludes with a discussion of the question from the committee’s statement of task on the quantity and characteristics of validation studies needed to find a perimeter acceptable.

DESIGN OF STUDIES FOR ASSESSING DIAGNOSTIC TEST ACCURACY¹

Determining whether an individual meets the Social Security Administration’s (SSA’s) criteria for disability benefit eligibility is fundamentally a classification task, aligning with recognized principles for the design and evaluation of studies of diagnostic tests, as set forth in the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy (Deeks et al., 2023). The design of such a study fundamentally involves recruiting as participants a single group of individuals, all of whom are suspected of

___________________

¹ The committee relied on the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy (Deeks et al., 2023) for much of the content of this section.

having the condition of interest. It is essential for this group to be representative of and drawn from the target population or intended-use population in the relevant setting (e.g., primary care, specialty care). Otherwise, spectrum bias can occur—that is, the study participants may not adequately represent the full range (“spectrum”) of patients one would expect to see in the intended setting. Thus, the study population could overrepresent some groups (e.g., those with more severe forms of the condition of interest) or underrepresent others (e.g., those whose condition is in an early stage or is of a milder form). Spectrum bias can then lead to either overestimation or underestimation of the real-world performance of the diagnostic test whose accuracy is being assessed.

In such a study, the test being assessed—referred to as the index test—is administered to each of the study participants. This index test may involve a new device (e.g., a tablet-based device or one that employs a virtual reality headset), or the use of a new testing algorithm with an existing device, or a combination of a new device and a new algorithm or other component(s) of a perimetry technique. Shortly thereafter, participants undergo a different test termed the reference standard. The reference standard is usually the best clinical method available for determining whether patients have the target condition (hence it is sometimes called the “gold standard,” although it is seldom 100 percent accurate, particularly in the case of visual field testing). The reference standard used most frequently in visual field testing is the perimetry technique most commonly used in clinical practice today—a 24-2 testing pattern on the Humphrey Field Analyzer using a Goldmann size III (0.43-degree diameter) stimulus and the Swedish Interactive Thresholding Algorithm (SITA)—although other reference standards have also been used (Table 4-1).

Once participants have undergone both the index test and the reference standard, the two sets of results are compared for each participant. Those comparisons are then aggregated to form an estimate of the accuracy (e.g., sensitivity and/or specificity) of the index test in the intended setting and for the target population. This comparison is not, by itself, sufficient evidence to accept a new diagnostic test; other factors such as reproducibility must also be considered (see below). In fact, test–retest variability (i.e., reproducibility) may affect the amount of intertest variability found in an individual study. To account for this, researchers occasionally measure test–retest variability for both the index test and the reference standard. More commonly, however, they simply compare against values for test–retest variability reported in the literature.
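
To make the aggregation step concrete, the following minimal sketch computes sensitivity and specificity from paired index-test and reference-standard classifications. The data, variable names, and binary coding (positive = has the target condition) are illustrative assumptions, not drawn from any study discussed here.

```python
# Aggregate paired index-test and reference-standard results into
# sensitivity and specificity estimates.

def accuracy_from_pairs(pairs):
    """pairs: one (index_positive, reference_positive) tuple per participant."""
    tp = sum(1 for idx, ref in pairs if idx and ref)          # true positives
    fn = sum(1 for idx, ref in pairs if not idx and ref)      # false negatives
    tn = sum(1 for idx, ref in pairs if not idx and not ref)  # true negatives
    fp = sum(1 for idx, ref in pairs if idx and not ref)      # false positives
    sensitivity = tp / (tp + fn)  # share of reference-positive participants detected
    specificity = tn / (tn + fp)  # share of reference-negative participants cleared
    return sensitivity, specificity

# Ten hypothetical participants, each undergoing both tests.
results = [(True, True), (True, True), (False, True), (True, False),
           (False, False), (False, False), (True, True), (False, False),
           (False, True), (False, False)]
print(accuracy_from_pairs(results))  # (0.6, 0.8)
```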

Diagnostic test accuracy studies are inherently cross-sectional, aiming to assess a test’s performance in identifying patients with the condition of interest at the time the test was conducted. The fully paired study design described above employs a single set of inclusion and exclusion criteria for study participants that define the target population from which the group

TABLE 4-1 Reference Standards Used in Perimetry Validation Studies

Citation | Index Test under Evaluation | Reference Standard
Ahmed et al., 2022 | Toronto Portable Perimeter (virtual reality [VR] headset using smartphone) | HFA, SITA Standard 24-2, size III stimulus
Balasubramanian et al., 2023 (conference abstract) | LED perimeter | Octopus 900, unknown algorithm, size V stimulus
Bentley et al., 2012 | Useful Field of View | HFA, SITA Standard 24-2, size III stimulus; HFA binocular Esterman program (functional test evaluating fitness to drive)
Bradley et al., 2024 | Radius Virtual Reality Perimeter | HFA, SITA Standard 24-2, size III stimulus
Chen et al., 2022 | LUXIE (head-mounted VR with eye tracking) | HFA, SITA Standard 30-2, size III stimulus
Chen et al., 2024 | Perimouse (computer-based, website) | HFA, SITA Standard 24-2, size III stimulus
Chia et al., 2019 | Melbourne Rapid Fields (tablet based) | HFA or Octopus 600, SITA or Octopus Standard 24-2, size III stimulus
Chia et al., 2021 | Melbourne Rapid Fields (tablet based) | HFA or Octopus 600, SITA or Octopus Standard 24-2, size III stimulus
Crossland et al., 2011 | Adapted microperimeter | Modified HFA, matrix mapping algorithm, size III stimulus
Cui et al., 2019 | Heidelberg Edge Perimeter | Octopus 900, G-TOP (tendency-oriented perimetry) 30-2, size III stimulus
Heinzman et al., 2022 | VR headset | Octopus 900, Zippy Estimation by Sequential Testing (ZEST), size V stimulus
Heinzman et al., 2023 | Iowa Head-Mounted Display Open-Source Perimeter | Octopus 900, ZEST, size V stimulus
Ichhpujani et al., 2021 | Visual Fields Easy (tablet based) | HFA, SITA Fast 24-2, size V stimulus
Johnson et al., 2017 | Visual Fields Easy (tablet based) | HFA, SITA Standard 24-2, size V stimulus
Jones, 2020 | Eyecatcher (open-source eye-movement tracking perimeter) | HFA, SITA Standard 24-2, size III stimulus
Khizer et al., 2022 | Specvic (computer-based) | HFA, SITA Standard 30-2, size III stimulus
Lam et al., 2017 | Heidelberg Edge Perimeter | Octopus 900, G-TOP 30-2, size III stimulus
McLaughlin et al., 2023 | Virtual Vision LCD VR visual field device | HFA, SITA analog, size III stimulus
Mees et al., 2020 | C3 Fields Analyzer (head-mounted VR) | HFA, SITA Standard 24-2, 0.55 mm stimulus
Meyerov et al., 2023 | Online Circular Contrast Perimetry (computer-based) | HFA, SITA analog, size III stimulus
Munshi et al., 2022 | VR headset | HFA, SITA Standard 24-2, size III stimulus
Najdawi et al., 2023 | Smart System VR Perimeter | HFA, SITA Standard 24-2, size III stimulus
Narang et al., 2021 | Advanced Vision Analyzer VR perimeter | HFA, SITA Standard 24-2, size III stimulus
Olsen et al., 2017 | Damato Multifixation Campimetry Online (inexpensive online test) | HFA, SITA Fast 30-2, stimulus size not reported
Pradhan et al., 2021 | GearVision (smartphone-based, head-mounted perimeter) | HFA, SITA Standard 24-2, size III stimulus
Schulz et al., 2018 | Melbourne Rapid Fields (tablet based) | HFA, SITA Standard 24-2, size III stimulus
Susanna et al., 2024 | VisuALL (VR head-mounted visual perimetry device) | HFA, SITA Fast 24-2, size III stimulus
Terracciano et al., 2023 | Portable automatic kinetic perimeter based on a VR headset device | HFA, algorithm and stimulus size not reported
Tsapakis et al., 2018 | Home (computer)-based visual field test for glaucoma screening | HFA, proprietary suprathreshold algorithm, size III stimulus
Tsiogka et al., 2024 | TsiogkaSpaeth grid (portable test) | HFA, SITA Standard 24-2, size III stimulus
Vingrys et al., 2016 | Melbourne Rapid Fields (tablet based) | HFA, SITA Standard 24-2, size V stimulus
Wijayagunaratne et al., 2023 (conference abstract) | Iowa Head-Mounted Display perimeter | Octopus 900, standard algorithm, stimulus size not reported
Wroblewski et al., 2014 | VirtualEye | HFA, SITA Standard or SITA Fast 24-2, size III stimulus

NOTES: The studies in the table were found in the committee’s literature review of perimetry validation studies from 2002 to the present. Only studies with a reference standard were included in this table. Participants in the studies in the table were either glaucoma patients at various stages of the condition only, or glaucoma patients compared with controls. To identify perimetry validation studies, a search was conducted in Medline, Scopus, and Embase for (“visual field test*” OR perimetry) AND validat* in the title, abstract, or keywords of articles published from 2002 to the present. The search yielded 832 results after deduplication, and 216 results after initial screening. These studies were then reviewed manually to identify those that aimed to validate a new perimeter and/or compare different perimeters, excluding those that focused on analysis of data from a single existing perimeter. HFA = Humphrey Field Analyzer; SITA = Swedish Interactive Thresholding Algorithm.

of participants is to be drawn. Another valid design for comparing two diagnostic tests is a randomized trial, in which each eligible participant is assigned randomly to one of the two diagnostic tests. The accuracy of the two tests is then compared.

Diagnostic test accuracy studies sometimes include healthy controls or patients with a preexisting diagnosis without requiring all participants to undergo a reference test. These studies, known as “two-gate studies,” use two or more sets of eligibility criteria, meaning that both enrollment and later statistical comparison are based on disease status. This design was previously referred to as a “diagnostic case–control” design.

The bias associated with the two-gate design stems primarily from the nonrandom selection of participants. Empirical evidence indicates that accuracy is overestimated in two-gate studies (Lijmer et al., 1999; Rutjes et al., 2006). This occurs because, rather than drawing from a full spectrum of patients, applying the index test to a group with a preexisting diagnosis makes cases easier to detect, which leads to higher estimates of sensitivity. Likewise, the inclusion of healthy controls is likely to lower the occurrence of false-positive results, thereby increasing specificity. (Sensitivity and specificity are defined below.) In summary, when participant enrollment in a test accuracy study is conditional on disease status, disease severity is likely to be at the extremes of the spectrum, raising similar concerns about spectrum bias.

To determine how much evidence or what type of evidence is sufficient to validate a new test, it is necessary to consider the quality of that evidence, often referred to as “risk of bias.” QUADAS-2 is one widely accepted tool for assessing risk of bias in diagnostic test accuracy studies. Important contributors to the risk of bias include patient selection (see the above discussion of spectrum bias); potential differential verification if not all participants undergo both the index test and the reference test in the same way or at the same time; and whether the index test and reference test were conducted according to a predefined protocol, with results interpreted independently of each other (Whiting et al., 2011).

Finally, as noted earlier, it is important for diagnostic test accuracy studies to be conducted in a setting that accords with the intended setting of use. For example, if a test is assessed in a research setting with optimal test conditions and supervised by an expert technician, the results of the study may not apply to conduct of the test in the office of a primary care physician.

PERFORMANCE CONSIDERATIONS FOR EVALUATING NEW PERIMETRY TECHNIQUES

For a perimetry technique to assess disability effectively and accurately, its results must be valid, reliable, reproducible, and applicable for the specific

task at hand. The critical question is whether the results are satisfactory according to four key considerations, based on the totality of evidence:

  1. Validity refers to the ability of a perimetry technique to accurately identify whether an individual meets SSA’s criteria for receiving disability benefits. Sensitivity and specificity are important metrics for assessing validity. Sensitivity measures the test’s ability to correctly identify those with disability, while specificity measures its capacity to correctly identify those without it. Both sensitivity and specificity can be estimated only in a population that properly represents the target population (thus avoiding spectrum bias). Both also need to be measured with sufficient precision to permit confident evaluation against the SSA criteria, and this precision should be reported or at least be calculable from the reported information (see the sketch following this list).
  2. Reliability indices, as used in perimetry and in this report, indicate the confidence with which one can ascertain whether the results of a single test are credible or instead require repetition or rejection. Perimetry reliability indices include the proportions of false-positive responses, false-negative responses, and fixation losses (defined later).
  3. Reproducibility refers to the consistency of results across multiple tests (satisfactory test–retest consistency). Reproducibility is sometimes referred to as reliability in other diagnostic testing contexts, which can lead to confusion. In this report, the committee uses reproducibility to denote the degree to which results remain consistent over repeated measurements.
  4. Applicability refers to the extent to which the results of a perimetry technique can be generalized to the target population and the setting for which the test is intended—in other words, whether the study’s findings are relevant and applicable to the specific clinical context in which the test is meant to be used.
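
As a concrete illustration of the precision requirement in the first consideration above, the sketch below computes a Wilson score confidence interval for a reported sensitivity or specificity. The counts are hypothetical; the Wilson interval is one standard choice among several acceptable interval methods.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion such as sensitivity or
    specificity; z = 1.96 corresponds to 95 percent confidence."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

# A sensitivity reported as 45/50 can be assessed for precision
# whenever the underlying counts are reported:
print(wilson_interval(45, 50))  # approximately (0.79, 0.96)
```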

These four considerations can be addressed within a single study or across separate studies. Ideally for SSA’s purposes, a perimetry technique would be evaluated in a way that aligns with its intended use and target population specifically in the context of determining whether an individual meets the criteria for visual field loss in connection with SSA disability evaluation (applicability). However, the committee’s review of the literature revealed no studies that directly examined a perimetry technique for this specific purpose. In the absence of such studies, correlation with a reference standard perimeter in eyes with moderate and/or severe functional loss can

be used as a proxy, as those types of studies are available in the literature (see Table 4-1).

Most studies in the literature focus explicitly on patients with glaucoma, as this is the most common population tested clinically using perimetry. Many of these studies examine the ability to classify eyes as healthy versus glaucomatous, which is of limited relevance to the ability to perform disability evaluation, especially given that reproducibility often declines as functional loss increases with advancing glaucoma (Artes et al., 2002).

Studies of perimetry for people with glaucoma may not fully reflect the pathology or visual experience of people with other ophthalmic disorders, such as those of the outer retina or cortex. Two individuals with different diseases may have similar mean deviations or visual fields, yet their eyesight as a whole may be very different. However, this report is primarily concerned with the evaluation of visual fields against the SSA criteria; other aspects of vision can be evaluated through other parts of the SSA disability determination process (see Chapter 1). In this context, the committee is comfortable extrapolating findings of visual field loss across diseases.

Validity

In general, validity refers to the ability of a test or protocol to actually capture what it intends to capture. As stated, the relevant definition of validity in this report is the ability of a perimetry technique to accurately identify whether an individual meets SSA’s criteria for receiving disability benefits.

Sensitivity and specificity are important metrics for assessing validity. Since the key question examined in this report is how to feasibly determine whether an individual with visual field loss meets SSA’s criteria for visual disability, permissible methods must necessarily have acceptable sensitivity and specificity. Simply assessing whether a test can classify eyes as “normal” or “abnormal” is inadequate. Instead, it is crucial to evaluate the test’s ability to differentiate between “moderate” and “severe” visual field loss. While this distinction does not directly capture SSA’s specific criteria for disability evaluation, it approximates them, since eligibility depends on how severely a person’s visual field is constricted (see Chapter 1). As a result, the most relevant studies will evaluate a perimetry test’s ability to discern between levels of impairment. Additionally, considerations of validity encompass all domains related to risk of bias in diagnostic test accuracy studies, as outlined in the previous section and throughout this chapter.

Another relevant consideration is the measurement precision reported by the instrument, which affects the numerical range of outcomes that can possibly be reported. Typically, perimeters round visual field data to the nearest integer.

Reliability Indices

Test reliability in perimetry has traditionally been assessed using the three indices of false positives, false negatives, and fixation losses. While a new perimetry technique need not adhere strictly to this framework, not least because those indices are no longer seen as particularly good measures of reliability, it is reasonable to consider whether equivalent information is available in a studied technique. Notably, poor reliability indices can hinder the detection and monitoring of small visual field defects; however, the committee felt that these indices may have less impact on mean deviation and on assessment against current standards for visual disability. Mean deviation quantifies the degree to which a patient’s visual sensitivity across the entire field differs from what is expected in a healthy individual of the same age.
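
For reference, mean deviation on the Humphrey Field Analyzer is commonly described as a variance-weighted average of the pointwise deviations from age-corrected normal values. A schematic form is given below; the exact weights and the set of included locations are instrument-specific, so this should be read as an illustration rather than any manufacturer’s exact formula.

```latex
\mathrm{MD} = \frac{\sum_{i=1}^{n} (x_i - N_i)/\sigma_i^{2}}{\sum_{i=1}^{n} 1/\sigma_i^{2}}
```

Here \(x_i\) is the measured sensitivity at location \(i\), \(N_i\) is the age-corrected normal sensitivity at that location, and \(\sigma_i^{2}\) is the between-subject variance among healthy observers at that location.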

False positives occur when an individual undergoing perimetry testing indicates they have seen a stimulus when none has been presented. This possibility can be assessed through catch trials, in which no stimulus is shown during an interval when one might be expected. If the examinee indicates seeing a stimulus during such an interval, this is considered a false-positive response. Additionally, false positives can be identified by recording responses that occur outside the physiologically plausible period of time after presentation of a stimulus. A high false-positive rate can be an indicator of “trigger-happy” behavior on the part of the patient, which may in turn lead to test results that do not accurately reflect the extent of visual field loss. Clinical guidelines often recommend that a test be considered unreliable if false-positive responses exceed 15 percent, though some test results with higher false-positive rates can still be useful (Heijl et al., 2021). This is because the impact of false positives on mean sensitivity is relatively minor; a 10-percentage-point increase in the false-positive rate has been estimated to raise mean deviation by 0.3–0.4 decibels (dB) in eyes with early-stage glaucoma, and by up to 1.4 dB in cases of severe glaucoma (Heijl et al., 2022).

False negatives occur when an individual being tested fails to respond to a stimulus that, based on the test as a whole, they should be able to see. In other words, a false negative is when an examinee fails to respond to a stimulus that is significantly higher in intensity than the patient’s calculated threshold at that location. In damaged areas of the visual field, the frequency-of-seeing curve is very shallow (Gardiner et al., 2014). For example, a stimulus 6 dB more intense than the detection threshold² may still go unnoticed, even by a perfectly reliable observer (Gardiner et al., 2014). As a result, the frequency of false-negative responses tends to reflect the status of the disease more than the reliability of the test itself (Bengtsson and Heijl, 2000).

___________________

² As discussed in Chapter 3, the standard unit for measuring the visual field is differential light sensitivity, which defines the threshold for detecting a test object relative to its background.

While the committee felt that the false-negative index remains useful in certain circumstances, a high false-negative rate does not necessarily mean that a test should be discounted entirely. For example, a visual field with greater than 33 percent false negatives may be acceptable if other clinical information supports the presence of severe loss of vision.

Fixation refers to the process of maintaining the gaze on a specific point or target in the visual field. In the context of visual field testing, it usually involves keeping the eyes steady on a designated fixation point while stimuli are presented in the surrounding areas. Traditionally, fixation losses have been evaluated using catch trials. At the beginning of the test, the perimeter maps the position of the blind spot. Stimuli are then presented in that region; if the examinee responds, it is recorded as a fixation loss. A test is typically considered unreliable if an individual responds to more than 20 percent of these catch trials (Heijl et al., 2012). However, this assessment can be inaccurate if the blind spot is incorrectly mapped at the outset. Assessment also becomes challenging during binocular visual field testing with both eyes open unless the stimulus can be directed to just one eye, which is possible with a virtual reality headset but not with tablets or personal computer screens. Moreover, tests in which the fixation target is not central can lead to greater loss of fixation, reducing the reliability of the test results. For example, when the fixation target is presented in the corner of a tablet screen to increase the testable area, the stimulus locations relative to the fixation point fall within a much narrower range of the visual field than with a bowl perimeter. The individual being tested may therefore tend to move their gaze in the direction in which the stimulus is expected to appear.
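
A minimal sketch of the blind-spot catch-trial accounting described above follows. The data structure (one boolean per catch trial) is an illustrative assumption; the 20 percent cutoff follows the conventional guideline cited in the text.

```python
# Count responses to stimuli presented in the mapped blind spot; each
# response is treated as a presumed fixation loss.

def fixation_loss_rate(catch_trial_responses):
    """catch_trial_responses: True for each blind-spot catch trial to
    which the examinee responded."""
    if not catch_trial_responses:
        return 0.0
    return sum(catch_trial_responses) / len(catch_trial_responses)

responses = [False, False, True, False, False, True, False, False, False, False]
rate = fixation_loss_rate(responses)
print(f"fixation losses: {rate:.0%}")  # fixation losses: 20%
print("flag as potentially unreliable" if rate > 0.20 else "within the conventional limit")
```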

A newer approach than catch trials is to use a built-in camera for fixation monitoring, which may offer equivalent or improved accuracy. A trace of fixation (a visual depiction of where an individual’s gaze was focused during the test) can be a useful tool for educating the patient, as well as a useful indicator of poor test reliability. However, the fixation trace must be quantified before an appropriate threshold can be established. Gaze-tracking measurements do not always correlate well with the probability of fixation losses (Camp et al., 2022), and it remains unclear which method is superior for assessing test reliability. (See Figure 3-6 [j] in Chapter 3 for an example of how variability in gaze during a perimetry test is reported.) Furthermore, head position tracking does not provide equivalent information, as it may not detect fixation movements.

Poor fixation can result in missed localized defects, as well as inaccuracies in measuring their spatial extent or depth. However, the committee notes that the impact on global averages, such as mean deviation, is smaller; excessive fixation losses have been shown to have only a minor effect on mean deviation (Yohannan et al., 2017). Poor fixation could

cause the patient to observe and respond to stimuli that they would not otherwise have seen, reducing the apparent severity of the defect; this is a critical problem when, for example, assessing the ability to drive. However, the committee agreed that it would be very unlikely to cause an overestimate of defect severity to a degree that would impede accurate disability assessment.

Given these caveats regarding reliability indices, it is not always appropriate to treat them as fixed binary cutoffs. Instead, a test result that appears to have poor reliability is best examined to determine why poor indices were measured. For example, a high false-negative rate might be caused by low visual sensitivity. Given such an explanation, the test result may still provide sufficient information for the assessment of disability. At the same time, the committee was unable to find literature supporting specific alternatives that could be used as guidelines or cutoffs instead of reliability indices. As a result, the committee instead notes that the results of a perimetry test need to be analyzed holistically. In other words, evaluating a new perimetry technique requires more than simply assessing the results of its validation studies; it is also important to examine the metrics those studies use to judge a technique reliable or unreliable.

It is therefore advisable to collect and report reliability indices, as these are an essential piece of information to be included in that assessment.

Reproducibility

Interpretation of test reproducibility is best carried out in the context of the severity of visual field loss. The most commonly used static automated threshold perimetry test in the United States employs a 24-2 testing pattern using the SITA Standard testing algorithm, modulating stimulus contrast with a Goldmann size III stimulus, on a Humphrey Field Analyzer perimeter. When a location appears healthy (sensitivity 30–35 dB), the measured sensitivity upon retesting that same location typically varies with a standard deviation of ±1 dB (Artes et al., 2002). Test–retest variability increases with greater visual field loss. For locations with sensitivity of 10–15 dB (around the mean deviation cutoff of 22 dB or greater for visual disability by current SSA standards), the standard deviation for test–retest variability rises to ±6 dB (Artes et al., 2002). For intensities brighter than 15–20 dB, the probability of responding to the stimulus plateaus, meaning further increases in brightness have minimal impact on detectability, which hinders accurate measurement of severe visual field loss (Gardiner et al., 2014).
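
The kind of analysis behind these figures can be sketched as follows, in the spirit of Artes et al. (2002): pair each location’s sensitivity on a first test with its value on a short-interval retest, then summarize the spread of the differences within bins of baseline sensitivity. The input format and bin edges here are illustrative assumptions.

```python
from statistics import stdev

def retest_spread_by_bin(pairs, bins=((30, 36), (20, 30), (10, 20), (0, 10))):
    """pairs: (first_test_db, retest_db) for matched visual field locations."""
    summary = {}
    for lo, hi in bins:
        diffs = [t2 - t1 for t1, t2 in pairs if lo <= t1 < hi]
        if len(diffs) >= 2:  # need at least two differences for a standard deviation
            summary[f"{lo}-{hi} dB"] = stdev(diffs)
    return summary
```

Applied to repeated fields from the same eyes over a short interval, the spread in the lower-sensitivity bins would be expected to be several times that in the healthy range, consistent with the figures cited above.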

Variability in results may also arise from different levels of contrast. Research has shown that testing with contrasts greater than 15–20 dB on a

Humphrey Field Analyzer does not enhance the ability to detect progression and can be excluded without loss of information (Gardiner et al., 2016; Wall et al., 2018). Although SSA’s current standards for visual disability indicate the ability to detect a stimulus corresponding to 10-dB contrast on a Humphrey Field Analyzer, it is likely that equivalent results can be achieved using a stimulus corresponding to a 15-dB contrast.
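
To see why a 15-dB criterion is a plausible equivalent, recall that perimetric dB values conventionally express stimulus attenuation relative to the instrument’s maximum luminance, so higher dB values denote dimmer stimuli. A minimal conversion sketch, assuming that conventional definition:

```python
# Convert a perimetric dB value to stimulus intensity as a fraction of
# the instrument's maximum, assuming dB = 10 * log10(L_max / L).

def relative_intensity(db):
    return 10 ** (-db / 10)

print(relative_intensity(10))  # 0.1    -> 10-dB stimulus at 10% of maximum
print(relative_intensity(15))  # ~0.032 -> 15-dB stimulus, roughly 3x dimmer
```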

Alternative perimetry techniques can yield varying metrics that influence reproducibility. For example, using a larger stimulus size generally increases mean sensitivities and reduces variability. Additionally, different types of stimuli may exhibit distinct variability profiles. Testing algorithms may adjust stimulus size or other characteristics in addition to, or instead of, modifying intensity, and results may be expressed in units other than dB.

A benchmark for reproducibility is whether test results from the perimetry technique being evaluated are as reproducible as those from the current clinical standard. In this context, the relevant metric is the probability that a patient would continue to meet SSA’s criteria for visual disability if they met those criteria the previous day, regardless of whether the same or a different test was used, and whether this probability is similar to that of the current benchmark. If necessary, disability criteria or guides to their interpretation could be modified to reflect the equivalent criteria from current clinical standard testing. When assessing a new perimetry technique, a primary consideration is comparing the intertechnique variability (i.e., between two tests conducted with different perimeters) against the intratechnique variability (i.e., between two tests conducted with the current clinical standard perimeter), with a sufficiently short time between same- or different-technique tests that there will have been no significant change in the patient’s true level of visual function.
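
A minimal sketch of this comparison, using mean deviation as the summary measure, is shown below. The function names, the choice of mean deviation, and the 1.2 tolerance factor are illustrative assumptions rather than committee recommendations.

```python
from statistics import stdev

def spread(md_pairs):
    """md_pairs: (first_md, second_md) per participant, in dB."""
    return stdev(second - first for first, second in md_pairs)

def compare_variability(intra_pairs, inter_pairs, tolerance=1.2):
    intra = spread(intra_pairs)  # reference standard vs. its own short-interval retest
    inter = spread(inter_pairs)  # new technique vs. reference standard
    return {"intratechnique SD (dB)": intra,
            "intertechnique SD (dB)": inter,
            "comparable": inter <= tolerance * intra}
```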

Applicability

Applicability considerations include such factors as the age, linguistic ability, and socioeconomic status of the study population, ensuring that they reflect those of the target population (see the earlier discussion of spectrum bias). For example, in the context of using perimetry to assess visual field defects for disability evaluation, the study population should represent the demographic characteristics of individuals who are undergoing SSA disability assessments. Based on this, all studies in Table 4-1 would be rated as having low applicability to SSA disability assessments. Additionally, it is important that the perimetry test used in the study closely mirrors the one that would be employed in real-world clinical settings. Differences in the test’s procedure, technology, or interpretation—such as variations in testing protocols or the equipment used—could influence the generalizability of the study’s findings to the target population.

QUANTITY AND CHARACTERISTICS OF VALIDATION STUDIES NEEDED TO FIND A PERIMETER ACCEPTABLE

The committee’s statement of task asks whether three published clinical validation studies are needed to find a perimeter acceptable, and if fewer studies are acceptable, what the requirements of the study design would be.

If the instrument or software is expected to be used in diverse populations or clinical settings, having more studies can help demonstrate its validity, reliability, and applicability across different conditions. More studies can also provide a larger sample size, which may improve the statistical power and robustness of the findings. However, the committee’s judgment is that the quality, relevance, and totality of the evidence are more important than the number of studies available. Therefore, the committee concluded that it is not possible to quantify the number of studies required to find a perimetry technique acceptable, and accordingly, this chapter does not stipulate a required number of published studies. Instead, it highlights essential factors to consider in judging whether a study evaluating a new perimetry technique is well designed.³ Such studies need to address clinically relevant endpoints and demonstrate how the new technique compares with existing standards. In the case of vision-related disability, it will be important to ensure that the combination of the hardware and software measures the boundaries of the visual field in the manner required by the criteria outlined by SSA.

SUMMARY AND CONCLUSIONS

Validating a new perimetry technique requires a thorough assessment of its validity, reliability indices, reproducibility, and applicability, all aligned with its intended use in SSA disability evaluation. The ideal study assessing a new perimetry technique is designed in a way that directly evaluates its intended use and target population—for this report, specifically to determine whether an individual meets the criteria for visual field loss in connection with SSA disability evaluation. Given that the committee’s review of the literature revealed no studies that directly examined a perimetry technique for this specific purpose, one could use correlation with a reference standard perimeter in eyes with moderate and/or severe functional loss as a proxy. As reference standards are typically the current “gold standard,” comparing new perimeters with the Humphrey Field Analyzer using a size III stimulus may generate useful evidence.

___________________

³ Regulatory bodies such as the Food and Drug Administration (FDA) in the United States may have specific guidelines regarding the number and type of studies required for validation. In addition, FDA and other federal agencies have encouraged a patient-centered outcomes approach for assessing the functionality and effectiveness of devices.

Determination of the acceptability of a new perimetry technique needs to focus on the specific combination of hardware, stimuli, testing patterns, and algorithms, not solely on the device itself. It is also essential to take the scope of the evidence into account, including data from diverse populations and real-world settings, to ensure that the technique performs effectively across patients with various underlying clinical conditions. Two well-designed, adequately powered studies may provide more reliable evidence than three poorly designed ones.

To determine how much or what type of evidence is sufficient, one must also consider the quality of the information, often referred to as risk of bias. QUADAS-2 is one widely accepted tool for assessing risk of bias in diagnostic test accuracy studies. Important contributors to risk of bias include potential differential verification; patient selection; and whether the index test and reference test were conducted according to a predefined protocol, with results interpreted independently.

Based on its review of the literature and the committee’s expert assessment, the committee reached the following conclusions:

Conclusion 4.1: When assessing the acceptability of a technique for visual field assessment, the quality, relevance, and totality of the evidence are more important than the number of published studies available.

Conclusion 4.2: Sensitivity (in the sense of a test’s ability to identify correctly those with a qualifying disability) and specificity (a test’s capacity to identify correctly those without a qualifying disability) are important metrics for assessing a test’s internal validity. Both specificity and sensitivity need to be measured with sufficient precision to permit confident evaluation against the SSA criteria.

Based on its review of the literature, the committee reached the following conclusion:

Conclusion 4.3: Test results that appear to have poor reliability indices need to be examined to determine whether the results may still be useful for identifying deficits that qualify for disability benefits by providing sufficient information for the determination of disability.

With intensities above 15–20 dB on a Humphrey Field Analyzer, or the equivalent on other instruments, the probability that an examinee will respond to the stimulus plateaus. Therefore, further increases in brightness have minimal impact on detectability, which hinders accurate measurement of severe field loss. Variability in results may also arise from different levels of contrast. Research has shown that testing with contrasts greater than 15–20 dB does not enhance the ability to detect progression of visual field impairment and can be excluded without loss of information.

Based on its review of the literature and the committee’s expert assessment, the committee reached the following conclusion:

Conclusion 4.4: Although SSA’s current criteria for visual disability require the ability to detect a stimulus corresponding to 10-dB contrast on a Humphrey Field Analyzer perimeter, it is likely that equivalent results can be achieved using a stimulus corresponding to a 15-dB contrast.

REFERENCES

Ahmed, Y., A. Pereira, S. Bowden, R. B. Shi, Y. Li, I. I. K. Ahmed, and S. A. Arshinoff. 2022. Multicenter comparison of the Toronto portable perimeter with the Humphrey Field Analyzer: A pilot study. Ophthalmology Glaucoma 5(2):146–159.

Artes, P. H., A. Iwase, Y. Ohno, Y. Kitazawa, and B. C. Chauhan. 2002. Properties of perimetric threshold estimates from full threshold, SITA standard, and SITA fast strategies. Investigative Ophthalmology & Visual Science 43(8):2654–2659.

Balasubramanian, G., J. C. Park, R. A. Hyde, and J. J. McAnany. 2023. Validation of a novel LED-based chromatic visual field perimeter. Investigative Ophthalmology & Visual Science 64(8):5344.

Bengtsson, B., and A. Heijl. 2000. False-negative responses in glaucoma perimetry: Indicators of patient performance or test reliability? Investigative Ophthalmology & Visual Science 41(8):2201–2204.

Bentley, S. A., R. P. LeBlanc, M. T. Nicolela, and B. C. Chauhan. 2012. Validity, reliability, and repeatability of the useful field of view test in persons with normal vision and patients with glaucoma. Investigative Ophthalmology & Visual Science 53(11):6763–6769.

Bradley, C., I. I. K. Ahmed, T. W. Samuelson, M. Chaglasian, H. Barnebey, N. Radcliffe, and J. Bacharach. 2024. Validation of a wearable virtual reality perimeter for glaucoma staging, the NOVA trial: Novel virtual reality field assessment. Translational Vision Science & Technology 13(3):10.

Camp, A. S., C. P. Long, V. M. Patella, J. A. Proudfoot, and R. N. Weinreb. 2022. Standard reliability and gaze tracking metrics in glaucoma and glaucoma suspects. American Journal of Ophthalmology 234:91–98.

Chen, Y. T., P. H. Yeh, Y. C. Cheng, W. W. Su, Y. S. Hwang, H. S. L. Chen, Y. S. Lee, and S. C. Shen. 2022. Application and validation of LUXIE: A newly developed virtual reality perimetry software. Journal of Personalized Medicine 12(10).

Chen, Z., X. Shen, Y. Zhang, W. Yang, J. Ye, Z. Ouyang, G. Zheng, Y. Yang, and M. Yu. 2024. Development and validation of an internet-based remote perimeter (Perimouse). Translational Vision Science & Technology 13(3):16.

Chia, M., A. Turner, G. Kong, A. Agar, and E. Trang. 2019. Validation of an iPad visual field test to screen for glaucoma in rural and remote settings. Clinical and Experimental Ophthalmology 47(Supplement 1):121.

Chia, M. A., E. Trang, A. Agar, A. J. Vingrys, J. Hepschke, G. Y. X. Kong, and A. W. Turner. 2021. Screening for glaucomatous visual field defects in rural Australia with an iPad. Journal of Current Glaucoma Practice 15(3):125–131.

Crossland, M. D., V. A. Luong, G. S. Rubin, and F. W. Fitzke. 2011. Retinal specific measurement of dark-adapted visual function: Validation of a modified microperimeter. BMC Ophthalmology 11:5.

Cui, Q. N., P. Gogt, J. M. Lam, S. Siraj, L. A. Hark, J. S. Myers, L. J. Katz, and M. Waisbourd. 2019. Validation and reproducibility of the Heidelberg edge perimeter in the detection of glaucomatous visual field defects. International Journal of Ophthalmology 12(4):577–581.

Deeks, J. J., P. M. Bossuyt, M. M. Leeflang, and Y. Takwoingi (Eds.). 2023. Cochrane handbook for systematic reviews of diagnostic test accuracy, 1st ed. Chichester, UK: Wiley.

Gardiner, S. K., W. H. Swanson, D. Goren, S. L. Mansberger, and S. Demirel. 2014. Assessment of the reliability of standard automated perimetry in regions of glaucomatous damage. Ophthalmology 121(7):1359–1369.

Gardiner, S. K., W. H. Swanson, and S. Demirel. 2016. The effect of limiting the range of perimetric sensitivities on pointwise assessment of visual field progression in glaucoma. Investigative Ophthalmology & Visual Science 57(1):288–294.

Heijl, A., V. M. Patella, and B. Bengtsson. 2012. The field analyzer primer: Effective perimetry. Dublin, CA: Carl Zeiss Meditec, Inc.

Heijl, A., V. M. Patella, and B. Bengtsson. 2021. The field analyzer primer: Excellent perimetry, 5th edition. Carl Zeiss Meditec, Inc.

Heijl, A., V. M. Patella, J. G. Flanagan, A. Iwase, C. K. Leung, A. Tuulonen, G. C. Lee, T. Callan, and B. Bengtsson. 2022. False positive responses in standard automated perimetry. American Journal of Ophthalmology 233:180–188.

Heinzman, Z., K. Alawa, I. Marin-Franch, A. Turpin, and M. Wall. 2022. Validation of visual field results of a new open-source virtual reality headset. Investigative Ophthalmology & Visual Science 63(7):1259–A0399.

Heinzman, Z., E. Linton, I. Marín-Franch, A. Turpin, K. Alawa, A. Wijayagunaratne, and M. Wall. 2023. Validation of the Iowa head-mounted open-source perimeter. Translational Vision Science & Technology 12(9):19.

Ichhpujani, P., S. Thakur, R. K. Sahi, and S. Kumar. 2021. Validating tablet perimetry against standard Humphrey visual field analyzer for glaucoma screening in Indian population. Indian Journal of Ophthalmology 69(1):87–91.

Johnson, C. A., S. Thapa, Y. X. George Kong, and A. L. Robin. 2017. Performance of an iPad application to detect moderate and advanced visual field loss in Nepal. American Journal of Ophthalmology 182:147–154.

Jones, P. R. 2020. An open-source static threshold perimetry test using remote eye-tracking (eyecatcher): Description, validation, and preliminary normative data. Translational Vision Science & Technology 9(8):18.

Khizer, M. A., T. A. Khan, U. Ijaz, S. Khan, A. K. Rehmatullah, I. Zahid, H. G. Shah, M. A. Zahid, H. Sarfaraz, and N. Khurshid. 2022. Personal computer-based visual field testing as an alternative to standard automated perimetry. Cureus 14(12):e32094.

Lam, J. M., L. A. Hark, J. S. Myers, L. J. Katz, S. Siraj, M. Waisbourd, P. Gogte, and Q. J. Cui. 2017. Validation and reproducibility of the Heidelberg edge perimeter in the detection of visual field defects in glaucoma participants. Investigative Ophthalmology & Visual Science 58(8).

Lijmer, J. G., B. W. Mol, S. Heisterkamp, G. J. Bonsel, M. H. Prins, J. H. van der Meulen, and P. M. Bossuyt. 1999. Empirical evidence of design-related bias in studies of diagnostic tests. JAMA 282(11):1061–1066.

McLaughlin, M., N. E. M. Sanal-Hayes, L. D. Hayes, E. C. Berry, and N. F. Sculthorpe. 2023. People with long COVID and myalgic encephalomyelitis/chronic fatigue syndrome exhibit similarly impaired vascular function. American Journal of Medicine 138(3):560–566.

Mees, L., S. Upadhyaya, P. Kumar, S. Kotawala, S. Haran, S. Rajasekar, D. S. Friedman, and R. Venkatesh. 2020. Validation of a head-mounted virtual reality visual field screening device. Journal of Glaucoma 29(2):86–91.

Meyerov, J., Y. Deng, L. Busija, D. Bigirimana, and S. E. Skalicky. 2023. Online circular contrast perimetry: A comparison to standard automated perimetry. Asia-Pacific Journal of Ophthalmology 12(1):4–15.

Munshi, H., K. Da Silva, E. Savatovsky, E. Bitrian, A. L. Grajewski, and T. C. Chang. 2022. Preliminary retrospective validation of a novel virtual reality visual field standard testing algorithm, as compared to standard automated perimetry. Investigative Ophthalmology & Visual Science 63(7):1275–A0415.

Najdawi, W., C. A. Johnson, and A. Pouw. 2023. Validation of a novel head-mounted perimeter versus the Humphrey Field Analyzer. Investigative Ophthalmology & Visual Science 64(8):5496.

Narang, P., A. Agarwal, M. Srinivasan, and A. Agarwal. 2021. Advanced vision analyzer-virtual reality perimeter: Device validation, functional correlation and comparison with Humphrey Field Analyzer. Ophthalmology Science 1(2):100035.

Olsen, A. S., A. T. Steensberg, M. la Cour, T. W. Kjaer, B. Damato, L. H. Pinborg, and M. Kolko. 2017. Can DMCO detect visual field loss in neurological patients? A secondary validation study. Ophthalmic Research 58(2):85–93.

Pradhan, Z. S., T. Sircar, H. Agrawal, H. L. Rao, A. Bopardikar, S. Devi, and V. N. Tiwari. 2021. Comparison of the performance of a novel, smartphone-based, head-mounted perimeter (Gearvision) with the Humphrey Field Analyzer. Journal of Glaucoma 30(4):E146–E152.

Rutjes, A. W., J. B. Reitsma, M. Di Nisio, N. Smidt, J. C. van Rijn, and P. M. Bossuyt. 2006. Evidence of bias and variation in diagnostic accuracy studies. Canadian Medical Association Journal 174(4):469–476.

Schulz, A. M., E. C. Graham, Y. You, A. Klistorner, and S. L. Graham. 2018. Performance of iPad-based threshold perimetry in glaucoma and controls. Clinical & Experimental Ophthalmology 46(4):346–355.

Susanna, F. N., C. N. Susanna, P. G. Salomão Libânio, F. T. Nishikawa, R. A. Schiave Germano, and R. S. Junior. 2024. Comparison between the fast strategies of a virtual reality perimetry and the Humphrey Field Analyzer in patients with glaucoma. Ophthalmology Glaucoma, article in press, S2589-4196(24)00219-9.

Terracciano, R., A. Mascolo, L. Venturo, F. Guidi, M. Vaira, C. M. Eandi, and D. Demarchi. 2023. Kinetic perimetry on virtual reality headset. IEEE Transactions on Biomedical Circuits and Systems 17(3):413–419.

Tsapakis, S., D. Papaconstantinou, A. Diagourtas, S. Kandarakis, K. Droutsas, K. Andreanos, and D. Brouzas. 2018. Home-based visual field test for glaucoma screening comparison with Humphrey perimeter. Clinical Ophthalmology 12:2597–2606.

Tsiogka, A., M. L. Moster, K. I. Chatzistefanou, E. Karmiris, E. Samoli, I. Giachos, K. Droutsas, D. Papaconstantinou, and G. L. Spaeth. 2024. The TsiogkaSpaeth grid for detection of neurological visual field defects: a validation study. Neurological Sciences 45(6):2869–2875.

Vingrys, A. J., J. K. Healey, S. Liew, V. Saharinen, M. Tran, W. Wu, and G. Y. Kong. 2016. Validation of a tablet as a tangent perimeter. Translational Vision Science & Technology 5(4):3.

Wall, M., G. K. D. Zamba, and P. H. Artes. 2018. The effective dynamic ranges for glaucomatous visual field progression with standard automated perimetry and stimulus sizes III and V. Investigative Ophthalmology & Visual Science 59(1):439–445.

Whiting, P. F., A. W. Rutjes, M. E. Westwood, S. Mallett, J. J. Deeks, J. B. Reitsma, M. M. Leeflang, J. A. Sterne, and P. M. Bossuyt. 2011. QUADAS-2: A revised tool for the quality assessment of diagnostic accuracy studies. Annals of Internal Medicine 155(8):529–536.

Wijayagunaratne, A., E. Linton, I. M. Franch, K. Alawa, and M. Wall. 2023. Smartphone perimetry: Comparison to standard automated perimetry and assessment of size-modulation strategy with frequency-of-seeing curves in healthy subjects. Investigative Ophthalmology & Visual Science 64(8):1502.

Wroblewski, D., B. A. Francis, A. Sadun, G. Vakili, and V. Chopra. 2014. Testing of visual field with virtual reality goggles in manual and visual grasp modes. BioMed Research International 2014:206082.

Yohannan, J., J. Wang, J. Brown, B. C. Chauhan, M. V. Boland, D. S. Friedman, and P. Y. Ramulu. 2017. Evidence-based criteria for assessment of visual field reliability. Ophthalmology 124(11):1612–1620.
