Understanding Methods for Measuring Program Effectiveness and Clinical Effectiveness: Proceedings of a Workshop—in Brief (2025)

Suggested Citation: "Understanding Methods for Measuring Program Effectiveness and Clinical Effectiveness: Proceedings of a Workshop—in Brief." National Academies of Sciences, Engineering, and Medicine. 2025. Understanding Methods for Measuring Program Effectiveness and Clinical Effectiveness: Proceedings of a Workshop—in Brief. Washington, DC: The National Academies Press. doi: 10.17226/29086.

Convened November 22, 2024

Understanding Methods for Measuring Program Effectiveness and Clinical Effectiveness
Proceedings of a Workshop—in Brief


The National Academies of Sciences, Engineering, and Medicine was asked to appoint a committee to conduct an Independent Analysis of the Department of Defense’s (DoD’s) Comprehensive Autism Care Demonstration (ACD). As part of this effort, the committee was tasked to address nine identified areas, which were later amended, in Section 737 of Public Law 117-81, National Defense Authorization Act for Fiscal Year 2022. These tasks included “An assessment of all methods used to assist in the assessment of domains related to autism spectrum disorder broadly, including a determination as to whether the Secretary is applying such methods appropriately under the demonstration project;” and “an assessment of the methods used under the demonstration project to measure the effectiveness” of applied behavior analysis (a prominent intervention for those diagnosed with autism spectrum disorder [ASD]).1

The ACD, authorized through 2028, provides reimbursement for applied behavior analysis (ABA) to TRICARE-eligible beneficiaries diagnosed with ASD. TRICARE is a health benefit program with approximately 9.6 million beneficiaries, including active-duty personnel, reserve component personnel, military retirees, and their families; it is administered by the Defense Health Agency at DoD. A public information session, “Understanding Methods for Measuring Program Effectiveness and Clinical Effectiveness,” was part of the information-gathering process for the committee. George W. Rutherford, University of California, San Francisco, Committee Chair, welcomed participants and explained that the session was designed to engage experts to better understand the assessment methods used under the ACD, the distinctions between the types of data and consent used for research and clinical care, and the intersections with program evaluation and quality improvement efforts. Specifically, speakers considered differences in the purpose, methods, and goals of different forms of assessment; examined considerations for determining risks, burdens, and benefits for patients and caregivers; and suggested possible best practices for informed consent for each kind of endeavor.

OVERVIEW OF OUTCOME MEASURES USED UNDER THE ACD

Thomas W. Frazier, John Carroll University, provided a brief overview of outcome measures in the ACD assessment battery. He described the strengths and weaknesses of each measure, although he emphasized that no assessment instrument is perfect. “All assessment involves trade-offs,” Frazier said, so the question is how to optimize battery creation for the TRICARE population, given the current state of knowledge and available instruments. The instruments currently used by TRICARE were chosen based on what was known at the time, he said, but progress has been made since their adoption.

___________________

1 For more information on the nine areas of interest for the committee’s independent analysis, see: https://www.nationalacademies.org/our-work/independent-analysis-of-department-of-defenses-comprehensive-autism-care-demonstration-program

Current Measures

Frazier described the strengths and weaknesses of five current ACD assessments: (a) the Vineland Adaptive Behavior Scales, Third Edition (Vineland-3), (b) the Social Responsiveness Scale, Second Edition (SRS-2), (c) the Pervasive Developmental Disorder Behavior Inventory (PDDBI), (d) the Parenting Stress Index, Fourth Edition, Short Form (PSI-4-SF), and (e) the Stress Index for Parents of Adolescents (SIPA). He noted that in recent years, researchers have developed guidelines on measure development. Some of these measures were developed with attention to aspects of these guidelines and processes, he said, but others were not.

Vineland-3

The Vineland-3 has a long history of clinical and research use. The psychometric characteristics of the Vineland-3 are strong in several ways, said Frazier. For example, scale reliability is very strong for scored scales, test-retest reproducibility is favorable, and there is very good evidence for convergent validity, discriminant validity, and sensitivity to change. He noted, however, that the Vineland-3 is inadequate for treatment planning and monitoring for behavioral intervention within DoD/TRICARE, particularly in the areas of structural content and construct validity. The scores themselves are lacking for autism-relevant behavioral intervention progress monitoring, and there is no ability to compute reliable change and attribute- or severity-adjusted change scores. These characteristics are essential, Frazier said, to move toward a clinical quality improvement process and a value-based care evaluation. He noted that the Vineland-3 does have elements of standard automation that are consistent with legacy psychological assessment measures, though it lacks intervention decision support, progress monitoring tools, and support for developing teaching plans. Another weakness of the Vineland-3, said Frazier, is the lack of social communication and interaction (SCI) subscales, which severely limits treatment planning and progress monitoring within a behavioral intervention context.
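To illustrate what a "reliable change" computation can involve, the following minimal Python sketch uses the Jacobson-Truax reliable change index, one widely used approach. The workshop summary does not say which method Frazier had in mind, so the choice of index is an assumption, and the function name, scores, standard deviation, and reliability value are all hypothetical.

```python
import math


def reliable_change_index(baseline, follow_up, sd, reliability):
    """Jacobson-Truax reliable change index.

    Illustrative assumption only; not a method specified for the ACD.
    """
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    # Standard error of measurement for a single administration.
    sem = sd * math.sqrt(1.0 - reliability)
    # Standard error of the difference between two administrations.
    s_diff = math.sqrt(2.0) * sem
    return (follow_up - baseline) / s_diff


# Hypothetical standard scores (SD of 15) and test-retest reliability.
rci = reliable_change_index(baseline=70, follow_up=82, sd=15, reliability=0.90)
# Conventional threshold: change larger than measurement error alone
# would be expected to produce about 95 percent of the time.
is_reliable = abs(rci) > 1.96
```

With these made-up inputs, a 12-point gain yields an index of about 1.79, which falls below the conventional 1.96 threshold and so would not count as reliable change.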

SRS-2

The SRS-2 has a long history of clinical and research use, has parent- and teacher-report aspects, assesses SCI and restrictive and repetitive behaviors (RRB) symptoms, and measures a wide range of autistic traits, said Frazier. The total score has excellent psychometric properties and has shown sensitivity to change. He noted, however, that it is susceptible to rater and placebo effects, as are many parent-report assessments, and it has very limited utility for intervention planning. The SRS-2 does not measure all ASD symptom subdomains; coverage of some types of core RRB symptoms is particularly lacking. Furthermore, not all measured items apply to all ASD cases; some require speech or imply higher levels of cognitive function. Another weakness of the SRS-2, Frazier pointed out, is its limited potential for automating administration and for storing its data in a database alongside other measures.

PDDBI

Like the SRS-2, the PDDBI was not built for ABA outcome assessment. In addition, Frazier said, the factor structure and scoring are not clear or well-understood, and there is mixed evidence of sensitivity to change in behavioral intervention studies. However, the PDDBI includes a wide range of core autism and associated symptoms and covers anxiety and aggression as well as receptive and expressive aspects of communication. The assessment has substantial prior use in ASD intervention studies, and the content has utility for intervention planning. He noted that the PDDBI was first published in 2003, when most published papers did not use modern measure development processes or comprehensively examine psychometric properties.

PSI-4/SIPA

The PSI-4 and the SIPA (the former used for children 0–12 years, the latter for children 11–19 years) have a long history of clinical and research use, said Frazier, with prior studies in populations similar to DoD/TRICARE. There is a large amount of literature on the PSI-4/SIPA, although most comes from outside the autism context. The assessment has strong coverage of content areas related to parent feelings of competence and stress, including child characteristics and parent-child relationship, and there is strong evidence that the total score has reliability and construct validity. However, the assessment is long (20- to 30-minute administration) and was not built for ABA outcome assessment. There are significant differences in content across forms, and the factor structure is not well replicated. There is no peer-reviewed evidence of sensitivity to change in behavioral intervention, said Frazier, and the assessment is not effectively being used to inform treatment planning or provide resources because parents are apprehensive about its length and the invasiveness of some of its questions.

Takeaways

The outcome measures in the ACD assessment battery have limitations that necessitate revision, prioritizing an optimal outcomes assessment approach, said Frazier. The assessments in the battery were not built for the context of the ACD, he said; they were designed with limited input from relevant groups and have mixed psychometric properties. The outcome measures in the ACD assessment battery do not adequately assess all the core autism sub-domains that have been identified, and there is limited coverage of co-occurring conditions that are relevant to child functioning. There is some coverage of parent stress-related constructs, but broader coverage of quality of life for the child, family, and ecosystem is lacking. Finally, he concluded, the current assessment battery has limited automation for progress monitoring, and clinical decision support is weak for intervention planning.

Frazier offered his thoughts on an ideal system for assessment under the ACD. First, he said that outcome assessment should be developed with the input of relevant groups who are directly affected by ASD care to identify the most relevant domains of symptoms and functioning. These groups can develop core and supplemental assessment batteries through a consensus process; the assessment batteries would utilize multi-modal assessments and include psychometric and practical criteria. Frazier emphasized the importance of multi-modal assessment. Different informants have different perspectives, he said, so gathering information from a variety of sources gives a broader perspective on the patient. Relevant groups would determine appropriate assessment schedules and procedures, which would depend on the intervention type, characteristics, and context, and would develop decision support tools for each assessment. A platform could be developed for administration, scoring, reporting, decision support, and monitoring, and ongoing quality review and improvement would be required. Finally, said Frazier, data from the system could be used to generate and disseminate intervention and cost-effectiveness analyses. He said that this vision for the future of DoD/TRICARE would create a system in which assessment is conducted efficiently, at low cost, and with less time burden; has strong parental consent and participation; feeds results back to the parent; has strong intervention planning value; and provides appropriate supports to inform evidence-based clinical management decisions.

DATA COLLECTION ON PATIENTS: DETERMINING PURPOSE, VALUE, RISKS, AND BURDENS

This session explored ethical considerations involved in the collection and use of patient data in the contexts of clinical care, clinical research, monitoring program effectiveness, and quality improvement.

Ethical Considerations in Clinical Care, Clinical Research, and Quality Improvement

Holly A. Taylor, National Institutes of Health, spoke about the ethical considerations related to patient data collection in three areas: (a) clinical care, (b) clinical research, and (c) monitoring program effectiveness and quality improvement.

Clinical Care

The goal of clinical care, said Taylor, is to act in the best interest of the patient. In the context of ASD, this would include a diagnosis of ASD and a plan of care with evidence-based interventions and assessments that can plot progress. To use clinical care data for research purposes, relevant variables can be abstracted from a clinical chart to allow comparison within and across cases. If the data are not publicly available, said Taylor, the researcher must get approval from an Institutional Review Board (IRB). A researcher may seek a waiver of informed consent if they seek to use retrospective data; however, the use of prospective clinical data for research purposes requires patient consent or parental permission.

Clinical Research

Investigators conducting clinical research must protect the welfare of research participants, said Taylor. While an individual may or may not benefit directly from their participation in research, future patients and society at large will benefit regardless of the outcome. She shared a framework for biomedical research in which ethical research includes collaborative partnership, scientific validity, social value, fair participant selection, favorable risk-benefit ratio, independent review, informed consent, and ongoing respect for participants. Taylor highlighted two concepts that are particularly relevant to ASD assessment. Social value, she explained, requires that research be designed to be beneficial to the participants (if possible), to a community of individuals with a disease or condition, and to the wider public, without wasting resources. Scientific validity is demonstrated when the research uses reliable, valid, and relevant design and methods in pursuit of its objective. The findings from clinical research are considered relevant to the health problem being studied when the design is feasible within the context and does not affect the provision of health care services to the local population where the research is being conducted.

To explain these principles, Taylor walked participants through a thought experiment on a hypothetical randomized controlled trial (RCT) of ABA. A research question, she said, could be whether access to ABA is as good as or better than another type of intervention. Individuals would be assessed for eligibility in the study via a validated tool and would be randomized to either ABA or an alternative intervention. Using reliable and valid measures, participant outcomes would be assessed over time and compared between groups. Taylor noted, however, that it is unethical to conduct an RCT if there is already a known clinically effective standard of care. In medicine, she explained, there are many care protocols that have not been rigorously tested but are used daily as the standard of care, and using randomization to withhold this care from some patients would be unethical. Another ethical concern with this hypothetical study, Taylor said, is whether the available assessments have the sensitivity and specificity required to determine eligibility for potential participants. Given the shortcomings of the current assessments that Frazier described, she said that it may not be possible to conduct a high-quality RCT of ABA.

Monitoring Program Effectiveness and Quality Improvement

There are several important differences between collecting patient data for quality improvement and assessment purposes versus clinical research, said Taylor. The intention of quality improvement and assessment is to answer a question about the implementation of an intervention within an organization, whereas the intention of clinical research is to produce generalizable findings. In quality improvement, there is little or no manipulation of patients or their environment, and the risks and benefits are similar between those who participate and those who do not. Participation in this type of data collection is unlikely to restrict reasonable decision making by the patients and families. Use of data in this context, said Taylor, does not require IRB oversight or informed consent; however, notification—or the practice of informing those affected of any significant information that could impact their willingness to receive care where the quality improvement intervention is underway—is considered respectful practice.

Taylor shared another example of a hypothetical use of patient data, this time to answer the question: “How well is ABA implemented as an intervention for individuals diagnosed with ASD receiving services via TRICARE?” Similar to the hypothetical research question she described earlier, the ability to answer this question depends on the specificity and sensitivity of the assessments used to diagnose ASD. As Frazier explained, the current assessments used under the ACD were not built to assess ABA-specific outcomes and may have issues with sensitivity and specificity. Thus, said Taylor, it may be challenging to answer this question using the current approaches.
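Sensitivity and specificity, which Taylor raised in both hypothetical examples, have standard definitions that the short Python sketch below makes concrete. The function name and the counts are hypothetical and do not describe any ACD instrument.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp/fn: people with the condition who are flagged / missed.
    tn/fp: people without the condition who are cleared / flagged.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one true case and one non-case")
    sensitivity = tp / (tp + fn)  # share of true cases the instrument flags
    specificity = tn / (tn + fp)  # share of non-cases the instrument clears
    return sensitivity, specificity


# Hypothetical counts: 100 children with the condition, 100 without.
sens, spec = sensitivity_specificity(tp=85, fn=15, tn=90, fp=10)
```

In this made-up case the instrument would miss 15 percent of true cases and incorrectly flag 10 percent of non-cases, the kind of error that complicates both eligibility screening and program evaluation.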

Additional Limitations

In addition to the issues with assessment described above, there are limitations to studying ABA, said Taylor. First, there is variation in fidelity in the provision of ABA. Organizations that deliver ABA face multiple barriers, such as delays in initiation of treatment, penalties for non-compliance, documentation burden, and problems with timely reimbursement, which affect consistency across organizations providing ABA. Second, issues remain with how the battery of assessments is conducted, hindering the effort to gather accurate and complete information, she said. For example, providers may believe that the burden of completing the assessment battery outweighs the limited value, or parents may not be trained to use the parent reporting tools in a way that makes them useful for assessing the population utilizing the ACD.

Informed Consent

Nanette Elster, Loyola University Chicago Stritch School of Medicine, followed Taylor’s presentation on general ethical issues by delving more deeply into the issue of informed consent. Elster began by sharing a description of consent, as described by the President’s Commission for the Study of Ethical Problems in Medicine and Biomedical and Behavioral Research:

“Ethically valid consent is a process of shared decision making based upon mutual respect and participation, not a ritual to be equated with reciting the contents of a form that details the risks of particular treatments.” (1982, p. 2)

Informed consent is not a new concept, she said, and there are several purposes for consent: to respect autonomy; to build trust; to increase the likelihood of participation, acceptance, or adherence to treatment recommendations; and to ensure voluntariness.

The context in which informed consent is given, and the purpose for which it is given, matter, said Elster. Each context and purpose may have different goals, expectations, and obligations. For example, the obligation of a researcher to a patient participant may be quite different than the obligation of a physician to a patient. Similarly, the patient’s expectations may be different in each of these contexts. The core elements of meaningful informed consent, said Elster, are that the patient is informed of risks, benefits, and alternatives; consent is freely given without coercion or undue influence; consent can be withdrawn at any time for any reason; and that the patient comprehends the circumstances around the consent.

Elster commented on two elements with relevance to patients with a diagnosis of ASD. First, she said, “coercion or undue influence” can take many forms, including making the provision of services contingent on parental compliance with data collection efforts. Elster noted that this type of requirement can interfere with the trust necessary in a provider-patient relationship, and it may penalize children for the actions of their parents. Taylor agreed and added that forcing people to provide information is not an optimal approach for building trust and cooperation. Second, Elster said that comprehension is on a spectrum for patients with ASD—some participants may be minors with limited verbal or intellectual capacities, others may be adults with emerging autonomy, and others may be parents or guardians giving consent on behalf of a patient.

Importantly, said Elster, informed consent is a process, not an event. As a patient evolves from childhood into adolescence and adulthood, the process of informed consent continues. In addition, ongoing dialogue between parents and providers is necessary to identify and address parental concerns and to encourage informed participation in the assessment process. She introduced the idea of “shared decision making,” an approach that encourages relationship building, engagement, empowerment, and patient participation where cognitive and language abilities make such a framework possible. Shared decision making meets the ethical requirements for informed consent, Elster said, and may improve outcomes as well as increase patients’ adherence to treatment plans (Childress & Childress, 2020). She noted that even in circumstances in which informed consent is not mandatory (e.g., quality improvement efforts), it may be beneficial for all parties to have open and mutual communication and trust.

Special Considerations

Elster shared some special considerations for informed consent that are particularly applicable to autistic individuals. First, minors can be involved in the consent process at a level that is appropriate for them by way of assent. The decision on what information to share and how is highly variable, she said, but minors can be involved in their own care and treatment. Second, for patients with intellectual disabilities or other capacity limitations, consent may come from a legally authorized representative, a surrogate decision maker, a legal guardian, or a support person. Finally, Elster said that community consent and involvement can be important considerations in clinical care and research as autistic individuals, researchers, and clinicians can serve as representatives and advocates for others.

Moreover, an important principle within informed consent is “assent,” said Elster. The principle of assent recognizes that children, especially adolescents, can participate at some level in decision-making related to their care (Unguru et al., 2008). In the context of clinical research, there are clear guidelines for obtaining assent from a child subject and for respecting a minor’s dissent from participation (Katz et al., 2016). A child’s unwillingness to participate in clinical research needs to be considered, she said, particularly if the research confers little or no benefit to the child. For autistic adolescents and adults, the shared decision-making model may help patients understand situations and make decisions with the help of trusted friends, family members, and professionals. Shared decision making is a “means for increasing self-determination by encouraging and empowering people to make decisions about their lives to the maximum extent possible” (Blanck & Martinis, 2015). The model reflects the overarching idea of involving those with capacity limitations in decision making, said Elster, and embodies the disability rights slogan “Nothing about us without us.”

Perspectives on ASD

Work with the autistic community has generally adhered to a medical model, said Elster, with an emphasis on “curing” the individual to make them more like a non-disabled, neurotypical individual (Dwyer, 2022). There have been calls to adopt a social model, which contends that disabilities are due to societal barriers and restrictions on individuals with impairments.2 Dwyer (2022) has argued for a neurodiversity approach, said Elster, which looks at disability as the product of an interaction between individual characteristics and the environment around them. Under Dwyer’s approach, curing the individual is not the goal, although there is acknowledgement that disability can be addressed through teaching adaptive skills and by reshaping environments and society to promote well-being. Furthermore, the neurodiversity approach encourages valuing the diversity of minds and brains. Elster said that this approach represents a new way of viewing autism, with corresponding implications for the ethical concepts involved in clinical research, clinical care, and quality improvement.

Elster offered suggestions for groups interested in using patient data for research, treatment, or quality improvement. First, she urged the addition of autistic voices and perspectives of those who care for autistic individuals in the process. Second, Elster emphasized that lived experience can influence how objective outcome measures are interpreted. Third, ABA is one of many interventions, and there may be confounding variables that need to be considered to accurately assess the effectiveness and appropriateness of ABA. Finally, she said that a true assessment of quality must factor in how context might impact outcome measures. For example, the frequent moves of military families may affect a child’s progress and make outcome measures less indicative of treatment effectiveness. Elster closed by emphasizing that the ultimate goal of this work is to improve the quality of life for military families and children who are being served by ABA therapy, and shared a quote by Steven Kapp (2018, pp. S364–S365):

“Studies should consider how the complex relationship between autism and quality of life depends not only on social factors but also on the specific traits or behaviors associated with autism, in that they may sometimes improve individuals’ functioning and well-being. Individuals with the direct lived experience of autism can best explain the distinction between normalization and quality of life.”

ABA TREATMENT PLANNING AND PROGRESS MONITORING: CURRENT STANDARDS OF CARE

In this session, Gina Green, consultant, discussed the assessment process for ABA treatment planning and progress monitoring, with a specific focus on the standards outlined in the Council of Autism Service Providers Applied Behavior Analysis Practice Guidelines for the Treatment of Autism Spectrum Disorder (referred to as ABA Practice Guidelines; CASP, 2024). In addition, she provided a review of the research that informed those standards.

Assessments for ABA Treatment Planning and Progress Monitoring

Green opened by saying that no single assessment, instrument, or battery can yield all the information needed to develop an appropriate treatment plan for, or evaluate progress made by, every member of the heterogeneous ASD population. She noted her perception that there seems to be convergence around the essential components and characteristics of assessment of individuals with ASD:

  • Comprehensive and individualized to identify the unique strengths, needs, and preferences of the client, including reviews of their medical and developmental history as well as assessment of their current functioning in all relevant domains. Domains include but may not be limited to:
    • Autism characteristics
    • Communication skills
    • Social interaction skills
    • Intellectual skills or developmental level
    • Self-care and other daily living skills
    • Challenging behaviors
    • Co-occurring medical and mental health conditions
    • Quality of life of individual and their family
    • Others depending on age (e.g., skills for participating in educational services, work, community living)
  • Multimodal, involving multiple sources and methods. Sources could include existing records, interviews with the client and caregivers, standardized assessment instruments, and non-standardized assessment instruments and procedures. Standardized assessments are necessary for drawing comparisons across individuals, examiners, settings, and time. They may be norm-referenced (the individual’s score is compared to the average score of a norm group) or criterion-referenced (the individual’s score is compared to a pre-determined standard).

___________________

2 For an alternative perspective, see Pukki et al. (2022).

Green noted that standardized assessment instruments need to be both direct (administered to the client) and indirect (administered to caregivers and other third parties), valid and reliable, and appropriate to the client’s chronological age, developmental level, and overall functioning. Non-standardized assessment instruments and procedures, such as direct observation and recording of behaviors, are also essential, she said. Green emphasized the importance of having procedures in place to evaluate the accuracy, reliability, and believability of data produced by non-standardized assessment methods. Importantly, information from all sources can be synthesized and integrated into a comprehensive report that is shared with all relevant parties.
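The distinction between norm-referenced and criterion-referenced scoring that Green described can be sketched in a few lines of Python. This is a simplified illustration rather than the scoring procedure of any instrument named in the workshop; the function names, norm-group scores, and criterion value are hypothetical.

```python
from statistics import mean, stdev


def norm_referenced_z(raw_score, norm_scores):
    """Distance of a raw score from the norm group's mean, in SD units.

    Uses the sample standard deviation of the norm group.
    """
    return (raw_score - mean(norm_scores)) / stdev(norm_scores)


def criterion_referenced_met(raw_score, criterion):
    """True when the raw score reaches a pre-determined standard."""
    return raw_score >= criterion


norm_group = [10, 12, 14, 16, 18]  # hypothetical norm-group raw scores
z = norm_referenced_z(12, norm_group)             # relative to peers
met = criterion_referenced_met(12, criterion=15)  # relative to a fixed bar
```

The same raw score of 12 is interpreted in two ways: as falling somewhat below the norm group's average, and as not yet meeting a fixed mastery standard. This is why the two kinds of instruments answer different questions.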

Green also summarized a recent commentary by the co-developers of the Autism Diagnostic Observation Schedule (ADOS; Bishop & Lord, 2023). The authors noted that the ADOS and other diagnostic tools are designed to be administered by professionals with expertise in diagnosing ASD and are intended to formalize and standardize the procedures used to gather information about autism-related symptoms. Bishop and Lord emphasized that standardized diagnostic instruments were developed to ensure that relevant information is available for making diagnostic judgments, said Green, not to prevent access to appropriate services. Bishop and Lord (2023) noted, however, that health care systems are increasingly mandating the use of specific ASD diagnostic tools, and stated, “This can be extremely damaging in situations when standardized instruments cannot be validly administered” (p. 835) and that “blanket requirements directly contradict best practice recommendations for individualizing assessment procedures” (p. 835). Given the heterogeneity of the population, Bishop and Lord argued that the ASD diagnosis itself is less useful for treatment and prognosis purposes than a profile of the individual’s cognitive, language, and adaptive behavior skills as well as their medical and psychiatric symptoms.

Current Standards of Assessment for ABA Treatment Planning and Progress Monitoring

The Council of Autism Service Providers published Applied Behavior Analysis Practice Guidelines for the Treatment of Autism Spectrum Disorder: Guidance for Healthcare Funders, Regulatory Bodies, Service Providers, and Consumers (2024) as an update to guidelines published in 2012 and 2014 by the Behavior Analyst Certification Board. The guidelines, said Green, were derived from reviews of research and best practices by subject matter experts in behavior analysis, psychology, medicine, health care law and other public policies, and consumers of ABA services.

Section 4.1 of these guidelines states that the goal of assessment is “to determine patient baseline skills, develop treatment goals and plans, and identify measures of progress” (CASP, 2024, p. 19). Standards for assessment include conducting assessment at regular intervals (annually or semi-annually); using multiple sources and processes, including records review, interviews, direct observation, and recording of behaviors in everyday environments; and using standardized assessments customized to the patient’s characteristics and the scope of their ABA treatment as well as the characteristics of each assessment instrument. Green said the guidelines specify that direct observation and recording of behaviors need to include functional behavior assessment to identify environmental events that influence challenging behaviors, and that graphed data on each behavior targeted on the patient’s treatment plan must be reviewed frequently throughout treatment by the behavior analyst. The guidelines state that the frequency with which data are analyzed needs to be individualized (CASP, 2024, p. 58). A comprehensive review of progress may occur weekly, bimonthly, or monthly depending on patient need and intensity of services. Some patients may require more frequent analyses. The guidelines acknowledge the value of indirect measures completed by caregivers and other third parties, said Green, but emphasize that those cannot be the sole or main sources of information for determining medical necessity, treatment dosages, continuation or termination of services, or other critical decisions. Furthermore, the guidelines state that the results of standardized assessments cannot be the only or primary basis for determining medical necessity of ABA services or a patient’s response to treatment (CASP, 2024, p. 26).
Finally, Section 4.1 specifies that to minimize risks to the patient, behavior analysts must screen for the emergence of challenging behaviors every 6–18 weeks and if such behaviors are detected, assess further by evaluating medical or other comorbid conditions, and collecting other information about the challenging behaviors via direct observation and recording, functional behavior assessment, interviews and questionnaires from caregivers, and assessments conducted by other professionals (CASP, 2024, pp. 26–27).

Section 4.4 of the ABA Practice Guidelines details standards for progress and outcome measures. The guidelines note that many variables contribute to client outcomes, and state that it is “unlikely that a single set of metrics will be sensitive to treatment outcomes across the entire patient population” (CASP, 2024, p. 51). According to these standards, said Green, instruments and procedures for measuring progress and outcomes need to be valid and reliable; be tailored to patient characteristics and treatment scope, domains, and goals; use multiple methods and sources; and evaluate proximal (short-term) and distal (long-term) outcomes. The guidelines caution that there is currently no consensus or standard for determining “successful” treatment based on the percentage of treatment goals a patient masters (CASP, 2024, p. 53). Practitioners are urged to speak up if a funder or employer requires them to use outcome measures that are not appropriate for the patient. The guidelines emphasize that the selection of assessment instruments and other sources of information must be driven by evidence of their appropriateness for the patient rather than what is familiar to or popular with clinicians, explained Green.

Standardized Measures Commonly Used in ABA Treatment Outcome Studies

Green shared lists of the standardized measures that have been used most often in ABA treatment outcomes studies with autistic individuals. She first presented measures that have been used in studies of comprehensive, intensive ABA (addressing multiple behaviors in multiple domains) for young children with ASD (aged 7 years and younger). Direct measures included standardized assessments of

  • Intellectual skills—Bayley Scales of Infant Development,3 Mullen Scales of Early Learning,4 Wechsler Preschool and Primary Scale of Intelligence or Intelligence Scale for Children,5 Stanford-Binet,6 Psychoeducational Profile,7 Differential Ability Scales8
  • Autism core symptoms—ADOS9
  • Communication skills—Reynell Developmental Language Scales,10 Peabody Picture Vocabulary Test,11 Expressive One Word Picture Vocabulary Test,12 Preschool Language Scale13

___________________

3 For more information see: https://www.pearsonassessments.com/store/usassessments/en/Store/Professional-Assessments/Behavior/Bayley-Scales-of-Infant-and-Toddler-Development-%7C-Third-Edition/p/100000123.html

4 For more information see: https://www.pearsonassessments.com/store/usassessments/en/Store/Professional-Assessments/Developmental-Early-Childhood/Mullen-Scales-of-Early-Learning/p/100000306.html

5 For more information see: https://www.pearsonassessments.com/store/usassessments/en/Store/Professional-Assessments/Cognition-%26-Neuro/Wechsler-Preschool-and-Primary-Scale-of-Intelligence-%7C-Fourth-Edition/p/100000102.html

6 For more information see: https://stanford-binet.org/?gad_source=1&gclid=Cj0KCQiA2oW-BhC2ARIsADSIAWqtMb1_neeT4x7GPg2R4EZuTakPDaTBqluUDlqPEhP_s88Mc6hvQaMaAmJBEALw_wcB

Indirect measures used in those studies have included standardized assessments of

  • Adaptive functioning—Vineland Adaptive Behavior Scales,14 Child Behavior Checklist,15 Developmental Profile16
  • Autism core symptoms—Autism Diagnostic Interview,17 Childhood Autism Rating Scale,18 Gilliam Autism Rating Scale,19 Social Responsiveness Scale
  • Maladaptive behavior—Vineland Maladaptive Domain, Child Behavior Checklist, Repetitive Behavior Scale-Revised
  • Caregiver wellbeing—Parental Stress Index20

Next, Green presented a list of measures that have been used in studies of focused ABA interventions (addressing a limited number of adaptive and/or challenging behaviors) not restricted to autistic individuals aged 7 and younger. They include measures of

  • Challenging behaviors—Aberrant Behavior Checklist,21 Vineland Maladaptive Domain, Parental Stress Index, Clinical Global Impression – Improvement Scale22
  • Social communication skills—Early Social Communication Scales,23 Social Responsiveness Scale, Social Skills Improvement System – Rating Scale,24 Behavioral Assessment System for Children,25 Vineland Communication Domain

Takeaways

During a question-and-answer session, Green explained that few standardized assessments can detect changes in the relatively small numbers of discrete behaviors that are targeted in focused ABA interventions, but certain standardized assessments may be used to corroborate behavior changes that are documented by direct observation and recording. Green observed that the sole measure of caregiver wellbeing used in the ACD has been the Parental Stress Index and noted that quality-of-life assessments may have greater utility. In addition, Green noted that one of the limitations of the current ACD assessment battery is that all the measures are indirect; there are no direct assessments of the patient’s functioning over the course of services. Finally, Green said that due to the heterogeneity of the ASD population, assessment procedures need to be appropriately individualized to each patient based on characteristics including age, scope of treatment, and goals of treatment, as outlined in the ABA Practice Guidelines. Frazier agreed that no one assessment battery fits all patients but said that the ACD could use a core battery of assessments with flexibility to allow individual patient characteristics to be considered. In addition, he agreed that direct, objective assessments of the patient’s functioning would be a useful supplement to subjective, informant-based assessments. In choosing instruments for an outcomes assessment battery, a participant said it is important to have guidelines that ensure quality measures while also allowing for the adoption of new and innovative tools. Green agreed with the participant’s point regarding the need for innovation but emphasized the importance of using the best available assessment tools while others are in development.

___________________

7 For more information see: https://www.wpspublish.com/pep-3-psychoeducational-profile-third-edition

8 For more information see: https://www.pearsonassessments.com/store/usassessments/en/Store/Professional-Assessments/Cognition-%26-Neuro/Differential-Ability-Scales-II/p/100000468.html

9 For more information see: https://www.wpspublish.com/ados-2-autism-diagnostic-observation-schedule-second-edition

10 For more information see: https://www.gl-assessment.co.uk/assessments/products/new-reynell-developmental-language-scales/

11 For more information see: https://www.pearsonassessments.com/store/usassessments/en/Store/Professional-Assessments/Academic-Learning/Peabody-Picture-Vocabulary-Test-%7C-Fourth-Edition/p/100000501.html

12 For more information see: https://www.pearsonassessments.com/store/usassessments/en/Store/Professional-Assessments/Speech-%26-Language/Receptive-and-Expressive-One-Word-Picture-Vocabulary-Tests-%7C-Fourth-Edition/p/100000338.html

13 For more information see: https://www.pearsonassessments.com/store/usassessments/en/Store/Professional-Assessments/Speech-%26-Language/Preschool-Language-Scales-%7C-Fifth-Edition/p/100000233.html

14 For more information see: https://www.pearsonassessments.com/store/usassessments/en/Store/Professional-Assessments/Behavior/Vineland-Adaptive-Behavior-Scales-%7C-Third-Edition/p/100001622.html?tab=product-details

15 For more information see: https://aseba.org/

16 For more information see: https://www.wpspublish.com/dp-4-developmental-profile-4.html

17 For more information see: https://www.wpspublish.com/adi-r-autism-diagnostic-interviewrevised.html

18 For more information see: https://www.wpspublish.com/cars-2-childhood-autism-rating-scale-second-edition.html

19 For more information see: https://www.pearsonassessments.com/store/usassessments/en/Store/Professional-Assessments/Behavior/Gilliam-Autism-Rating-Scale-%7C-Third-Edition/p/100000802.html

20 For more information see: https://www.parinc.com/products/PSI-4

21 For more information see: https://www.slossonnews.com/abc.html

22 For more information see: https://www.nppsychnavigator.com/Clinical-Tools/Psychiatric-Scales/Scale-2

23 For more information see: https://link.springer.com/referenceworkentry/10.1007/978-1-4419-1698-3_287

24 For more information see: https://www.pearsonassessments.com/store/usassessments/en/Store/Professional-Assessments/Behavior/Social-Skills-Improvement-System-SSIS-Rating-Scales/p/100000322.html

25 For more information see: https://www.pearsonassessments.com/store/usassessments/en/Store/Professional-Assessments/Behavior/Behavior-Assessment-System-for-Children-%7C-Third-Edition-/p/100001402.html

REFERENCES

Blanck, P., & Martinis, J. G. (2015). “The right to make choices”: The National Resource Center for Supported Decision-Making. Inclusion, 3(1), 24–33.

Childress, J. F., & Childress, M. D. (2020). What does the evolution from informed consent to shared decision making teach us about authority in health care? AMA Journal of Ethics, 22(5), E423–E429. https://doi.org/10.1001/amajethics.2020.423

Council of Autism Service Providers (CASP). (2024). Applied behavior analysis practice guidelines for the treatment of Autism Spectrum Disorder: Guidance for healthcare funders, regulatory bodies, service providers, and consumers. https://www.casproviders.org/asd-guidelines

Dwyer, P. (2022). The neurodiversity approach(es): What are they and what do they mean for researchers? Human Development, 66(2), 73–92.

Kapp, S. K. (2018). Social support, well-being, and quality of life among individuals on the autism spectrum. Pediatrics, 141(Suppl 4), S362–S368. https://doi.org/10.1542/peds.2016-4300N

Katz, A. L., Webb, S. A., & Committee on Bioethics. (2016). Informed consent in decision-making in pediatric practice. Pediatrics, 138(2), e20161485. https://doi.org/10.1542/peds.2016-1485

President’s Commission for the Study of Ethical Problems in Medicine and Biomedical and Behavioral Research. (1982). Making health care decisions: A report on the ethical and legal implications of informed consent in the patient-practitioner relationship. Volume one: Report. US Government Printing Office.

Pukki, H., Bettin, J., Outlaw, A. G., Hennessy, J., Brook, K., Dekker, M., Doherty, M., Shaw, S. C. K., Bervoets, J., Rudolph, S., Corneloup, T., Derwent, K., Lee, O., Rojas, Y. G., Lawson, W., Gutierrez, M. V., Petek, K., Tsiakkirou, M., Suoninen, A., Minchin, J., … Yoon, W. H. (2022). Autistic perspectives on the future of clinical autism research. Autism in Adulthood: Challenges and Management, 4(2), 93–101. https://doi.org/10.1089/aut.2022.0017

Unguru, Y., Coppes, M. J., & Kamani, N. (2008). Rethinking pediatric assent: From requirement to ideal. Pediatric Clinics of North America, 55(1), 211–222. https://doi.org/10.1016/j.pcl.2007.10.016

DISCLAIMER This Proceedings of a Workshop—in Brief was prepared by Erin Hammers Forstag as a factual summary of what occurred at the workshop. The statements made are those of the rapporteur or individual workshop participants and do not necessarily represent the views of all workshop participants; the committee; or the National Academies of Sciences, Engineering, and Medicine.

REVIEWERS To ensure that it meets institutional standards for quality and objectivity, this Proceedings of a Workshop—in Brief was reviewed by Doreen Samelson, Catalight. We also thank staff member David Cohen for reading and providing helpful comments on this manuscript. Kirsten Sampson Snyder, National Academies of Sciences, Engineering, and Medicine, served as the review coordinator.

COMMITTEE MEMBERS George W. Rutherford, University of California, San Francisco (Chair); Brian A. Boyd, University of North Carolina at Chapel Hill; Wendy K. Chung, Boston Children’s Hospital; Lauren Erickson, Institute for Exceptional Care; Eric M. Flake, Uniformed Services University and the University of Washington; Patrick Heagerty, University of Washington; A. Pablo Juárez, Vanderbilt University Medical Center; Samuel L. Odom, University of North Carolina at Chapel Hill; Jennifer E. Penhale, Colorado Developmental Disabilities Council; José E. Rodriguez, University of Utah Health; Andy Shih, Autism Speaks; Kristin Sohl, University of Missouri School of Medicine; Aubyn C. Stahmer, University of California, Davis, Mind Institute; Ruth E. Stein, Children’s Hospital at Montefiore; Allysa N. Ware, Family Voices; Zachary “Zack” J. Williams, Vanderbilt University Medical Center

SPONSORS This workshop was supported by contracts between the National Academy of Sciences and the Department of Defense’s Defense Health Agency (HT940223C000). Any opinions, findings, conclusions, or recommendations expressed in this publication do not necessarily reflect the views of any organization or agency that provided support for the project.

For additional information regarding the workshop, visit: https://www.nationalacademies.org/our-work/independent-analysis-of-department-of-defenses-comprehensive-autism-care-demonstration-program

SUGGESTED CITATION National Academies of Sciences, Engineering, and Medicine. 2025. Understanding Methods for Measuring Program Effectiveness and Clinical Effectiveness: Proceedings of a Workshop—in Brief. Washington, DC: National Academies Press. https://doi.org/10.17226/29086.

Division of Behavioral and Social Sciences and Education

Copyright 2025 by the National Academy of Sciences. All rights reserved.

The National Academies provide independent, trustworthy advice that advances solutions to society’s most complex challenges.