Minority Students in Special and Gifted Education (2002)

Chapter: 8 Alternative Approaches to Assessment

Suggested Citation: "8 Alternative Approaches to Assessment." National Research Council. 2002. Minority Students in Special and Gifted Education. Washington, DC: The National Academies Press. doi: 10.17226/10128.

8
Alternative Approaches to Assessment

While the vision in the Individuals with Disabilities Education Act (IDEA) and associated state guidelines is of a program that looks carefully at the individual needs of a student who is referred for special education, both the state guidelines implemented at the school level and traditional special education assessment rely heavily on standardized batteries of tests. Those same standardized test scores are frequently the primary determinant of eligibility for gifted and talented programs. In this chapter, we review the major challenges to these standardized testing practices, including challenges to the very notion of context-free measures of intellectual ability, as well as challenges to the usefulness and efficiency of standardized scores in providing information that is relevant to intervention. We then discuss alternative approaches to assessment that are tied more closely to intervention and present our recommendations for policy change.

CONTEXT, CULTURE, AND ASSESSMENT

Approaches to assessing intellectual ability used widely in special and gifted education placement (see Chapter 7) are rooted in a conception of intelligence as a general factor (often labeled g), which underlies all adaptive behavior (Sternberg, 1999; Jensen, 1998). The very notion of decontextualized intelligence is challenged by two lines of work that highlight the
role of culture and context in the development and assessment of intellectual abilities. One line of work, termed here cross-cultural psychological research, has focused on the influence of factors related to culture and context on testing and on cognition more generally. The other line of work is from a more traditional psychological or psychometric orientation and is focused somewhat more directly on issues of test bias and cultural bias in standardized assessment batteries, including IQ and intellectual ability measures.

Cross-Cultural Psychological Research on Cognitive and Intellectual Ability

Rogoff and Chavajay (1995) have traced the development of cross-cultural psychological research over the past three decades. Initially much attention was directed at the exploration in other cultural settings of the robustness of cognitive tasks developed in the United States and in Europe. Emanating from a Piagetian perspective, a great deal of this work investigated the claims of universality of the stages of intellectual and cognitive development (Dasen, 1977a, b; Dasen and Heron, 1981). A clear finding is that people in many cultures did not reach what is called the formal operational stage without having had extensive experience in school (Ashton, 1975; Goodnow, 1962; Super, 1979). Characteristics assumed intrinsic to child development were found to be context dependent.

In the attempt to understand this variation, many investigators began to examine the power of situational contexts of testing and the issue of subjects’ familiarity with test materials and concepts (Irwin and McLaughlin, 1970; Price-Williams et al., 1969; Ceci, 1996; Gardner, 1983; Lave, 1988; Nuñes et al., 1993). Cross-cultural settings were particularly productive for this purpose (Posner and Barody, 1979; Dasen, 1975; Carraher et al., 1985; Ceci and Roazzi, 1994; Nuñes, 1994). Several studies documented clear differences across cultures in people’s ability to sort objects into taxonomic categories (Cole et al., 1971; Hall, 1972; Scribner, 1974; Sharp and Cole, 1972; Sharp et al., 1979). Those whose experiences were not rooted in Western schooling tended to sort objects into functional categories rather than into more abstract conceptual taxonomies. In tasks thought to tap into logical thinking, often employing logical syllogisms, non-Western subjects often refused to accept the premise of the task, preferring to confine reasoning and deduction to immediate practical experience rather than hypothetical situations (Cole et al., 1971; Fobih, 1979; Scribner, 1975, 1977; Sharp et al., 1979). When the task was modified to focus on immediate and familiar everyday experience, non-Western subjects were able to make judgments, draw conclusions, and exhibit other features of
logical thinking and memory that appeared absent in hypothetical problem solving (Cole et al., 1971; Cole and Scribner, 1977; Dube, 1982; Kagan et al., 1979; Kearins, 1981; Lancy, 1983; Mandler et al., 1980; Neisser, 1982; Price-Williams et al., 1967; Rogoff and Waddell, 1982; Ross and Millsom, 1970; Scribner, 1974, 1975, 1977).

This body of work led many investigators to challenge the assumption that cognitive tasks or batteries developed in a specific cultural setting were context-free measures of cognitive abilities (Cole et al., 1976; Ceci, 1996; Gardner, 1983; Lave, 1988; Nuñes et al., 1993). Research focused on analogues of standardized cognitive tasks that were embedded in people’s everyday lives, such as weaving patterns, the calculating of change in the store, and personal narration (Cole et al., 1976; Greenfield, 1974; Greenfield and Childs, 1977; Lave, 1977; Serpell, 1977). In many of these studies, “native” subjects were shown to perform better than Western subjects when the materials and tasks reflected some correspondence to the more familiar, everyday versions of the tasks. During this same period, increasing attention was directed to the social context surrounding standardized testing situations and the study of testing as a unique context in itself with its own discourse and interactional rules for what constitutes appropriate behavioral expectations (Goodnow, 1976; Miller-Jones, 1989; Rogoff and Mistry, 1985).

In more recent research challenging a universal g factor, Sternberg and Grigorenko (1997b) tested Kenyan children using several different instruments: one measured tacit knowledge of appropriate use of natural herbal medicines, including their source, their use, and dosage. Two other instruments designed to measure reasoning ability (Raven’s Coloured Progressive Matrices Test) and formal knowledge-based abilities (Mill Hill Vocabulary Scale) were administered as well. The findings showed no correlation between the “practical intelligence” measured by the herbal medicine test and the test scores for reasoning ability, as well as a negative correlation with the formal knowledge-based test. Ethnographic work with the families suggested to the authors that they saw either formal schooling or practical knowledge as relevant to a child’s future and so emphasized only one. The implication drawn by the authors is that variation in performance on intelligence tests may capture what is valued in the home environment rather than what is intrinsic to the child’s intellectual ability (Sternberg, 1999).

International research results have been supported in research done more locally. Housewives in Berkeley, California, who successfully did mathematics when comparison shopping were unable to do the same mathematics when placed in a classroom and given isomorphic problems presented abstractly (Lave, 1988; Sternberg, 1999). A similar result was found with weight watchers’ strategies for solving mathematical measurement problems related to dieting (de la Rocha, 1986). Men who successfully handicapped horse races could not apply the same skill to securities in the stock market (Ceci and Liker, 1986; Ceci, 1996).

In short, the available cross-cultural literature suggests that variations from the cultural norms embedded in tests and testing situations may significantly influence the judgments about intellectual ability and performance resulting from their use. Researchers have documented how these sociocultural contexts in the homes of different ethnic, racial, and linguistic groups in the United States can vary significantly from those of mainstream homes (Goldenberg et al., 1992; Heath, 1983, 1989). In light of differences in the fit between home and school culture for many minority children and the difference in the school experiences provided (see Chapter 5), these results bear directly on IQ testing of minority children.

Psychometric Views of Culture and Context: Research on Test Bias

In contrast to the cross-cultural and sociocultural research just described, work from a psychometric framework has centered on the issue of test bias. As early as the mid-1970s, questions were raised about the effects of cultural differences on standardized tests and their interpretation (Mercer, 1973a). Some researchers have considered the long-standing patterns of disproportionate representation of certain racial, ethnic, and English language learner groups in special education as de facto evidence of test bias (Bermudez and Rakow, 1990; Hilliard, 1992; Patton, 1992). The general argument has been that the content, structure, format, or language of standardized tests tends to be biased in favor of individuals from mainstream or middle- and upper-class backgrounds. Miller (1997) argues that all measures of intelligence are culturally grounded because performance depends on individual interpretations of the meaning of situations and their background presuppositions, rather than on pure g.

A contrasting approach to test bias is based on a more statistical or psychometric view. That is, a test is considered biased if quantitative indicators of validity differ for different groups (Jensen, 1980). A common procedure has been to conduct item analysis of specific tests to examine construct validity. A specific test would be considered to be biased if there is a significant “item by group interaction,” suggesting that a specific item deviates significantly from the overall profile for any group. Several researchers have concluded that there is no evidence for test bias using such procedures (Jensen, 1974; Sandoval, 1979), a view that was embraced by the National Research Council (NRC) committee (1982). Other investigators have noted, however, that cultural factors may serve to depress the scores of a particular group in a more generalized or comprehensive fashion so that individual items would not stand out, even though cultural effects may still be present (Figueroa, 1983). This would be the case if familiarity with testing itself were at issue.
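The logic of the item-by-group interaction, and of the objection to it, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name, the four items, and the proportions correct are hypothetical and are not drawn from any study cited in this chapter. The sketch removes the overall group effect and the overall item effect from a two-group table of item difficulties and reports what is left over for each item.

```python
def interaction_residuals(reference, focal):
    """Per-item residuals for the focal group after removing the group
    main effect and the item main effect (hypothetical helper)."""
    n = len(reference)
    mean_ref = sum(reference) / n
    mean_focal = sum(focal) / n
    grand_mean = (mean_ref + mean_focal) / 2
    residuals = []
    for p_ref, p_focal in zip(reference, focal):
        item_mean = (p_ref + p_focal) / 2
        # Expected focal value if only main effects were operating
        expected = mean_focal + item_mean - grand_mean
        residuals.append(p_focal - expected)
    return residuals


# Hypothetical proportions correct on four items (illustrative only)
reference = [0.90, 0.75, 0.60, 0.45]

# Case 1: every item is depressed by the same amount for the focal group
uniform_shift = [p - 0.15 for p in reference]

# Case 2: only the third item is depressed for the focal group
single_item = [0.90, 0.75, 0.30, 0.45]

uniform_res = interaction_residuals(reference, uniform_shift)
single_res = interaction_residuals(reference, single_item)

print("uniform shift residuals:", [round(r, 4) for r in uniform_res])
print("single item residuals:  ", [round(r, 4) for r in single_res])
```

In the first case the residuals are all approximately zero, so an item analysis of this kind would flag nothing even though the focal group scores lower on every item. In the second case the third item stands out. This is the sense in which a generalized effect can be present without any single item deviating from the overall profile.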

Another psychometric indicator of bias that has been used is predictive validity. Normally this involves correlating measures of intellectual functioning with academic achievement, such as grades. Generally moderate to high correlation coefficients are obtained in these analyses. Critics such as Hilliard (1992) point out that the same biases that operate on standardized tests also are likely to operate in institutions such as schools. Moreover, Reschly et al. (1988) have suggested that these analyses when applied to students referred to special education are not predictive in a true sense, since the standardized measure is normally administered only after low achievement has been demonstrated.

There is also a long tradition of investigation into the social and contextual factors embedded in standardized testing situations, in particular those conducted one-on-one with an unfamiliar examiner. Perhaps because it is easier to demonstrate these effects empirically, it has been argued that effects such as examiner familiarity differentially affect Hispanic and black children (Fuchs and Fuchs, 1989). However, efforts to determine whether white examiners impede the test performance of black children have found no evidence that they do (Sattler and Gwynne, 1982; Moore and Retish, 1974).

A recent, more comprehensive treatment of the issues raised here is presented by Valencia and Suzuki (in press). In addition, discussion of issues specific to English language learners is found in Valdés and Figueroa (1994) and elsewhere in this volume. It is important to note, however, that many have begun to question the utility of the debate, at least with respect to designing meaningful interventions for students. That is, even if the ideal standardized test could be created that minimized the incorrect categorization or labeling of individual students, the question still remains: What does such an approach have to offer in terms of designing appropriate interventions that will maximize achievement and academic outcomes (Reschly and Tilly, 1999)? For this reason, many have begun to turn attention to more academically meaningful assessment approaches, such as performance-based assessment, curriculum-based measures, and other approaches more closely tied to instruction and classroom practice.

Problems with IQ-Based Disability Determination

Objections to IQ testing and strong reactions to the interpretation of IQ test differences as reflecting hereditary differences among groups continue to complicate discussions of the meaning, appropriate uses, and possible biases in tests of general intellectual functioning. In addition to the limitations of IQ tests from the perspectives of cultural psychology, it is
questionable whether the costs of IQ tests are worth the benefits in special education eligibility determination. The costs of the testing alone are several hundred dollars in the form of the time of related services professionals such as psychologists and do not include either an estimate of the costs in the time of the students or an analysis of the usefulness of what might be done in place of IQ tests (MacMillan et al., 1998a).

Treatment validity. Perhaps the most convincing of the arguments against IQ tests is that the results are largely unrelated to the design, implementation, and evaluation of interventions designed to overcome learning and behavioral problems in school settings. For example, IQ is not a good predictor either of the kind of reading problem that a student exhibits or of the student’s response to treatments designed to overcome that reading problem (Fletcher et al., 1994). The same general interventions appear to work with basic skills problems regardless of whether the student is classified with mild mental retardation (MMR), learning disability (LD), or emotional disturbance (ED) (Gresham and Witt, 1997; Reschly, 1997). The differentiation between LD and MMR that is done primarily with IQ test results does not lead to unique treatments or to more effective treatments. Moreover, it is noted by MacMillan and colleagues (1998a) that significant numbers of students now classified as LD are in the borderline range of ability of about 70-85 or, in some cases, functioning in the MR range defined by an IQ of approximately 75 or below.

Misuse and racism. Further objections to the use of IQ-based disability determination come from the literature documenting the misuse of IQ tests to justify racist interpretations of individual differences among groups. No contemporary test author or publisher endorses the notion that IQ tests are direct measures of innate ability. Yet misconceptions that the tests reflect genetically determined, innate ability that is fixed throughout the life span remain prominent with the public, many educators, and some social scientists. These myths about the meaning of such results markedly complicate rational discussion of the proper role that IQ test results might play in disability determination in school settings.

Mercer (1979b) provided a useful discussion of the very narrow conditions under which differences among individuals on IQ tests might properly be interpreted as indicating differences in genetic bases for intellectual performance. The necessary conditions never occur with groups that differ by economic resources, cultural practices, and educational achievement. Moreover, test authors and test publishers all acknowledge that IQ tests are measures of what individuals have learned—that is, it is useful to think of them as tests of general achievement, reflecting broad culturally rooted ways of thinking and problem solving. The tests are only indirect measures
of success with the school curriculum and imperfect predictors of school achievement.

LD classification criteria. The most frequent use of IQ tests today is in determining whether a “severe discrepancy between achievement and intellectual ability” exists as per the federal criteria for LD (34 CFR 300.541) and state LD classification criteria. Several problems exist with this procedure. First and most fundamental, there is no “bright line” in performance that can be used to determine the appropriate size of the discrepancy; the size required is arbitrary. Some states use more stringent criteria (e.g., 23 standard score points), others more lenient ones (15 points). Second, serious technical problems exist with the methodologies for discrepancy determination used in most states that do not account for the phenomenon of regression to the mean, a special problem with extreme scores (Mercer et al., 1996; Reynolds, 1985). Failure to account for regression effects penalizes lower-scoring students by decreasing their likelihood of being diagnosed as LD rather than MMR. A third problem with the discrepancy method is that its intended objectivity may not be realized if multidisciplinary teams are willing to administer a large number of achievement tests until the requisite discrepancy is attained without careful consideration of which test is most valid for a particular child and achievement problem. This activity is often predicated on the altruistic-sounding motive of making sure that students with achievement problems get services designed to ameliorate their difficulties; however, it seriously undermines the purpose of having an eligibility criterion.
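The regression problem can be made concrete with a small numerical sketch. It is an illustration under stated assumptions, not any state's actual procedure: the function names are hypothetical, scores are assumed to be standard scores with a mean of 100, and the IQ-achievement correlation of 0.6 is an assumed value chosen only for the example. A simple-difference method subtracts achievement from IQ directly; a regression-based method first predicts achievement from IQ and then compares actual achievement with that prediction.

```python
MEAN = 100.0  # assumed mean of both standard-score scales


def simple_discrepancy(iq, achievement):
    """Discrepancy as a raw difference between the two standard scores."""
    return iq - achievement


def regression_discrepancy(iq, achievement, r=0.6):
    """Discrepancy between regression-predicted and actual achievement.

    r is an assumed IQ-achievement correlation (hypothetical value).
    Predicted achievement is pulled toward the mean in proportion to r.
    """
    predicted = MEAN + r * (iq - MEAN)
    return predicted - achievement


# Two hypothetical students with the same 15-point simple difference
students = {"lower IQ": (85, 70), "higher IQ": (115, 100)}

results = {}
for label, (iq, achievement) in students.items():
    results[label] = (
        simple_discrepancy(iq, achievement),
        regression_discrepancy(iq, achievement),
    )
    print(label, "simple:", results[label][0],
          "regression-based:", round(results[label][1], 1))
```

Under these assumptions both students show the same 15-point simple difference, but the regression-based discrepancy is about 21 points for the student with the lower IQ and about 9 points for the student with the higher IQ. A simple-difference rule therefore understates the discrepancy of students whose IQ scores fall below the mean, which is the sense in which ignoring regression effects works against them.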

A fourth and more fundamental problem with the intellectual ability/achievement discrepancy is that the discrepancy is inherently unreliable in a single measurement occasion and notoriously unstable in repeated measurement occasions (Shinn et al., 1999). Moreover, the vast majority of students evaluated for LD and special education placement have discrepancies that just meet or just fail to meet the discrepancy criterion. The instability of the discrepancy means that if they were assessed again, the discrepancy status for many would change. It is important to remember that these problems occur with students with low achievement, some of whom are found eligible for LD and others of whom, with equally low achievement, especially those with IQs in the 70s and 80s, often are found ineligible. Is this a valid distinction?
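One way to see why a discrepancy score is less dependable than either test alone is the classical test theory formula for the reliability of a difference between two scores with equal variances. The sketch below applies that formula; the reliabilities (0.95 and 0.90), the correlation between the tests (0.60), and the function names are assumptions chosen for illustration and do not come from the studies cited above.

```python
import math


def difference_reliability(r_xx, r_yy, r_xy):
    """Reliability of a difference score X - Y (equal variances assumed).

    r_xx, r_yy: reliabilities of the two tests
    r_xy: correlation between the two tests
    """
    return ((r_xx + r_yy) / 2 - r_xy) / (1 - r_xy)


def se_of_difference(sd, r_xx, r_yy):
    """Standard error of measurement of the difference, in score points."""
    return sd * math.sqrt(2 - r_xx - r_yy)


# Hypothetical values: IQ reliability, achievement reliability, correlation
r_dd = difference_reliability(0.95, 0.90, 0.60)
se_diff = se_of_difference(15.0, 0.95, 0.90)

print("reliability of the discrepancy:", round(r_dd, 4))
print("standard error of the discrepancy:", round(se_diff, 2))
```

With these assumed values the discrepancy is less reliable than either test (about 0.81, against 0.95 and 0.90), and its standard error is close to 6 standard score points. A student observed exactly at a 15-point cutoff could plausibly fall several points on either side of it on retesting, which is consistent with the instability described above for students whose discrepancies just meet or just miss the criterion.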

Validity of LD Discrepancies

The case against using the “severe discrepancy between achievement and intellectual ability” criterion is further strengthened by a series of studies funded by the National Institute of Child Health and Human Development (NICHD) (Lyon, 1996), which reached a number of conclusions about the use and validity of IQ in defining LD in the area of reading:

Results do not support the validity of discrepancy versus low achievement definitions. Although differences between children with impaired reading and children without impaired reading were large, differences between those children with impaired reading who met IQ-based discrepancy definitions and those who met low reading achievement definitions were small or not significant (Fletcher et al., 1994:6).

The present study suggests that the concept of discrepancy operationalized using IQ scores does not produce a unique subgroup of children with reading disabilities when a chronological age design is used; rather, it simply provides an arbitrary subdivision of the reading-IQ distribution that is fraught with statistical and other interpretative problems (Fletcher et al., 1994:20).

Poor readers who make up 70 to 80 percent of the current LD population seem to have the same needs and the same cognitive processing profiles, and they respond to the same treatments regardless of their IQ status (it should be noted that children with IQs less than 80 generally were excluded from the NICHD studies). Therefore, arbitrarily dividing poor readers into subgroups with higher IQs (those who meet the current LD criteria) and those with IQs similar to their reading achievement levels is invalid. With regard to reading-related characteristics, these subgroups are much more similar than different, calling into serious question the current LD diagnostic practices.

These practices have an even more serious side effect: the wait-to-fail phenomenon. Learning to read in the early grades is crucial. The evidence suggests that a student’s status as a poor or good reader at the end of 3rd grade is highly stable through adolescence (Coyne et al., 2001; Juel, 1988). To be effective, intervention needs to occur early with poor readers; otherwise, there are grave barriers to changing from learning to read (in the kindergarten to 3rd grade period) to reading to learn (in 4th grade and beyond).

Special education services for students with reading and math achievement problems are typically delayed until 2nd, 3rd, or 4th grade by the intellectual ability/achievement discrepancy criterion for LD. As noted by Fletcher and colleagues, “For treatment, the use of the discrepancy models forces identification to an older age when interventions are demonstrably less effective” (Fletcher et al., 1998:201). This effect of the IQ-achievement discrepancy method greatly diminishes the potential positive effects of LD services because they are initiated after two or more years of failure (Fletcher, 1998), not when it first is apparent that a student is having significant problems in acquiring reading or math skills. The wait-to-fail
effects are markedly damaging to students and equally negative regarding the potential positive effects of special education. Significant changes in how LD is diagnosed, along with universal early interventions for children with reading problems, are crucial to improving the current system and to improving the achievement of minority children and youth.

Problems with Abandoning IQ-Based Disability Determination

Before leaving the topic of IQ-based disability determination, the long tradition associated with the use of IQ in determining disabilities and the current practices involving IQ across a variety of contexts must be acknowledged. If IQ-based conceptions and classification criteria for LD and MMR were abandoned, significant retraining of existing special education and related services personnel would be required. Even more daunting is the change required in the thinking of professionals and the public about disabilities—a change from assumptions of fixed abilities and internal child traits to new assumptions about the malleability of skills and the powerful effects of instruction and positive environments. Belief changes of this magnitude do not occur immediately or easily, but they are supported by research understanding and are likely to be beneficial to children.

Abandoning IQ-based disability determination will complicate articulation of eligibility and service delivery across different settings and agencies. The largest problem is likely to occur with MR, a disability category recognized in the laws pertaining to a number of agencies, including law enforcement and social security. For example, a person with an IQ below 60 is presumptively eligible for Social Security Income Maintenance benefits, and persons with IQs in the 60 to 70 range are eligible pending an evaluation of intellectual functioning and confirmation of deficits in adaptive behavior (as well as meeting income requirements). Examination of school records is often part of the process of identifying deficits in adaptive behavior. School practices over the past 25 years involving increasing reluctance to identify MMR and the apparent practice of diagnosing some students as LD who meet criteria for MMR compromise the usefulness of school records and potentially undermine an individual’s access to services and protections that should be accorded to persons with MR. Today, IQ data typically are available for persons classified as LD, and those data assist with determination of adult eligibility for services. In the future, such data may not be available.

A counterargument, however, is that schools should not identify disabilities to meet the needs of other agencies. The goal of the schools is to assist children and youth in developing the academic skills, problem-solving capabilities, social understanding, and moral values that promote successful adult lives. The use of IQ tests and IQ-based disability determination
does not promote the achievement of those critical goals; therefore, IQ should be abandoned, even if that action complicates the work of other agencies. It seems entirely reasonable to expect the other agencies to collect data relevant to their eligibility and to learn how to use the kinds of school data described in the last section.

Use of the diagnostic construct of MR without IQ is problematic. Intellectual functioning is critical to all contemporary conceptions of MR and has been a part of the construct since it first was differentiated from mental illness by John Locke in the 17th century (Kanner, 1964; Doll, 1962, 1967). No one has developed alternative criteria for this diagnostic construct that do not use intellectual functioning either implicitly or explicitly. Before classifying someone as MR, given all of the classification schemes that currently exist (American Psychiatric Association, 1994; Luckasson et al., 1992; World Health Organization, 1992), use of a comprehensive and reliable test of general intellectual functioning is mandatory. Some children may be incorrectly classified as MR if IQ is eliminated from the MR conceptual definition and classification criteria (Lambert, 1981; Reschly, 1981, 1988d); IQ test results can protect children from the more subjective judgments of adults.

It also is important to recognize what will not occur with an elimination of IQ-based disability determination and the use of IQ tests in the full and individual evaluation of students suspected of having disabilities. First, current patterns of over- and underrepresentation in special education and for gifted and talented services are likely to continue unless substantial improvements in levels of minority students’ achievement are realized. As noted in the 1982 NRC report, IQ tests are not mechanically applied to all students in the general population. If they were, “the resulting minority overrepresentation would be almost 8 to 1” (NRC, 1982:42). At the time of that report, the actual overrepresentation in MR was slightly over 3 to 1. Further evidence of continued overrepresentation even though IQ testing was eliminated is available from California, where federal Judge Robert Peckham issued a ban on IQ testing in 1986 that was in effect until 1992, when it was modified by the same judge. The ban had no effect on disproportionate special education representation.

The IQ issue in the context of special education was never the principal issue to the Larry P. v. Riles court, which in 1979 and 1986 ordered first a limitation of the use of IQ tests with black students and then a complete ban on such use. The judge clarified his views of the meaning of the case in 1992 with the following comments: “First, the case was,...clearly limited to the use of IQ tests in the assessment and placement of African-American students in dead end programs such as MMR” (Larry P., 1992, also cited as Crawford et al. v. Honig [Crawford et al.], 1992:15). Furthermore,
“Despite the Defendants’ attempts to characterize the court’s 1979 order as a referendum on the discriminatory nature of IQ testing, this court’s review of the decision reveals that the decision was largely concerned with the harm to African-American children resulting from improper placement in dead-end educational programs” (Crawford et al., 1992:23).

The real Larry P. issue, according to the judge who adjudicated the case over a 20-year period from 1972 to 1994, was the effectiveness of special education programs for black students. Without data confirming effectiveness, Judge Peckham regarded overrepresentation as highly suspicious. The 1992 order required the California Department of Education to inform the court regarding which of the 1990s special education programs in California were “substantially equivalent” to the dead-end programs of concern to the court in the 1979 opinion. Instead of responding to that order, the department appealed the decision to the 9th Circuit. The appeal was rejected, leaving Judge Peckham’s 1992 order to stand. No further action in the Larry P. case has occurred since 1994, although the 1992 order to the California Department of Education is still in effect.

Perhaps the most important lesson from Larry P. is that the outcomes of special education matter a great deal in judging fairness to the minority students who are overrepresented in programs. Demonstrably effective outcomes would probably have changed the original ban on IQ tests and would greatly diminish if not eliminate contemporary concerns about disproportionate representation. This leads to a useful reframing of the IQ issue, providing as well the foundation of the next section on alternatives to the current system of special education. Are IQ tests useful in promoting positive outcomes for children and youth with severe achievement and social behavior problems? In the committee’s view, the balance of the evidence does not provide continued support for the use of IQ tests in special education decision making.

The major advantages of eliminating IQ-based disability determination and use of IQ in the full and individual evaluations have to do with focusing the efforts of parents, students, teachers, and related services personnel on promoting greater competence in academic skills and social behaviors. The use of IQ tests detracts from efforts to analyze environments carefully and develop effective interventions. The time and cost of IQ testing during the full and individual evaluation and reevaluations could be put to better use if they were devoted to more thorough analyses of reading, math, written language, or other achievement deficits, as well as analyses and development of interventions for classroom behaviors that interfere with effective instruction and achievement of positive learning outcomes. Abandoning IQ testing does not automatically produce more appropriate assessment. Accomplishment of the latter will require significant changes in state and local practices as well as substantial continuing education efforts. The promise, however, of better outcomes justifies the difficulties and costs associated with making these changes.

IQ Tests and Gifted and Talented Determination

Programs for the gifted and talented can be academic programs, leadership programs, or arts (including music) programs. IQ testing is relevant only to the first. Identification for academically gifted programs, to be responsive to available interventions, should identify those students in a discipline who require and can profit from instruction that moves at a quicker pace, and that explores topics in more depth and complexity if that is what gifted programs have to offer.

While objections to IQ as a measure of innate intelligence are many, few would contest the evidence that IQ predicts school success. It may well be that IQ tests capture the same skills and abilities as are captured by successful school performance, and that neither is a measure of innate intelligence. Even so, IQ tests may successfully identify students who are most likely to succeed in programs for the academically gifted. Snow (1995) argued that despite their many drawbacks, IQ tests do successfully identify the ability to deal with complexity. To the extent that gifted programs provide access to accelerated and more complex curricula, IQ test results may be relevant to placement. Scores for verbal and quantitative subtests should be considered separately, however, since mathematical and verbal giftedness are separate dimensions (Benbow and Minor, 1990), and a single score should never be used in isolation.

In a homogeneous, middle-class, suburban school, the above arguments may be persuasive. The more diverse the tested population, however, the greater the challenge to those arguments. Students who do not excel on IQ tests, as argued above, may be less familiar with testing procedures, and for reasons of background and culture they may have less familiarity with the types of items on the test. As the body of research reviewed above suggests, their reasoning capacity and skilled performance may be exceptional when the referents are familiar, but unexceptional in the context of the test. If the characteristic that distinguishes academically gifted students from their peers is advanced ability to learn, unfamiliarity with test taking and with test items may obscure that ability.

Research done by Sternberg and colleagues (1999, 2001) in Tanzania lends empirical support to this concern. A sample of 358 schoolchildren were given intelligence tests. They were then given a 5- to 10-minute period of instruction in which they were able to learn skills that would potentially enable them to improve their scores. When they were retested, the students registered on average small, statistically significant gains. Importantly, scores on the pretest showed only weak correlations with scores on the posttest. This suggests that for populations inexperienced with test taking, a small amount of training can change scores significantly. More importantly, it suggests that initially high-scoring students are not necessarily those who learned most from instruction. The authors found that the posttraining scores were better predictors of transfer to other cognitive performance tasks than were the pretraining scores.
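The two quantities in this finding, the average gain and the pretest-posttest correlation, can be illustrated with a minimal Python sketch. The six scores below are hypothetical values chosen for illustration; they are not data from the Tanzanian study, and the function names are assumptions of this sketch.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


def mean_gain(pre, post):
    """Average posttest-minus-pretest difference across children."""
    return sum(b - a for a, b in zip(pre, post)) / len(pre)


# Hypothetical scores for six children (illustration only, not study data).
pretest = [10, 12, 14, 16, 18, 20]
posttest = [19, 15, 22, 14, 21, 20]

r = pearson_r(pretest, posttest)
gain = mean_gain(pretest, posttest)
print(f"pre/post correlation: {r:.2f}, mean gain: {gain:.1f}")
```

In this made-up sample the average score rises while the correlation between the two testings stays weak, so a child's rank on the pretest says little about the child's rank after instruction.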

The research base, in the committee’s view, is not sufficiently developed to permit either a complete embrace or a complete rejection of IQ testing for placement in gifted and talented programs. The lack of a consensus, coupled with well-reasoned questions concerning the validity of psychometric intelligence tests, provides sufficient warrant for supporting multiple means of assessment at this time.

But multiple means of assessment, based on a lack of scholarly consensus, should be considered only a temporary measure. The committee regards it as a priority matter that the findings from research on the contextual basis of test performance, as well as other aspects articulated in the wide-ranging scholarly critique of decontextualized intelligence testing, be engaged in an effort to study the implications of culture and context on efforts to assess children for gifted and talented placement. As with assessment for special education, assessment alternatives should be anchored in an understanding of the characteristics of students that constitute a need for a different educational program, and should be valid with respect to the gifted programs available to students.

The short-term resolution of this dilemma is crafted in light of the existing state of knowledge and the desirability of continuing to provide exceptional learners with interventions that support their genuinely different educational needs. The short-term resolution should not, however, become the de facto appropriate means of assessment.

ALTERNATIVES TO TRADITIONAL CLASSIFICATION AND PLACEMENT

We now turn to alternative approaches to assessment that would better match student need to program interventions. It is important to emphasize that the current methods of identifying students with low-incidence disabilities are not the focus of this discussion; it is assumed that the current practices regarding determining eligibility and special education needs for these students will continue. The overarching theme in this discussion is improving achievement and social learning outcomes for all children and youth, including the minority students currently disproportionately represented in the MR, ED, and gifted and talented categories.

Universal Screening, Prevention, and Early Intervention

In considering alternatives to the current identification process, the committee considered two goals paramount: (1) assuring that the pool of children identified are those who need and can benefit from special or gifted education, and (2) assuring that the assessment procedures maximize the opportunity for effective intervention. Both concerns point us to early, universal screening.

Universal screening of young children to detect problems in the early development of academic and behavioral skills is increasingly recognized as crucial to achieving better school outcomes and preventing achievement and behavior problems. Evidence suggests that effective and reliable screening of young children by ages 4-6 can identify those most at risk for later achievement and behavioral problems (Coyne et al., 2001; Fuchs and Fuchs, 2001; Graham et al., 2001; NICHD, 2000; Kellam et al., 1998b), including those most likely to be referred and placed in special education programs. Cost-effective screening measures use structured interviews, rating scales, and checklists completed by teachers and parents as well as simple, brief measures of skills administered directly to children (Good and Kaminski, 1996; Walker and McConnell, 1995; Achenbach and Edelbrock, 1986; Werthamer-Larsson et al., 1991).

Early screening is rather futile, however, if it is not followed by effective interventions. In fact, instructional and social training programs for parents and teachers are available that can produce significant gains for many children showing at-risk characteristics at ages 4-6 (for reading interventions, see NICHD, 2000; NRC, 1998; for behavioral interventions, see McNeil et al., 1991; Reid et al., 1999; Hawkins et al., 1992; Kellam et al., 1998b; U.S. Department of Health and Human Services, 2001b). It is important to recognize that the nearly inevitable effect of universal early screening will be higher identification of disadvantaged students, a disproportionate number of whom are members of minorities. West et al. (2000) reported rates of mastery of skills that are early predictors of later reading success. Black and Hispanic students were behind Asian and white children both at the beginning and at the end of kindergarten, and the lower-scoring groups made slightly smaller gains over the course of the year (see Chapter 3). Studies of achievement at kindergarten and 4th grade through the National Assessment of Educational Progress (NAEP) (Donahue et al., 2001; West et al., 2001) and other national measures of achievement provide a basis for anticipating the probable patterns and degrees of disproportionality likely to result from early screening. According to the most recent NAEP results for 4th grade reading, 63 percent of black students had scores that are below the basic level in reading. In contrast, 27 percent of white and 22 percent of Asian/Pacific Islander students scored below the basic level. The results of universal screening are likely to parallel those differences. Universal screening will be beneficial, however, if it identifies schools, classrooms, and individual teachers and children who need additional supports and provides effective interventions. Otherwise, the same problems with disproportionate representation in special education will accompany universal screening efforts. Furthermore, universal screening may uncover children with learning disabilities, particularly girls, who are presently underidentified.

Many of the children who are referred to special education exhibit reading problems, behavior problems, or both (Bussing et al., 1998). In both these areas, screening tools are available that would allow for early identification of children at risk for later problems, and existing intervention strategies hold promise for improving outcomes for those identified.

Early Screening and Intervention in Reading

There are a number of working models for screening all children in kindergarten, 1st, and 2nd grade for reading problems. Examples include the Observation Survey developed in New Zealand (Clay, 1993), the South Brunswick, New Jersey, Early Literacy Portfolio (Salinger and Chittenden, 1994), the Primary Language Record (Barr et al., 1988), the Work Sampling System (Meisels, 1996-1997), and the Phonological Awareness and Literacy Screening developed at the University of Virginia (see Foorman et al., 2001, for summaries of all of these programs). Most are attempts to engage teachers in collecting evidence on which to base curricular decisions about individual children. Some of these are more standardized, formal assessments that have attempted to address important psychometric issues such as test reliability and validity; others are more informal. Some have been implemented on a large-scale basis.

Perhaps the most fully researched and implemented model for universal screening is that currently being used in Texas. Beginning in 1998-1999, all school districts in Texas were required by law to administer an early diagnostic reading instrument for K-2. Although the specific assessment instrument was not mandated, the Texas Education Agency contracted for the development of the Texas Primary Reading Inventory (TPRI). This instrument (described in more detail in Box 8-1) was designed to be used on a large scale, to bring psychometric rigor to informal assessment, and to be aligned with state curriculum standards. By the 2000-2001 school year, over 90 percent of Texas’s 1,000 school districts had adopted the TPRI and its Spanish reconstruction, known as the Tejas Lee.

The TPRI consists of two parts, beginning with a screening instrument, which is administered to each child in grades K-2. Phonological awareness

BOX 8-1
The Texas Primary Reading Inventory

Carefully developed and revised through field trials and several years of use with thousands of students, the Texas Primary Reading Inventory (TPRI) is a two-part tool that helps teachers diagnose the kinds of reading problems students may be having and plan instruction accordingly. A screening test is first administered to all K-2 students by their teachers; this is followed by a more in-depth inventory for those students who do not show complete mastery of the questions on the screening test.

The concepts assessed by the screening test were selected because they were found to be good predictors of successful reading at the end of grades 1, 2, and 3. Screening is done at four key points in time (i.e., middle and end of kindergarten, beginning of 1st grade, beginning of 2nd grade) with questions that focus on the critical reading skills that should be “developed” at that time. The TPRI screening helps teachers quickly identify those students who are on track to become successful readers one or two years later. The teacher can then administer the more time-consuming inventory only to those students who are potentially not on track—i.e., at risk for developing difficulties in learning to read. The inventory section provides information about the child’s strengths and weaknesses that can then be used by the teacher to plan reading instruction and monitor progress.

For example, midway through the year, a kindergarten teacher using the TPRI would individually administer the screening portion of the TPRI to each student in her class. She begins with a series of questions that assess the child’s letter-sound (or graphophonemic) knowledge—showing the child a letter and asking for its name and sound. Then she asks a set of questions focused on phonemic awareness. For example: “If the puppet says s-it, I know the word is sit. What would the word be if the puppet says cake?” If the child does not answer enough of these questions correctly, the teacher would proceed to administer the whole inventory portion of the TPRI.

The inventory portion of the TPRI consists of the following conceptual domains:

Book and print awareness (K only)—knowledge of the function of print and of the characteristics of books and other print materials (e.g., the child is asked to point out a sentence in text and show where it starts and ends).

Phonemic awareness (K and 1st grade)—the ability to detect and identify individual sounds within spoken words. Tasks include asking for rhyming words (tell me another word that rhymes with stop, shop, hop) or repeating words without the initial consonant sound (say the word “cake” without the “c”).

Graphophonemic knowledge (K, 1st, and 2nd grades)—recognition of letters of the alphabet and understanding of sound-symbol relationships (e.g., for kindergarteners, questions like “What is the first sound in the word man?”; for 2nd graders, asking them to spell in writing words spoken by the teacher).

Reading accuracy (1st and 2nd grades)—the ability to read grade-appropriate text accurately (i.e., the child is asked to read a passage aloud and the teacher keeps track of the types of errors made by the child and scores the overall accuracy).

Reading fluency (1st and 2nd grades)—the ability to read connected text accurately, quickly, and automatically (e.g., the teacher times the reading of the passage above and calculates a fluency rate that includes only words read correctly).

Reading comprehension (1st and 2nd grades)—the understanding of what has been read (e.g., the teacher asks the child to answer implicit and explicit questions about the passage that the child has read aloud). For K-1 students unable to read aloud by themselves, listening comprehension is assessed, i.e., the ability to understand what has been read aloud (e.g., the teacher reads a short passage and asks the child both explicit and implicit questions about the events in the story).
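The accuracy and fluency calculations in the list above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the TPRI's scoring procedure: the text says only that the fluency rate counts words read correctly, so expressing it as words correct per minute, along with the function names and the sample passage figures, is an assumption of this sketch.

```python
def reading_accuracy(words_in_passage, errors):
    """Share of the passage read correctly (0.0 to 1.0)."""
    if words_in_passage <= 0:
        raise ValueError("passage must contain at least one word")
    if not 0 <= errors <= words_in_passage:
        raise ValueError("errors must be between 0 and the passage length")
    return (words_in_passage - errors) / words_in_passage


def words_correct_per_minute(words_in_passage, errors, seconds):
    """Fluency rate that counts only words read correctly (assumed unit: per minute)."""
    if seconds <= 0:
        raise ValueError("reading time must be positive")
    if not 0 <= errors <= words_in_passage:
        raise ValueError("errors must be between 0 and the passage length")
    return (words_in_passage - errors) * 60 / seconds


# Hypothetical reading: a 120-word passage, 6 errors, read in 90 seconds.
accuracy = reading_accuracy(120, 6)
fluency = words_correct_per_minute(120, 6, 90)
print(f"accuracy: {accuracy:.0%}, fluency: {fluency:.0f} words correct per minute")
```

Separating the two functions mirrors the list: accuracy ignores time, while fluency combines correct words with the timed reading.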

According to the researchers who helped develop the test, “the most cost-effective early intervention is prevention—prevention in the form of differentiated classroom instruction” (Foorman and Schatschneider, in press). This means that teachers who use the TPRI to identify risk must also be able to translate the results of the assessment into instruction. To this end, an Intervention Activities Guide, provided to each teacher, has activities and sample lessons geared toward each of the major concepts assessed by the TPRI. Teachers can use it to plan supplementary lessons that focus on the specific skills in need of development. Developers of the test do note, however, that many teachers will need some professional development to help them learn to administer the test systematically and to use it to plan instruction effectively (Foorman et al., 2001).

The TPRI is notable for the attention paid to collecting empirical data about its psychometric properties. Items were selected for the screening test from a larger battery of items that were found to distinguish statistically between successful and unsuccessful readers at the ends of grades 1 and 2. In addition, field test data were collected to examine interrater reliability (the accuracy, agreement, and objectivity of scoring across teachers) as well as the validity of the TPRI scores compared with other well-known measures of word recognition and comprehension.

Cutoff points for the screening instrument have been purposely set low so that overidentification of those at risk occurs instead of underidentification (i.e., teachers err on the side of administering the complete inventory to some students who might not really be at risk rather than not administering it to some who truly are at risk). In this case, the main consequence of overidentification is that the teacher proceeds to administer the more comprehensive inventory to the child. Although the false-positive rate for the screening instrument is relatively large in kindergarten (38 percent) and 1st grade, it drops below 15 percent by the beginning of 2nd grade. Results of this test have been explicitly excluded by legislation from use in the Texas accountability system or its teacher appraisal and incentive system.

For more information on the TPRI, visit www.tpri.org or the web site of the Center for Academic and Reading Skills, developers of the instrument for the Texas Education Agency, at http://cars.uth.tmc.edu.

   

NOTE: This box describes TPRI as revised for 2001-2002.

and letter-sound knowledge are the focus of this screening in kindergarten and the beginning of grade 1, while word reading is the focus of the screening at the end of grade 1 and the beginning of grade 2. If the screening test suggests that a child is still developing these key concepts, then a more comprehensive inventory is administered by the teacher to help identify each child’s strengths and weaknesses and to help target intervention strategies to use with each child. The scores on the TPRI are designed to provide a concrete demonstration of the knowledge and skills covered in the classroom curriculum. As expected, early identification through universal screening does yield a higher number of false positives (i.e., children who will be identified as at risk but will not end up experiencing difficulties learning to read). For example, about 38 percent of second-semester kindergartners are misidentified by the TPRI screen as needing further help. However, most of these students can get the support they need to be successful readers through supplemental small-group reading instruction from the teacher for about 20-30 minutes a day.

By the end of 2nd grade, if a child still does not meet the criterion of successful mastery on the TPRI, they are referred for further evaluation and intervention. Thus, use of the TPRI not only signals the need for more intensive intervention by 2nd grade, but it also holds promise for preventing reading difficulties by the use of ongoing assessment and targeted interventions while children are still learning to read in kindergarten and 1st grade.
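The trade-off behind the screening criterion (accepting more false positives in order to miss fewer children who are truly at risk) can be made concrete with a small Python sketch. Everything in it is hypothetical: the scores, the later outcomes, and the two mastery criteria are invented for illustration, and the false-positive rate is computed here as the share of children without later difficulty who were flagged, which is one common definition and may not be the denominator used in the TPRI field data.

```python
def screen_outcomes(scores, later_difficulty, mastery_score):
    """Flag children scoring below `mastery_score` and tally the four outcomes."""
    counts = {"true_pos": 0, "false_pos": 0, "false_neg": 0, "true_neg": 0}
    for score, difficulty in zip(scores, later_difficulty):
        flagged = score < mastery_score
        if flagged and difficulty:
            counts["true_pos"] += 1
        elif flagged and not difficulty:
            counts["false_pos"] += 1
        elif difficulty:
            counts["false_neg"] += 1
        else:
            counts["true_neg"] += 1
    return counts


def false_positive_rate(counts):
    """Share of children without later difficulty who were flagged anyway."""
    return counts["false_pos"] / (counts["false_pos"] + counts["true_neg"])


def miss_rate(counts):
    """Share of children with later difficulty whom the screen did not flag."""
    return counts["false_neg"] / (counts["false_neg"] + counts["true_pos"])


# Hypothetical screening scores (0-10) and later reading outcomes for ten children.
scores = [2, 3, 4, 5, 6, 6, 7, 8, 9, 10]
later_difficulty = [True, True, True, False, True, False, False, False, False, False]

for mastery_score in (5, 8):
    c = screen_outcomes(scores, later_difficulty, mastery_score)
    print(mastery_score, c, false_positive_rate(c), miss_rate(c))
```

With these invented numbers, the more inclusive criterion flags every child who later struggles, at the cost of also flagging half of those who do not. That is the pattern the text describes, where the main consequence of a false positive is an additional inventory and some supplemental instruction.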

Early Screening and Intervention for Behavior Problems

There now exist feasible and inexpensive tools to systematically assess the reading skills of all students. Currently there is no parallel emphasis on the systematic, continual tracking of emotional or behavioral problems, even though they commonly figure into reading and other learning problems (Bussing et al., 1998). Since identification and referral by teachers for emotional disturbance or behavior disorders is often unsystematic, idiosyncratic, and late in the development of a behavioral problem (see Chapter 6), early systematic screening could bring large improvements.

Existing identification procedures that rely on intrapersonal psychiatric assessments or standardized tests (e.g., Achenbach and Edelbrock, 1986) do reveal problems in emotional and behavioral adjustment. But they do not take into account possible problems in teacher practices or classroom or school-wide issues that may be critical in understanding the child’s problems and in formulating a corrective intervention strategy. This point is driven home by findings from a recent longitudinal study by Kellam et al. (1998a). On average, across 19 schools, 1st grade children who were assessed to be in the top quartile in aggression were four times more likely than other students to demonstrate significant behavior problems in middle school. However, the subset of 1st grade children scoring in the top quartile who were in poorly managed 1st grade classrooms was over 50 times more likely to have conduct problems in middle school. This suggests a very powerful, independent influence of the teacher’s classroom management skills on child behavior outcomes.

Universal and repeated assessments of children and settings in schools have the potential to provide information on the behavioral adjustment of each child, both individually and in relation to other children in a class, school, or district. They also have the potential to provide systematic information on the effects of individual, classroom, or school-wide interventions. Several well-developed and validated assessment tools are available for classroom settings. Instruments such as the Teacher Observation of Child Adjustment (TOCA; Werthamer-Larsson et al., 1991) and the Scale of Social Competence and School Adjustment (SSCSA; Walker and McConnell, 1995) are appropriate for all students. They provide specific and relevant information to enable teachers to assess the adjustment of every child in their classroom (see Box 8-2). Instruments such as the Child Behavior Checklist Report Form (Achenbach and Edelbrock, 1986) and the Systematic Screening for Behavior Disorders (SSBD) (Walker and Severson, 1992) provide scores and norms on several relevant behavioral and emotional dimensions. Although such instruments are used widely, they were designed and validated on clinical populations and provide useful information only at the extreme end of the continuum of disturbance (i.e., clinical cutoff scores). As such they are less than optimal universal assessment tools.

Direct observational tools that teachers can use with minimal training can be tailored to assess individualized behavioral and emotional adjustment (see Walker et al., 1995). Direct observational strategies are also available for noninstructional school settings (playgrounds, hallways, etc.) in which many behavioral and emotional problems are demonstrated.

As illustrated in the study by Kellam et al. (1998a), children scoring in the top quartile in aggression or conduct problems are at significant risk for subsequent behavior problems. For these children, a second-stage assessment that is individualized to take into account contextual factors should be considered. Such a multiple gating procedure, including three or more graduated assessments, is highly effective, allowing the integration of universal and clinical assessment strategies in a cost-sensitive way (Loeber et al., 1984; Walker and Severson, 1992).

The first level can be used to assess the adjustment, or progress, or response to new school-wide interventions of all students in a classroom, school, or district. The data can then be used to identify a smaller subset for further assessment to determine the appropriateness and then effectiveness

BOX 8-2
Universal Assessment

Behavioral Adjustment: Universal Assessment and Multiple Gating

There is a growing body of evidence that systematic teacher ratings of student behavior in the early grades are highly predictive of both short-term and long-term emotional adjustment (e.g., Kellam et al., 1998a). One instrument for such a systematic assessment is the Teacher Observation of Child Adjustment (TOCA). It is a relatively short and structured interview that can be conducted by a school psychologist or counselor, which systematically assesses a child’s adjustment in the classroom, particularly issues around aggressiveness/disruptiveness and shyness/social isolation. Assessment of all children in a typical classroom can be conducted in under two hours, including time for a short discussion of teacher concerns about individual children. Teachers typically see the process as worthwhile, particularly if they are provided time within the school day to complete the process.

The TOCA yields quantitative scores and can be used to identify children with the most serious adjustment problems. Kellam et al. (1998a) found that 1st grade children in the top 15 percent in rated adjustment problems were at very high risk of serious discipline problems in middle school. One could use such a cutoff point to trigger a teacher consultation with the school psychologist and more intensive assessment to decide whether or not to institute an evidence-based, individualized program in the classroom (e.g., First Step to Success; Walker et al., 1998).

This two-step assessment process—a universal assessment systematically triggering a more intensive assessment—is an example of “multiple gating.” If the teacher and students were really struggling and reported average scores in a classroom were much higher than in other classes in a given school, then an effective classroom-wide intervention (e.g., Webster-Stratton et al., 2001) might be considered to help the teacher more effectively deal with behavior and classroom management issues.

of individualized interventions. Examples of this type of program intervention for children whose screening suggests they are at risk of later behavior problems appear in Boxes 8-3, 8-4, and 8-5.
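The gating logic described here and in Box 8-2 can be sketched as a short Python example. It is a hypothetical illustration, not the scoring procedure of the TOCA, the SSBD, or any other instrument: the ratings, the top-quartile gate, and the classroom margin are all invented values, and the function names are assumptions of this sketch.

```python
from statistics import mean


def top_fraction_cutoff(ratings, fraction):
    """Lowest rating that still falls within the top `fraction` of the list."""
    ranked = sorted(ratings, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return ranked[k - 1]


def gate_one(ratings_by_child, fraction=0.25):
    """First gate: children whose universal rating is in the top fraction."""
    cutoff = top_fraction_cutoff(list(ratings_by_child.values()), fraction)
    return sorted(child for child, r in ratings_by_child.items() if r >= cutoff)


def classrooms_to_review(ratings_by_classroom, margin):
    """Classrooms whose mean rating exceeds the school-wide mean by more than `margin`."""
    all_ratings = [r for rs in ratings_by_classroom.values() for r in rs]
    school_mean = mean(all_ratings)
    return sorted(
        room for room, rs in ratings_by_classroom.items()
        if mean(rs) - school_mean > margin
    )


# Hypothetical teacher ratings of adjustment problems (higher = more concern).
ratings_by_child = {"A": 1.2, "B": 3.8, "C": 2.0, "D": 1.5,
                    "E": 4.1, "F": 2.2, "G": 1.1, "H": 2.9}
ratings_by_classroom = {"room_1": [1.2, 1.5, 2.0, 1.1],
                        "room_2": [3.8, 4.1, 2.2, 2.9]}

# Children passing the first gate would receive the individualized second-stage assessment.
print(gate_one(ratings_by_child))
# Classrooms flagged here might instead be candidates for a classroom-wide intervention.
print(classrooms_to_review(ratings_by_classroom, margin=0.5))
```

Keeping the child-level gate and the classroom-level check as separate functions reflects the point made in the text: the same universal data can point either to an individualized assessment or to support for the teacher and the classroom as a whole.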

Direct observational tools that school psychologists, counselors, or teachers could use, given appropriate preservice or inservice training, can be tailored to assess behavioral dimensions, to further define and specify the targets and measure the effects of individualized interventions (see Walker et al., 1995; Horner, 1994). Both rating and direct observational procedures and associated interventions are available to conduct analogous assessments in key noninstructional settings that are less well structured and supervised than classrooms and in which student-to-student aggression

BOX 8-3
First Step to Success

The First Step to Success program targets at-risk kindergartners who show the soft, early signs of an antisocial pattern of behavior (e.g., aggression, oppositional-defiant behavior, severe tantrums, victimization of others). First Step to Success consists of three interconnected modules: (a) proactive, universal screening of all kindergartners; (b) school intervention involving the teacher, peers, and the target child; and (c) parent/caregiver training and involvement to support the child’s school adjustment. The major goal of the program is to divert at-risk kindergartners from an antisocial path in their subsequent school careers.

Multiple waitlist control studies (Golly et al., 1998; Walker et al., 1998) have documented the effects of First Step to Success. Effects include observed reductions in classroom problem behavior and increases in on-task behaviors. Teacher and parent ratings indicate:

decreased disruptive behavior (teacher report),

decreased withdrawn behaviors,

improved classroom atmosphere (assessed by independent observers), and

improved ratio of positive to negative interactions with the student.

and bullying often occur at very high levels (Olweus, 1991; Walker et al., 1995; Stoolmiller et al., 2000).

Interventions and Referral Decisions

It is the responsibility of teachers in the regular classroom to engage in multiple educational interventions and to note the effects of such interventions on a child experiencing academic failure before referring the child for special education assessment. It is the responsibility of school boards and administrators to ensure that needed alternative instructional resources are available (NRC, 1982:94).

Improved universal screening, prevention, and early intervention processes such as those described above should, in the committee’s view, be essential prerequisites to any consideration of student referral to special education. The current literature indicates, however, that some students do not respond to even the best early interventions in reading and other achievement areas (Torgesen, 2000; Wagner, 2000). The proportion of a general population that does not respond adequately is unknown because universal screening followed by early intervention procedures has not been applied broadly in any general population. Research with relatively small groups of students suggests that the nonresponse rate may be as high as 4 percent of


BOX 8-4
Incredible Years Series: Parent, Teacher, and Child Training

The Incredible Years Series is a set of three comprehensive, multifaceted, and developmentally based curricula for parents, teachers, and children designed to dovetail in order to promote emotional and social competence and to prevent, reduce, and treat behavior and emotional problems in young children (ages 2-8). In a report that emerged from the Division 12 Task Force on Effective Psychosocial Interventions (1995), the series was identified as one of two well-established treatments for conduct disorder (Brestan and Eyberg, 1998) and was selected by the Office for Juvenile Justice and Delinquency Prevention (OJJDP) as 1 of 11 model violence prevention programs (Webster-Stratton et al., 2001).

This series of programs addresses multiple risk factors across settings (school and home) known to be related to the development of conduct disorders in children. In all three training programs, trained facilitators use videotape scenes to structure the content, stimulate group discussion and problem solving, and promote the sharing of ideas among participants.

Incredible Years Training for Parents. The Incredible Years parenting series includes three types of parent programs. The Basic program emphasizes parenting skills known to promote children’s social competence and reduce behavior problems, such as: how to play with children, helping children learn, effective praise and use of incentives, effective limit-setting, and strategies to handle misbehavior. The Advance program emphasizes parental interpersonal skills, such as: effective communication skills, anger management, problem-solving between adults, and ways to give and get support. The Supporting Your Child’s Education (known as SCHOOL) emphasizes parenting approaches designed to promote children’s academic skills, such as: reading skills, parental involvement in setting up predictable homework routines, and building collaborative relationships with teachers.

Incredible Years Training for Teachers. This series emphasizes effective classroom management skills, such as: the effective use of teacher attention, praise, and encouragement, the use of incentives for difficult behavior problems, proactive teaching strategies, how to manage inappropriate classroom behaviors, and the importance of building positive relationships with students. In addition, a series of training videotapes are used to train teachers how to implement the Dinosaur Social Skills and Problem-Solving Curriculum as a prevention program in the classroom with all children. There is both a preschool/kindergarten and grade 1-2 version of this training.

Incredible Years Training for Children. The Dinosaur Child Curriculum emphasizes training children in such skills as emotional literacy, empathy or perspective taking, friendship skills, anger management, interpersonal problem solving, school rules, and how to be successful at school. It is designed for use as a pull-out treatment program for small groups of children exhibiting conduct problems or can be offered to the entire classroom in circle time discussions combined with small-group activities. There are 90 lessons designed to be offered twice a week over a period of 1 to 3 years.


BOX 8-5
Linking the Interests of Families and Teachers

Listed as a promising program in the Surgeon General’s Report on Youth Violence (U.S. Department of Health and Human Services, 2001a), Linking the Interests of Families and Teachers (LIFT) is a universal school-based program that targets two major factors that put children at risk for subsequent behavior problems and delinquency: aggressive and other problem behaviors with teachers and peers at school and ineffective parenting, including inconsistent and inappropriate discipline and lax supervision. LIFT has 3 main components: (1) child social skills training, (2) a playground behavior game, and (3) parent management training.

Child social skills training in the program consists of 20 sessions of 1 hour each conducted across a 10-week period. Sessions are held during the regular school day. Each week, the sessions include five parts: (1) classroom instruction and discussion on specific social and problem-solving skills, (2) skills practice, (3) free play in the context of a group cooperation game, (4) a formal problem-solving session, and (5) review and presentation of daily rewards. The curriculum is similar for all elementary school students, but the delivery format, group exercises, and content emphasis are modified to address normative developmental issues depending on the grade level of participants.

The playground behavior game takes place during recess. During the game, rewards can be earned by individual children for the demonstration of both effective problem-solving skills and other positive behaviors with peers as well as the inhibition of negative behaviors. These rewards are then pooled within the child’s small group as well as across his or her entire class. When a sufficient number of armbands are earned by a group or by the class, simple rewards are given (e.g., an extra recess, a pizza party). The key to this aspect of the game is to have adults roaming throughout the playground, immediately terminating negative confrontations and handing out colorful nylon armbands as a reward to individual students for positive behavior towards peers. Playground monitors, required in most schools, can be taught to fill this role.

The parenting classes are conducted in groups of 10 to 15 parents and consist of 6 sessions scheduled once per week for approximately 2.5 hours each. The sessions are held during the same period of time as the child social skills training. Session content focuses on positive encouragement, discipline, monitoring, problem solving, and parental involvement in the school. Counselors, teachers, or psychologists can conduct the groups, as the curriculum is designed to accommodate varying levels of instructor education and expertise. Teachers and parents give the program extremely positive evaluations.

The surgeon general’s report documents evidence of the program’s effectiveness: “In short-term evaluations, LIFT decreased children’s physical aggression on the playground (particularly children rated by their teachers as most aggressive at the start of the study), increased children’s social skills, and decreased aversive behavior in mothers rated most aversive at baseline, relative to controls. Three years after participation in the program, 1st-grade participants had fewer increases in attention-deficit disorder-related behaviors (inattentiveness, impulsivity, and hyperactivity) than controls. At follow-up, 5th-grade participants had fewer associations with delinquent peers, were less likely to initiate patterned alcohol use, and were significantly less likely than controls to have been arrested.”


the general population (Wagner, 2000). The students who do not respond adequately to the very targeted, intensive interventions described above would then be eligible for an individualized education program (IEP), which would stipulate the ongoing supports required.

The proposed tiered intervention strategy is consistent with a broad consensus in the literature that high-quality interventions should be applied prior to consideration of special education eligibility and placement. Current special education rules and guidelines in the states nearly always require such prereferral interventions (although they may go by another name) or school-based problem solving. Unfortunately, the quality of these interventions is often poor (Flugum and Reschly, 1994; Telzrow et al., 2000). For example, the vast majority lack critical features of effective interventions, such as: (a) behavioral definition of the problem; (b) development of a direct measure of the problem in the natural classroom or other setting that is of concern; (c) baseline data indicating the nature and severity of the problem; (d) analysis of the problem (task analysis with identification of prerequisite skills, analysis of environmental conditions, including instructional features); (e) development of an explicit, written intervention plan based on principles of instructional design and behavior change; (f) frequent checks on whether the plan is implemented as intended; (g) frequent progress monitoring with changes in the plan as needed; and (h) evaluation of results in terms of whether the gap is reduced sufficiently between peer and age-grade expectations (Tilly et al., 1999).

According to self-report information and examination of special education case files, approximately 80 to 90 percent of current prereferral interventions are missing three or more of these indices of quality (Flugum and Reschly, 1994; Telzrow et al., 2000). Studies indicate that 80 percent or more of the students receiving prereferral interventions as they are implemented today are also considered for special education eligibility. Poor quality is a major reason for the failure of prereferral interventions to resolve more problems in general education settings. Many of the prereferral interventions are guided by very popular models of “collaborative consultation” (e.g., Idol and West, 1987; West and Idol, 1987), which do not require data collection or several of the other critical features identified above (Fuchs and Fuchs, 1992; Tilly et al., 1999). Changing the quality of the interventions prior to the consideration of special education eligibility is crucial. Key special education and related services personnel (e.g., school psychologists) need substantial retraining and reorientation in order for this step in the special education services process to have its intended effect.
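The audit implied by the figures above, in which a case file is checked against the eight features (a) through (h) and flagged when three or more are missing, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the feature labels are shorthand invented here, the example case file is hypothetical, and nothing in the sketch reproduces the procedures of the cited studies.

```python
# Shorthand labels for the eight features (a)-(h) listed earlier in this
# section. The labels are illustrative assumptions, not the cited studies' terms.
QUALITY_FEATURES = (
    "behavioral_definition",   # (a)
    "direct_measure",          # (b)
    "baseline_data",           # (c)
    "problem_analysis",        # (d)
    "written_plan",            # (e)
    "implementation_checks",   # (f)
    "progress_monitoring",     # (g)
    "outcome_evaluation",      # (h)
)


def missing_features(documented):
    """Return the features from (a)-(h) that a case file does not document."""
    documented = set(documented)
    unknown = documented - set(QUALITY_FEATURES)
    if unknown:
        raise ValueError(f"unrecognized feature labels: {sorted(unknown)}")
    return [f for f in QUALITY_FEATURES if f not in documented]


def lacks_three_or_more(documented):
    """True when three or more of the eight features are absent."""
    return len(missing_features(documented)) >= 3


# Hypothetical case file documenting five of the eight features.
example_file = [
    "behavioral_definition",
    "direct_measure",
    "baseline_data",
    "written_plan",
    "progress_monitoring",
]

print(missing_features(example_file))
print(lacks_three_or_more(example_file))
```

A checklist of this kind only records whether each feature is documented; it says nothing about how well the feature was carried out, which would still require review of the file itself.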

Children and youth who do not respond to high-quality interventions should be considered for special education, but only after high-quality interventions are provided. We reiterate that special education should not be considered unless there are effective general education programs, preferably supported by universal screening and early high-quality interventions prior to referral. Improving the interventions before considering special education placement is essential to implementing more effective general and special education programs.

Eligibility Decisions and System Reform

Eligibility decisions are markedly influenced by legal requirements, including conceptual definitions for disabilities (see Appendix 6-B) and classification criteria that are determined by the states. The conceptual definitions and classification criteria have an enormous impact on how professionals and the public think about disabilities; they determine rather directly the kind of assessment that is conducted during the full and individual evaluation, a mandated part of eligibility determination. If conceptual definitions and classification criteria use such concepts as general intellectual functioning or intellectual ability, it is nearly impossible to avoid the use of individually administered IQ tests and other measures of internal child traits or states. As noted previously, the information from measures of internal child traits has little application to interventions, and the measures themselves are costly and objectionable to many constituencies.

Design alternatives that address some of the problems with the current special education system exist and have been implemented successfully (Ikeda et al., 1996; Reschly et al., 1999). Box 8-6 provides a brief description of the alternative approach used in the State of Iowa. Changes in the design and organization of the special education delivery system are consistent with current legal requirements, but they utilize quite different conceptions of disabilities and apply different assessment methods. The overall purpose of these systems is to improve outcomes through application of direct assessment methods and effective instruction and behavior change principles in a problem-solving framework.

Problem-Solving Approach

To be effective, the problem-solving approach for eligibility determination and the design of interventions in special education must be pervasive in the system, governing the behavior of professionals and others from the first indication of problems with learning or behavior through early intervention, prereferral interventions, eligibility determination, IEP development, annual review of progress, and triennial consideration of eligibility and programming.

There are several problem-solving models; in all of them, systematic problem solving with data collection is essential (Upah and Tilly, 2002; Tilly et al., 1999). Problem solving should be a consistent set of activities involving behavioral definitions of learning and behavior goals, collection of data in natural settings, application of research-based principles of learning and behavior, monitoring progress with changes in interventions as needed, and evaluation of outcomes. This framework is, in the committee’s view, the most promising approach currently available to ensure the effectiveness of special and remedial education programs.

BOX 8-6
Special Education without IQ: The Case of Iowa

General intellectual functioning and IQ tests are used nearly universally as part of eligibility criteria and comprehensive evaluations for students suspected of having disabilities in educational settings. Many critics have pointed out limitations and flaws in IQ testing and in decision making highly influenced by IQ test results. Is IQ essential to special education?

Clearly, the answer is “no.” The Iowa reform plan that has been adopted in most of the state led to the complete abandonment of IQ testing. The Iowa reform was motivated by a commitment to improve the educational outcomes in special and general education programs. Educational leaders in Iowa focused on using the existing resources in general and special education more effectively and forging a close relationship between what special educators did in eligibility determination and educational programming.

Since 1995 the official State of Iowa Department of Education Rules of Special Education have permitted the adoption of a problem-solving approach to special education eligibility and programming that eliminates categorical eligibility and programming in the high-incidence disabilities. Instead of using IQ-achievement discrepancies and IQ cutoff scores, the Iowa Problem Solving Rules emphasize functional assessment that is related directly to the interventions that children and youth need. Moreover, traditional categorical labels for high-incidence disabilities are no longer used, leading to a focus on what children need and their degree of need rather than application of formulae for determining eligibility.

In the Iowa alternative model, traditional standardized IQ and achievement tests are replaced by direct measures of academic, behavioral, and emotional regulation in natural classroom and school settings. Local norms are used as the primary basis to determine degree of need for interventions. But special education eligibility is not based solely on degree of need. In addition, a problem-solving process is implemented to determine if the patterns of learning, behavior, or emotional regulation can be altered significantly in general education.

Rigorous criteria are established to guide the problem-solving process, which requires a minimum of several weeks to implement properly. For example, the presenting problem must be defined in terms of observable behavior, a goal must be established that represents significant improvement, a direct measure of the behavior is developed and implemented, an intervention plan tailored to the problem is developed using experimentally validated principles of instructional design and behavior change, implementation of the intervention is monitored to ensure that it is carried out as intended, and the student’s progress is monitored frequently (often, twice or more per week). Improvements in the intervention are implemented if the results are falling short of goals, and the overall effects are evaluated.

Special education eligibility may be considered after one or more high-quality interventions have been implemented and their results evaluated. If the student’s progress has improved significantly, special education is not likely to be considered further. If the intervention is not sufficient to bring the student into a broadly defined range of normal achievement, behavior, or emotional regulation, special education need is considered. Special education need is evaluated according to judgments of whether the specially designed instruction with necessary supports and services is likely to address the problem effectively.

In Iowa, students are simply designated as eligible or not eligible for special education services. The eligibility criteria for the high-incidence disabilities are: (a) a large difference from average levels of achievement, behavior, or emotional regulation that interferes significantly with school performance, (b) insufficient response to high-quality, rigorous interventions, and (c) demonstrated need for special education.

No IQ tests are used; there are no eligibility criteria specifying the need for an assessment of intellectual functioning or ability. Standardized tests of achievement and behavior rating scales are used sparingly. Direct measures in the natural setting, such as curriculum-based measurement in academic skills domains and behavior observation and interview, are used instead, with local norms used to decide degree of need. That is, students are compared with peers in the same classroom, school, and district to determine degree of need.

Special education is changed. Resources are redirected from expensive eligibility evaluations to the development of high-quality interventions in general and special education. Moreover, greater emphasis on early intervention and prevention is possible because the focus is on delivering effective programs, not on waiting until students fail badly enough to qualify for special education.

Finally, the Iowa reform has not resulted in greater numbers or proportions of students placed in special education. It has changed how special education is done in order to improve outcomes for children and youth. For more information see Ikeda et al. (1996) and Reschly et al. (1999).

Assessment

Application of assessment measures that provide the foundation for problem-solving interventions was recognized as crucial in the 1982 NRC report: “It is the responsibility of assessment specialists to demonstrate that the measures employed validly assess the functional needs of the individual child for which there are potentially effective interventions” (p. 94). The report noted that much of the data collected then within the context of special education had little or no relationship to interventions. The main change over the past two decades is the development of a much richer and more relevant knowledge base to provide the kind of assessment that is related to interventions to achieve the goal. That is, problem-solving approaches, assessment methods, and techniques are determined by what is needed in each of the problem-solving steps. The measures used must reflect the problem in the natural setting in which it occurs (e.g., number of words read correctly, number of disruptive events that interfere with the child’s learning as well as the learning of other children) and be conducive to frequent assessment of progress. The kinds of measures that meet these criteria typically come from a behavioral assessment tradition (e.g., Gresham, 1999; Gresham and Noell, 1999; Mash and Terdal, 1998; Shapiro and Kratochwill, 2000; Shinn, 1998). The measures are direct reflections of the problem behavior, applied in the natural setting typically as part of the ongoing classroom routine, through observation in natural settings, or in very brief interactions with children.

Using direct measures, “problems” are defined typically as large differences between the performance of individual or small groups of children and that of other children in the same environment. For example, disruptive behaviors (further defined into specific behaviors that are observable, such as number of inappropriate verbalizations or number of physically aggressive behaviors) are observed in a classroom, focusing on a specific child or a small group of children as well as other children. A “problem” exists when the disruptive behavior of one child or a small group of children is substantially different from others in the same environment in a domain of achievement or behavior that is developmentally important. For referred children these differences typically are large.
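One way to make the peer comparison concrete is to express a child’s score on a direct measure as a standardized difference from classmates observed in the same setting. The Python sketch below is illustrative only: the function names, the made-up observation counts, and in particular the cutoff of 2.0 standard deviations are assumptions introduced here, since the text does not specify how large a difference must be to count as a “problem.”

```python
from statistics import mean, pstdev


def peer_discrepancy(child_score, peer_scores):
    """Standardized difference between one child and peers in the same setting."""
    if len(peer_scores) < 2:
        raise ValueError("need at least two peer scores")
    spread = pstdev(peer_scores)
    if spread == 0:
        raise ValueError("peer scores show no variation")
    return (child_score - mean(peer_scores)) / spread


def is_large_difference(child_score, peer_scores, cutoff=2.0):
    """Flag a large difference from peers.

    The cutoff is a hypothetical placeholder, not a value from the report.
    """
    return abs(peer_discrepancy(child_score, peer_scores)) >= cutoff


# Made-up counts of disruptive events per observation period for six classmates.
peers = [2, 3, 1, 4, 2, 3]

print(is_large_difference(9, peers))  # a child far above the classroom average
print(is_large_difference(3, peers))  # a child close to the classroom average
```

Because the comparison group is the child’s own classroom, the same raw count could be flagged in one room and not in another, which is the sense in which local norms define degree of need.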

Classification Decisions

Before considering alternatives to the traditional classification system, it is important to consider goals for a classification system. The NRC report (1982:94) established important criteria for a child disability classification scheme:

It is the responsibility of the placement team that labels and places a child in a special program to demonstrate that any differential label used is related to a distinctive prescription for educational practices and that these practices are likely to lead to improved outcomes not achievable in the regular classroom....

It is the responsibility of the special education and evaluation staff to demonstrate systematically that high-quality, effective special instruction is being provided and that the goals of the special education program could not be achieved as effectively within the regular classroom.


Among the most important goals for a classification scheme are those of reliability and validity (Cromwell et al., 1975). Current disability classifications for special education have dubious reliability (see previous discussion of LD) and undocumented validity with regard to the design, implementation, and evaluation of treatments. The measures used in traditional classification schemes are not directly related to treatment, meaning that valuable time and resources are lost that could be used if direct measures of performance were used more widely.

As noted previously, classification decisions are strongly influenced by federal and state legal requirements. The direct measures described above are useful in classification decisions regardless of the classification scheme used, traditional or noncategorical. The direct measures focus on relevant domains of behavior in natural settings—specifically, achievement and school social behaviors. These behaviors are directly linked to general schooling goals and specifically to state accountability programs.

Two alternatives exist for the development of a classification scheme that focuses on direct measures of child performance: one is noncategorical designation of students as eligible for special education, and the other is changing the definitions and classification criteria for traditional disability categories. The first is preferable in the committee’s view, but it requires the greater amount of change. As noted previously, categorical designation of students as eligible for special education is not required by federal law (see 34 CFR 300.125); noncategorical designation is legal at the federal level. The states vary significantly in their requirements regarding the categorical designation of students as eligible. Some states in full compliance with IDEA (1997, 1999) do not use categorical disability schemes for special education (e.g., Iowa).

The committee’s support for noncategorical designations was arrived at in recognition of what has occurred in the public schools over the past four decades. The challenges leveled at the labeling and educational treatment of children with mild mental retardation (usually referred to as “educable mentally retarded” [EMR]) in the late 1960s and 1970s were highlighted in the 1982 report (NRC, 1982) and are significant to the understanding of what has occurred. It is our contention that the assessment process at that time was a “high-stakes enterprise” in the sense that the psychometric profile of the child had consequences for: (a) the label that was appended to the child, and as a result, (b) the curriculum and/or services, along with (c) the administrative arrangement or placement of the child. Recall that this predates passage of P.L. 94-142 and the applications of “free appropriate public education,” “least restrictive environment,” or “individualized education plans” (IEP).

Classification as EMR dictated, in turn, the kind of administrative placement in which the child would receive services. To quote Robinson and Robinson (1965) in describing education services for EMR students in that era, “The consensus of special educators today definitely favors special class placement for the mildly retarded” (p. 466). Essentially, diagnosis as EMR carried with it a “packaged program,” and the package almost inevitably was an alternative, functional curriculum that differed markedly from the curriculum taught in general education. In fact, the position taken by special educators then was that for children with mental retardation, unlike virtually every other disability, special education services modified not only how children were taught, but also what they were taught. The various EMR curricula (e.g., Hungerford’s New York program, the Cincinnati curriculum) shared an emphasis on promoting prevocational and later vocational skills, social and interpersonal skills, and functional academics. Hence, diagnosis as EMR in the 1960s resulted in a child’s being taught a “different” curriculum, which would subsequently be faulted by critics who observed that it made return to the general education population difficult, if not impossible, and made the assumption that all EMR children should receive the same curriculum. In addition, that curriculum was almost invariably taught in a self-contained special class, or special day class. Hence, diagnosis as EMR carried with it placement consequences, i.e., placement in a special day class.

In a similar fashion, diagnosis as LD had program and placement consequences as well. Typically, children diagnosed as LD continued to receive the general education curriculum, and services were designed to assist the child with processing problems by pulling the child out of a regular class to a resource room for remedial assistance from the resource teacher. Hence, the differences between being diagnosed as EMR and LD were several. One diagnosis conveyed the belief that the general curriculum was appropriate (i.e., LD), while the other diagnosis (EMR) was predicated on the belief that an alternative curriculum was needed. Placement consequences were also noted, as LD students were typically served in a resource room pull-out program.

When one examines the consequences today of diagnosing a child as EMR or LD, the situation is very different from the one that existed prior to P.L. 94-142. At present, a child must be qualified as eligible for special education and related services by meeting one of the existing disability categories. However, no longer does categorical eligibility carry with it either curricular or placement consequences. Instead, IDEA requires that once a child is deemed eligible for special education by qualifying for a disability category, the IEP process will be the means by which the “appropriate” portion of the free, appropriate public education is negotiated. In the IEP process, short-term and long-term goals are denoted and the supports and services needed to accomplish those goals specified. Hence, program or placement is negotiated during the IEP process. Being found eligible by virtue of qualifying as LD entitles a child only to have an IEP drawn up, but it carries with it no particular programmatic or placement consequences. As a result, diagnosis into one of the disability categories is no longer a high-stakes venture.

As explained in the discussion of referral, public school personnel in many, if not most, states are reserving the label “mental retardation” for only patently disabled children (Gottlieb et al., 1994; MacMillan et al., 1996c) and are knowingly labeling low-aptitude children (i.e., those with IQ scores below the cutoff score for mental retardation) as learning disabled (MacMillan et al., 1998b). The rationale for doing so is that there is no advantage to labeling able-bodied children “mentally retarded” when an appropriate curriculum and placement can be designated in the IEP process in which the least restrictive environment is specifically considered.

Rational classification criteria have been developed to guide eligibility decisions for special education without using categories or traditional measures (Tilly et al., 1999). These schemes apply all of the due process requirements associated with IDEA as well as establish strong parental involvement programs. The two crucial features of these eligibility criteria are: (a) documented large differences in performance in relevant domains of behavior using peers as a comparison group and (b) documented insufficient response to well-designed, appropriately implemented interventions in general education. The student can then be designated as “eligible for special education,” assuming that all of the due process protections are implemented. This approach finds the “right” kids—that is, those who need additional supports in order to achieve—is legally defensible in due process hearings, and is politically acceptable in that it does not lead to excessive numbers of students qualifying for special education (Ikeda et al., 1996; Reschly et al., 1999). The “hit rate” using the less than perfect traditional system as a criterion is very high (Wilson et al., 1992).

A second and less desirable alternative to the current classification system is to redefine the criteria for the high-incidence disabilities of LD, MR, and ED. Changing the classification criteria for LD and ED is feasible; however, changes in MR are less feasible due to the perspective of several centuries that it involves very low intellectual ability. LD, MR, and ED could, however, be defined in terms of functional deficits in relevant domains using direct measures of academic skills and social behaviors. The changes in LD, MR, and ED classification criteria have the advantage of eliminating assessment procedures having little relevance to treatment, but also the large disadvantage of being associated with ideas of internal child deficits that are difficult if not impossible to change. Moreover, the negative connotations of traditional categories, especially MR, would not be avoided to the same extent as is possible with a noncategorical system.

Accountability

It is clear that the original framers of the Education for All Handicapped Children Act (1975) were concerned with making sure that special education programs were effective. The various procedural requirements, such as due process, IEP development, full and individual evaluation, annual review, and triennial reevaluation, were all designed to ensure that the services would be effective. The framers established “process or procedural” protections to ensure accountability.

Although a great deal has been accomplished with the procedural requirements, accountability for results was not achieved. IDEA (1997, 1999) placed more emphasis on accountability and moved special education for students with disabilities into the mainstream of educational reform. The system now demands accountability without adjustments in the classification practices and assessment requirements to make accountability feasible.

Research on the effectiveness of special education overwhelmingly supports a shift away from IQ-based disability determination and toward functional assessment and problem-solving interventions. One aspect of problem solving is particularly important: formative evaluation. Formative evaluation methods involve establishing goals, gathering baseline data to reflect current performance, delivering instruction or behavioral interventions, monitoring progress frequently (e.g., daily or twice per week), and changing the interventions depending on the ongoing results. If goals are met, typically the goal is raised to ensure that the student always has a challenging but achievable goal to guide and motivate efforts. If goals are not met, the instructional and behavior change interventions are analyzed further and changed to foster better outcomes (Fuchs and Fuchs, 1986; Kavale and Forness, 1999). Interventions guided by this kind of problem solving are more effective by 0.75 to 1.0 SD over typical special education interventions.
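One review cycle of the formative evaluation logic described above might be sketched as follows. The function name, the rule of judging the goal against the most recent score, and the size of the goal increase are all assumptions made for illustration; the sources cited do not prescribe them.

```python
def formative_step(scores, goal, raise_by=5):
    """One review cycle of a hypothetical formative-evaluation rule.

    scores: progress-monitoring scores since the last review (most recent last).
    Returns (decision, goal_for_next_cycle).
    """
    if not scores:
        raise ValueError("at least one progress-monitoring score is required")
    if scores[-1] >= goal:
        # Goal met: raise it so the target stays challenging but achievable.
        return "raise_goal", goal + raise_by
    # Goal not met: keep the goal and revise the intervention.
    return "revise_intervention", goal
```

For example, `formative_step([12, 15, 21], goal=20)` returns `("raise_goal", 25)`, while `formative_step([12, 14, 15], goal=20)` returns `("revise_intervention", 20)`.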

Gifted and Talented Identification

It is far more difficult to make a case for early identification and intervention for gifted and talented students, because no research base currently provides guidance in this regard. There has been an absence of public support for gifted programs for the very young, resulting in few opportunities to conduct research on program features that promote achievement at the highest end of the distribution. This is perhaps not surprising given the well-known problems of reliability of traditional instruments for assessing intellectual function in young children. “Readiness tests” used as screening instruments for intellectual competence and traditional tests of intelligence and aptitude have been soundly criticized for their inappropriateness for young children generally, and for minority children in particular (Meisels, 1987; Anastasi, 1988; Gandara, 2000). Thus, while many of the predictors of academic failure are well established even for the very young, there is currently no consensus regarding predictors of giftedness.

For elementary and secondary students, limited programs of identification and services for gifted and talented students have been carried out under the auspices of the Jacob K. Javits Gifted and Talented Students Education Program. But the collection of data in the framework of any systematic research paradigm has been limited. Yet the importance of early identification and opportunity to learn is likely to be as critical to the success of students at the upper end of the achievement distribution as it is for those at the lower end. And the problem of disentangling the child’s abilities from the previous opportunities to learn strikes a clear parallel. Nevertheless, the existing research base provides too weak a foundation for proposing an alternative assessment approach similar to that proposed for special education.

CONCLUSIONS AND RECOMMENDATIONS

Assessment in special education is guided by complex legal requirements that are responsible in part for the gap between current practices and the state of the art. Direct measures of skills in natural settings, along with the application of problem-solving methodologies, have the promise of significantly improving the outcomes for students in special education and for those considered for but not placed in special education. Traditional disability conceptions and classification criteria interfere with the implementation of systematic problem solving, functional assessment, formative evaluation, and accountability for outcomes. The system changes discussed here and in the recommendations were anticipated in the 1982 National Research Council report. Over the last two decades, significant system changes have become more feasible due to advances in assessment and intervention knowledge. It now is time to implement these changes more widely as a means to protect all children from inappropriate classification and placement, as well as from ineffective special education programs.

The proposed change would focus attention away from efforts to uncover unobservable child traits, the identification of which gives little insight into instructional response, and toward the problems encountered in the classroom and appropriate responses. The role of instruction and classroom management in student performance is explicitly acknowledged, and effort is devoted first to ensuring the opportunity to succeed in general education.

Federal Level Changes

Recommendation SE.1: The committee recommends that federal guidelines for special education eligibility be changed in order to encourage better integrated general and special education services. We propose that eligibility ensue when a student exhibits large differences from typical levels of performance in one or more domain(s) and with evidence of insufficient response to high-quality interventions in the relevant domain(s) of functioning in school settings. These domains include achievement (e.g., reading, writing, mathematics), social behavior, and emotional regulation. As is currently the case, eligibility determination would also require a judgement by a multidisciplinary team, including parents, that special education is needed.

We provide more detail regarding our intended meaning below:

Eligibility
  • The proposed approach would not negate the eligibility of any student who arrives at school with a disability determination, or who has a severe disability, from being served as they are currently. Our concern here is only with the categories of disability that are defined in the school context in response to student achievement and behavior problems.

  • While eligibility for special education would by law continue to depend on establishment of a disability, in the committee’s view noncategorical conceptions and classification criteria that focus on matching a student’s specific needs to an intervention strategy would obviate the need for the traditional high-incidence disability labels such as LD and ED. If traditional disability definitions are used, they would need to be revised to focus on behaviors directly related to classroom and school learning and behavior (e.g., reading failure, math failure, persistent inattention and disorganization).

Assessment
  • By high-quality interventions we mean evidence-based treatments that are implemented properly over a sufficient period to allow for significant gains, with frequent progress monitoring and intervention revisions based on data. Research-based features of intervention quality are known and must be implemented rigorously, including:

    1. an explicit definition of the target behavior in observable, behavioral language;

    2. collection of data on current performance;

    3. establishment of goals that define an acceptable level of performance;

    4. development and implementation of an instructional or behavioral intervention that is generally effective according to research results;

    5. assessment and monitoring of the implementation of the intervention to ensure that it is being delivered as designed, frequent data collection to monitor the effects of the intervention, revisions of the intervention depending on progress toward goals, and evaluation of intervention outcomes through comparison of postintervention competencies with baseline data.

Several sources detail these procedures (Flugum and Reschly, 1994; Reschly et al., 1999; Shinn, 1998; Upah and Tilly, 2002).

  • Assessment for special education eligibility would be focused on the information gathered that documents educationally relevant differences from typical levels of performance and is relevant to the design, monitoring, and evaluation of treatments. Competencies would be assessed in natural classroom settings, preferably on multiple occasions.

  • While an IQ test may provide supplemental information, no IQ test would be required, and results of an IQ test would not be a primary criterion on which eligibility rests. Because of the irreducible importance of context in the recognition and nurturance of achievement, the committee regards the effort to assess students’ decontextualized potential or ability as inappropriate and scientifically invalid.

Reporting and Monitoring
  • Current federal requirements regarding reporting by states of the overall numbers of students served as disabled and the program placements used to provide an appropriate education would not change with these recommendations. Moreover, the reporting of the nine low-incidence disabilities would continue to be done by category. Reporting of the numbers of students currently diagnosed with high-incidence disabilities would become noncategorical, with the loss of very little useful information due to the enormous variations in the operational definition of the high-incidence categories used currently. The reporting by states concerning students now classified in high-incidence categories could be made more meaningful if the reporting also included the nature of the learning or behavioral problem as reflected in the top 2-4 IEP goals for each student, that is, the number of students with IEP goals in basic reading, reading comprehension, math calculation, self-help skills, social skills, math reasoning, etc. The latter information would provide more accurate information on the actual needs of students with disabilities than the current information indicating unreliable categorical diagnoses.

  • Consistent with IDEA 1997 and 1999, federal compliance monitoring should move in the direction of examining the quality of special education interventions and the outcomes for students with disabilities. Current compliance monitoring focuses on important, but limited, characteristics of the delivery of special education programs, particularly implementation of the due process procedural safeguards and the mandated components of the IEP. Compliance monitoring by the Federal Office of Special Education Programs and the state departments of education must assume an outcomes focus in addition to the traditional process considerations.
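The goal-based reporting suggested in the first item above amounts to a tally of students by IEP goal domain. A minimal sketch follows, assuming a hypothetical data layout in which each student ID maps to a list of goal domains in priority order; the function name and the layout are illustrative, not part of any federal reporting format.

```python
from collections import Counter

def count_students_by_goal_domain(iep_goals, top_n=4):
    """Count students with at least one IEP goal in each domain.

    iep_goals maps a student ID to that student's goal domains in priority order.
    Only the first top_n goals per student are considered.
    """
    counts = Counter()
    for domains in iep_goals.values():
        # A set ensures a student is counted once per domain.
        counts.update(set(domains[:top_n]))
    return counts
```

Because each student is counted once per domain, a student with goals in both basic reading and social skills appears in both totals, so the totals describe needs rather than partitioning students into exclusive categories.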

State-Level Changes

State regulatory changes would be required for implementation of a reformed special education program that uses functional assessment measures to promote positive outcomes for students with disabilities. Some states have already instituted changes that move in this direction. In Iowa, noncategorical special education for students with high-incidence disabilities has been implemented since the early 1990s. Several other states have approved “rule replacement” programs that allow school districts to implement special education systems that do not require categorical designation of students with high-incidence disabilities (e.g., Illinois, Kansas, South Carolina). These state rules require a systematic problem-solving process that is centered around quality indicators associated with successful interventions (see previous section). The rules are explicit about each of these quality indicators, and compliance monitoring is focused on their implementation. Several features of rules in the majority of states can be omitted in a noncategorical system, including the requirements regarding IQ testing.

The changes in federal regulations and state rules toward greater emphasis on producing positive outcomes and away from an eligibility determination process that is largely unrelated to interventions are consistent with the greater emphasis in IDEA (1997, 1999) on positive outcomes for students with disabilities. Positive outcomes are enhanced by the implementation of high-quality interventions; no such claim can be made for conducting the assessments required to assign students with significant learning and behavior problems to the high-incidence categories of LD, ED, and MMR.

Early Screening

Universal screening of young children for prerequisites to and the early development of academic and behavioral skills is increasingly recognized as crucial to achieving better outcomes in schools and preventing achievement and behavior problems. While this is true for all children, a disproportionate number of disadvantaged children are on a developmental trajectory that is flatter than their more advantaged counterparts. Evidence suggests that effective and reliable screening of young children by age 4 to 6 can identify those most at risk for later achievement and behavior problems, including those most likely to be referred to special education programs.

In two arenas, reading and behavior, the knowledge base exists to screen and intervene in general education both systematically and early. Less attention has been devoted to early identification and intervention for mathematics problems. However, the NICHD has launched a research program in this area. Other early screening mechanisms in mathematics have been developed, but their psychometric properties have not yet been widely tested (Ginsberg and Baroody, 2002; Griffin and Case, 1997).

While early reading is only one of the areas in which students struggle, it is an important one because failure in early reading makes learning in the many subject areas that require reading more difficult. Moreover, there is a great deal of comorbidity between reading problems and other difficulties (attentional, behavioral) that results in special education referral.

As indicated above, early screening and intervention would help to identify children who may be missed in a wait-to-fail model. It may obviate the need for placement in special education for some children, and it would provide the evidence of response or lack of response to high-quality instruction that we proposed be written into federal regulations.

Recommendation SE.2: The committee recommends that states adopt a universal screening and multitiered intervention strategy in general education to enable early identification and intervention with children at risk for reading problems.

The committee’s model for prereferral reading intervention is as follows:

  • All children should be screened early (late kindergarten or early 1st grade) and then monitored through 2nd grade on indicators that predict later reading failure.

  • Those students identified through screening as at risk for reading problems should be provided with supplemental small-group reading instruction by the classroom teacher for about 20-30 minutes per day, and progress should be closely monitored.

  • For those students who continue to display reading difficulties and for whom supplemental small-group instruction is not associated with improved outcomes, more intensive instruction should be provided by other support personnel, such as the special education teacher and/or reading support teacher in school.

  • For students who continue to have difficulty, referral to special education and the development of an IEP would follow. The data regarding student response to intervention would be used for eligibility determination.

  • State guidelines should direct that the screening process be undertaken early, and the instructional response follow in a very timely fashion. The requirement for general education interventions should not be used to delay attention to a student in need of specialized services.
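The sequence of steps in this model can be summarized as a small decision function. This is a sketch under stated assumptions: the function and step names are hypothetical, and returning a student who responds to an intervention to routine monitoring is an assumption made here, since the model above does not spell out that case.

```python
MONITOR = "continue monitoring"
SMALL_GROUP = "supplemental small-group instruction"
INTENSIVE = "more intensive instruction by support personnel"
REFER = "referral to special education"

def next_reading_step(at_risk, responded_to_small_group=None,
                      responded_to_intensive=None):
    """Return the next step in a hypothetical encoding of the multitiered model.

    None means that stage has not yet been tried or evaluated.
    """
    if not at_risk:
        return MONITOR
    if responded_to_small_group is None:
        # Screened as at risk: start with small-group instruction by the teacher.
        return SMALL_GROUP
    if responded_to_small_group:
        return MONITOR  # assumption: responders return to routine monitoring
    if responded_to_intensive is None:
        return INTENSIVE
    if responded_to_intensive:
        return MONITOR  # assumption, as above
    # No adequate response at either level: refer, carrying the response data forward.
    return REFER
```

Note that referral is reached only after both levels of general education intervention have been tried, which is what allows the response data to inform eligibility determination.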

The committee’s recommendation to adopt a universal screening and multitiered intervention strategy is meant to acknowledge that there is some distance to travel between the knowledge base that has been accumulated and the capacity to use that knowledge on a widespread basis. There are early examples in Texas and Virginia of taking screening to scale. But making the tools available to teachers, preparing teachers both to assess students and to respond productively to the assessment results, and supporting teachers to work with the instructional demands of intervening differently for subgroups of students at different skill levels require the careful development of capacity and infrastructure.

At the same time that the committee acknowledges the investment required to adopt this recommendation, we call attention to the potential return on the investment and the consequences of not making such an investment. When early screening and intervention are not undertaken, more students suffer failure. The demands on the school to invest in a support structure for those students are simply postponed to a later age, when the response to intervention is less promising and when effective intervention by teachers is made even more difficult by a weaker knowledge base and limited teacher skill. The consequences of school failure for the student and for society go well beyond the cost to the school, of course.

Behavior Management

Current understanding of early reading problems is the outcome of a sustained research and development effort that has not been undertaken on a similar scale with respect to other learning and behavior problems. In the committee’s view, however, there is enough evidence regarding universal behavior management interventions, behavior screening, and techniques to work with children at risk for behavior problems to better prevent later serious behavior problems. Research results suggest that these interventions can work. However, a large-scale pilot project would provide a firmer foundation of knowledge regarding scaling up the practices involved.

Recommendation SE.3: The committee recommends that states launch large-scale pilot programs in conjunction with universities or research centers to test the plausibility and productivity of universal behavior management interventions, early behavior screening, and techniques to work with children at risk for behavior problems.

We propose a model for experimentation similar to that proposed for reading:

  • Assessment of the classroom and of noninstructional school settings (hallways, playgrounds) should be made yearly.

  • Behavioral adjustment of all children in grades K-3 should be screened yearly to provide teachers with information regarding individual children. The assessments should be reviewed yearly by a school-level committee (composed of administrative and teaching staff, specialists, and parents) to ensure that school-wide interventions are implemented in a timely fashion when indicated and to ensure that individual children are given special services quickly when needed.

  • Because characteristics of the classroom and school can increase risk for serious emotional problems, the first step in the determination of an emotional or behavioral disability is the assessment of the classroom and school-wide context. Key contextual factors should be assessed and ruled out as explanations before intervention at the individual child level is considered.

  • If it is determined that contextual factors are not significantly involved in the child’s problem, then individualized measures should be taken to help the child adjust in the standard classroom/school setting. Only those interventions with empirical evidence supporting their effectiveness should be considered. For example, common features of emotional and behavioral problems are off-task and disruptive behaviors. Well documented interventions with demonstrated effectiveness at reducing these behaviors should be employed before the child is considered disabled.

  • Because the most serious and developmentally predictive emotional and behavioral problems in children tend to be manifested across settings, and because family issues and solutions tend to overlap with those at school, every effort should be made to include parents and guardians as partners in the educational effort. To the extent that this is done, early and accurate identification of serious problems should be facilitated, and parents can be enlisted to collaborate with teachers in both standard education and in solving emerging academic, emotional, and behavioral problems.

  • For children who do not respond to standard interventions, the intensity of the interventions should be increased through the use of behavioral consultants, more intensive collaborations with parents, or through adjunct interventions to address various skill or emotional deficits (e.g., anger control, social skills instruction). Such individualized programs should be carefully articulated through the use of IEPs, coupled with systematic assessments of the child’s behavioral response to the interventions.

Teacher Quality

To support the proposed changes, school psychologists and special education teachers would need preparation that is different in some respects from that now required.

Recommendation TQ.3: A credential as a school psychologist or special education teacher should require instruction in classroom observation/assessment and in teacher support to work with a struggling student or with a gifted student. These skills should be considered as critical to their professional role as the administration and interpretation of tests are now considered.

  • Instruction should prepare the professional to provide regular behavioral assessment and support for teachers who need assistance to understand and work effectively with a broad range of student behavior and achievement.

  • Recognizing and working with implicit and explicit racial stereotypes should be incorporated.

The proposed reform of special education that would focus on response to intervention in general education would require substantial changes in the current relationship between general and special education. It would put in place a universal prevention element that does not now exist on a widespread basis with the purpose of: (a) providing assistance to children who may now be missed and (b) obviating the need for the special education referrals that can be remedied by early high-quality intervention in the general education context. In the final analysis, the committee cannot predict the effect of this approach on the number of special education students nor on racial/ethnic disproportion, but the result, in our judgment, would be that children identified for special education services would be those truly in need of ongoing support. And if the effect of the classroom context and opportunity to learn is successfully disentangled from the student’s need for additional supports, in our view that disproportion in identification would not be as problematic as it is currently.

Federal Support of State Reform Efforts

Recommendation SE.4: While the United States has a strong tradition of state control of education, the committee recommends that the federal government support widespread adoption of early screening and intervention in the states.

In particular:

  • Technical assistance and information dissemination should be coordinated at the federal level. This might be done through the Department of Education, the NICHD, a cooperative effort of the two, or through some other designated agent. Accumulation and dissemination of information and research findings has “public good” properties and economies of scale that make a federal effort more efficient than many state efforts.

  • The federal government can encourage the use of Title I funds to implement early screening and intervention in both reading and behavior for schools currently receiving those funds. Funds provided in the Reading Excellence Act might also support this effort under the existing mandate.

Gifted and Talented Eligibility

The research base justifying alternative approaches for the screening, identification, and placement of gifted children is neither as extensive nor as informative as that for special education. While limited programs of identification and services for gifted students have been carried out under the auspices of the Jacob K. Javits Gifted and Talented Students Education Program, the collection of data in the framework of any systematic research paradigm has been limited. Yet early opportunity to learn is likely to be as important for the success of students at the upper end of the achievement distribution as it is for those at the lower end. And the problem of disentangling the children’s abilities from their previous opportunities to learn strikes a clear parallel. Nevertheless, the existing research base restricts our understanding and therefore our recommendations: rather than proposing a specific approach to screening or identification for gifted and talented students, we propose research that may allow for better informed decision making in the future.

Recommendation GT.1: The committee recommends a research program oriented toward the development of a broader knowledge base on early identification and intervention with children who exhibit advanced performance in the verbal or quantitative realm, or who exhibit other advanced abilities.

This research program should be designed to determine whether there are reliable and valid indicators of current exceptional performance in language, mathematical, or other domains, or indicators of later exceptional performance. To the extent that the assessments described above provide information relevant to the identification of gifted students, they should be used for that purpose.

In addition to research to support the development of identification instruments, research on classroom practice designed to encourage the early and continued development of gifted behaviors in underrepresented populations should be undertaken so that screening can be followed by effective intervention. That research should be designed to identify:

  • Opportunities that can be provided during the kindergarten year to engage children in high-interest learning activities that allow development of complex, advanced reasoning, accelerated learning pace, and advanced content and skill learning capabilities.

  • Interventions in later school years with children who demonstrate advanced learning capabilities and their impact on the performance of these children over time.

  • The effect of curricular differentiation through various options, such as resource room instruction, independent study, and acceleration, and the interaction of treatments with individual student profiles. Group size, instructional method, and complexity of the curriculum should all be variables under study.

An enriched curriculum designed for gifted students may well improve educational outcomes for all children. As mentioned in Chapter 5, when class size was reduced in 15 schools in Austin, Texas, the two that showed improved student achievement were schools that made other changes as well, including making the curriculum for gifted students in reading and mathematics available to all students (NRC, 1999a). This does not imply, however, that the pace of instruction or the level of student independence is necessarily the same for all students. We recommend that research be conducted using control groups to determine the impact of interventions designed for children identified as gifted on children who have not been so identified.
