In this chapter we discuss the influence of conceptual definitions, classification criteria, and assessment practices on the four educational classifications of concern here: mental retardation, emotional disturbance, specific learning disability, and giftedness. We begin with the disability classifications.
Conceptual definitions and classification criteria have enormous influence on the assessment procedures applied during the determination of eligibility and special education needs. Although general assessment requirements are applicable to all disabilities (see Appendix 6-A), there are also specific requirements for each of the disability classifications considered in this chapter.
Learning disabilities (LD) are a group of disorders that involve more than half the children in special education programs. LD prevalence has risen rapidly over the past 25 years. Disproportionate minority representation in LD occurs for Asian/Pacific Islanders, who are underrepresented by 2.7 times the rate of white students and for American Indian/Alaskan Natives, who are overrepresented by a factor of 1.2 times the white rate. All other groups are represented at or very close to the rate for white students.
The notion of learning disabilities and the attendant terminology arose in the mid-1960s when a psychologist, Samuel Kirk, first used the term “learning disability.” Kirk used the term as a catchall phrase to describe a number of different problems affecting the ability of certain children to learn. He noted that these problems manifested themselves in children who were otherwise capable, but were underachieving. There was a variance between the child’s level of achievement and the child’s presumed capabilities. Kirk defined learning disabilities as “a retardation, disorder, or delayed development in one or more of the processes of speech, language, reading, spelling, writing, or arithmetic resulting from a possible cerebral dysfunction and not from mental retardation, sensory deprivation, or cultural or instructional factors” (Kirk, 1962:263).
This was a new concept, even though unexpected underachievement in otherwise capable children had been reported much earlier in association with dyslexia, word blindness, dysgraphia, and dyscalculia (Doris, 1993; Hallgren, 1950; Hinshelwood, 1917; Orton, 1925; Strauss and Werner, 1942). Parents, educators, and policy makers embraced the new term “learning disabilities” because it fulfilled a need to provide special education services to children whose failure to learn could not be explained by mental retardation, visual impairments, hearing impairments, or emotional disturbance. The new term represented a new category for describing children with learning impairments that were not attributable to obvious physical, emotional, or psychological shortcomings. There was no stigma attached because “Their difficulties in learning to read, write, and/or calculate occurred despite adequate intelligence, sensory integrity, healthy emotional development, and cultural and environmental advantage” (Lyon et al., 2001).
Prior to Kirk’s revelation, children with learning disabilities were simply not being served. The new concept catalyzed parents and educators to act. In 1969 these children became eligible for services with passage of the Learning Disabilities Act. Eligibility continued in the Education of the Handicapped Act (EHA) (1975, 1977) and in the Individuals with Disabilities Education Act (IDEA) (1997, 1999).
Additional rules were formulated in 1977 specifically for the LD category (34 CFR 300.541). These rules were a compromise that no one particularly liked or supported at the time, but they have survived as an apparently objective method used to solve a difficult problem: determining which students among those with achievement problems should be eligible for special education services in the category of LD. The objectivity and appropriateness of the method suggested in the 1977 regulations have been questioned over the past 15 years.
The 1977 federal regulations established classification criteria that were not entirely consistent with the LD conceptual definition (see Appendix 6-B), which implied an underlying cognitive processing disorder as the core feature of the disability. The classification criteria had three broad components (see Regulation 540 in Appendix 6-A). The first was low achievement in one of seven areas. The second was “a severe discrepancy between achievement and intellectual ability” in one or more of the seven achievement areas. The third involved what are known as the exclusion criteria: LD could not be the result of inappropriate educational programming; visual, hearing, or motor impairment; mental retardation; emotional disturbance; or environmental, cultural, or economic disadvantage. These criteria could be summarized as defining LD as unexpected low achievement that cannot be explained by low ability, absence of an opportunity to learn, or other factors.
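The three-component structure described above can be sketched as a simple decision rule. This is only an illustration of how the components combine, not the regulatory procedure: the cutoff values, the field names, and the function `check_components` are hypothetical, since the regulations summarized here do not specify numeric thresholds and states vary in how they apply them.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Hypothetical thresholds on a standard-score scale (mean 100, SD 15).
# These numbers are assumptions chosen for illustration only.
LOW_ACHIEVEMENT_CUTOFF = 85.0   # assumed: achievement below this counts as "low"
SEVERE_DISCREPANCY = 22.5       # assumed: ability minus achievement, about 1.5 SD


@dataclass
class Evaluation:
    ability: float                              # intellectual ability standard score
    achievement: Dict[str, float]               # area name -> achievement standard score
    exclusions: Set[str] = field(default_factory=set)  # exclusion factors judged to explain the problem


def check_components(ev: Evaluation) -> Dict[str, bool]:
    """Evaluate the three broad components and combine them."""
    # Component 1: low achievement in at least one area.
    low = any(score < LOW_ACHIEVEMENT_CUTOFF for score in ev.achievement.values())
    # Component 2: severe ability-achievement discrepancy in at least one area.
    discrepant = any(
        ev.ability - score >= SEVERE_DISCREPANCY for score in ev.achievement.values()
    )
    # Component 3: no exclusion factor accounts for the low achievement.
    not_excluded = not ev.exclusions
    return {
        "low_achievement": low,
        "severe_discrepancy": discrepant,
        "not_excluded": not_excluded,
        "eligible": low and discrepant and not_excluded,
    }


if __name__ == "__main__":
    child = Evaluation(
        ability=105,
        achievement={"basic reading": 78, "arithmetic calculation": 101},
    )
    print(check_components(child))
```

Under these assumed thresholds, a child with low achievement but a similarly low ability score would fail the second component, which is the sense in which the criteria define LD as unexpected low achievement.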
State requirements for determining eligibility generally apply the discrepancy and exclusion factors, although there are substantial variations. The most recent national survey of state criteria (Mercer et al., 1996) indicated that 94 percent of states mentioned a processing disorder in the conceptual definition, but processing factors were included in only 33 percent of the states’ classification criteria. Virtually all states applied the exclusion factors (98 percent), and all included the achievement areas of reading, writing, and math. Dissatisfaction with the achievement-ability severe discrepancy criterion has led to consideration of achievement-domain-specific criteria for eligibility (see Chapter 8 for a discussion of problems with the severe discrepancy method).
Federal law defines LD not as a single disability, but as a group of disabilities that are expressed in one or more skill domains. The disabilities are manifested in the areas of: (1) listening; (2) speaking; (3) basic reading (decoding and word recognition); (4) reading comprehension; (5) arithmetic calculation; (6) mathematics reasoning; and (7) written expression. The broadness of this definition encompasses a wide range of learning difficulties eligible for treatment. However, the complexity of each skill domain and the overlap between the domains compromise diagnostic precision. Diagnosis is further complicated by the fact that disabilities in these areas may be accompanied by other disorders, which are not the cause of the LD.
Because definitional clarity is so elusive, developing a set of specific operational criteria for identifying individual children has been problematic. Some advocate modification of generic definitions to reflect separate evidence-based definitions of domain-specific disabilities. The development of operational definitions and criteria relevant to each domain would guide procedures to determine which students are eligible for services on the basis of reading disabilities, mathematics disabilities, and so on. A great deal is now known about the most common of the learning disabilities (dyslexia, or reading disability), and using this domain-specific definitional strategy, investigators have now begun to examine other common learning disabilities, for example, mathematics disability. In the following section we review the significant advances in understanding reading and reading disabilities.
Dyslexia is characterized by an unexpected difficulty in reading in children and adults who otherwise possess the intelligence, motivation, and schooling considered necessary for accurate and fluent reading (Shaywitz, 1998). Recent epidemiological data indicate that like hypertension and obesity, dyslexia fits a dimensional model. In other words, within the population, reading ability and reading disability occur along a continuum, with reading disability representing the lower tail of a normal distribution of reading ability (Gilger et al., 1996; Shaywitz et al., 1992; B. Shaywitz et al., 2001; S. Shaywitz et al., in press).
Dyslexia is one of the most common of childhood disorders, with a public school prevalence rate of approximately 6 percent (see Chapter 2). Previously, it was believed that dyslexia affected boys primarily (Finucci and Childs, 1981); however, more recent data (Flynn and Rahbar, 1994; Shaywitz et al., 1990; Wadsworth et al., 1992) indicate similar numbers of affected boys and girls. Longitudinal studies, both prospective (Francis et al., 1996; Shaywitz et al., 1995) and retrospective (Bruck, 1992; Felton et al., 1990; Scarborough, 1984), indicate that dyslexia is a persistent, chronic condition; it does not represent a transient developmental lag. Over time, poor readers and good readers tend to maintain their relative positions along the spectrum of reading ability (Shaywitz et al., 1995).
Dyslexia is both familial and heritable (Pennington and Gilger, 1996). Family history is one of the most important risk factors; between 23 and 65 percent of children who have a parent with dyslexia are reported to have the disorder (Scarborough, 1990). Rates among siblings of affected persons of approximately 40 percent and among parents of 27 to 49 percent (Pennington and Gilger, 1996) provide opportunities for early identification of affected siblings and often for delayed but helpful identification of affected adults. Linkage studies implicate loci on chromosomes 6 and 15 for reading disability (Cardon et al., 1994, 1995; Grigorenko et al., 1997) and most recently on chromosome 2 (Fagerheim et al., 1999).
Theories of dyslexia have been proposed that are based on the visual system (Demb et al., 1998; Eden et al., 1996; Stein and Walsh, 1997) and other factors, such as temporal processing of stimuli within these systems (Talcott et al., 2000; Tallal, 2000). Although other systems and processes may also contribute to the difficulty, there is now a strong consensus among investigators in the field that the central difficulty in dyslexia reflects a deficit in the language system. Investigators have long known that speech enables its users to create an indefinitely large number of words by combining and permuting a small number of phonological segments, the consonants and vowels that serve as the natural constituents of the biological specialization for language. An alphabetic transcription (reading) brings this same ability to readers, but only as they connect its arbitrary characters (letters) to the phonological segments they represent. Making that connection requires awareness that all words, in fact, can be decomposed into phonological segments. It is this awareness that allows the reader to connect the letter strings (the orthography) to the corresponding units of speech (phonological constituents) they represent. The awareness that all words can be decomposed into these basic elements of language (phonemes) allows the reader to decipher the reading code.
In order to read, a child has to develop the insight that spoken words can be pulled apart into phonemes and that the letters in a written word represent these sounds. As numerous studies have shown, however, such awareness is largely missing in dyslexic children and adults (Brady and Shankweiler, 1991; Bruck, 1992; Fletcher et al., 1994; Liberman and Shankweiler, 1991; Rieben and Perfetti, 1991; Shankweiler et al., 1995, 1979; Share, 1995; Shaywitz, 1998, 1996; Stanovich and Siegel, 1994; Torgesen, 1995; Wagner and Torgesen, 1987). Results from large and well-studied populations with reading disability confirm that in young school-age children (Fletcher et al., 1994; Stanovich and Siegel, 1994) as well as in adolescents (Shaywitz et al., 1999), a deficit in phonology represents the most robust and specific (Morris et al., 1998) correlate of reading disability. Such findings form the basis for the most successful and evidence-based interventions designed to improve reading (National Institute of Child Health and Human Development, 2000).
Basically, reading comprises two main processes: decoding and comprehension (Gough and Tunmer, 1986). In dyslexia, a deficit at the level of the phonological module impairs the ability to segment the written word into its underlying phonological elements. As a result, the reader experiences difficulty, first in decoding the word and then in identifying it. The phonological deficit is domain-specific; that is, it is independent of other, nonphonological, abilities. In particular, the higher-order cognitive and linguistic functions involved in comprehension, such as general intelligence and reasoning, vocabulary (Share and Stanovich, 1995), and syntax (Shankweiler et al., 1995), are generally intact. This pattern, a deficit in phonological analysis contrasted with intact higher-order cognitive abilities, offers an explanation for the paradox of otherwise intelligent people who experience great difficulty in reading (Shaywitz, 1996).
According to the model, a circumscribed deficit in a lower-order linguistic (phonological) function blocks access to higher-order processes and to the ability to draw meaning from text. The problem is that the affected reader cannot use his or her higher-order linguistic skills to access the meaning until the printed word has first been decoded and identified. Suppose, for example, that an individual knows the precise meaning of the spoken word “apparition”; however, until he can decode and identify the printed word on the page, he will not be able to use his knowledge of the meaning of the word, and it will appear that he does not know the word’s meaning.
Deficits in phonological coding continue to characterize dyslexic readers even in adolescence; performance on phonological processing measures contributes most to discriminating dyslexic and average readers, as well as average and superior readers (Shaywitz et al., 1999). Children with dyslexia neither spontaneously remit nor demonstrate a lag mechanism for catching up in the development of reading skills. In adolescents, fluency, defined as rapid, accurate oral reading with good comprehension, as well as facility with spelling may be most useful clinically in differentiating average from poor readers. From a clinical perspective, these data indicate that as children approach adolescence, a manifestation of dyslexia may be a very slow reading rate; in fact, children may learn to read words accurately, but they will not be fluent or automatic, reflecting the lingering effects of a phonological deficit (Lefly and Pennington, 1991). Because they are able to read words accurately (albeit very slowly), dyslexic adolescents and young adults may mistakenly be assumed to have “outgrown” their dyslexia. Data from studies of children with dyslexia who have been followed prospectively support the notion that the ability to read aloud accurately and rapidly as well as facility with spelling may be most useful clinically in differentiating average from poor readers in students in secondary school, college, and graduate school. It is important to remember that these older dyslexic students may be similar to their unimpaired peers on untimed measures of word recognition yet continue to suffer from the phonological deficit that makes reading less automatic, more effortful, and slow. For readers with dyslexia, the provision of extra time is an essential accommodation; it allows them the time to decode each word and to apply their unimpaired higher-order cognitive and linguistic skills to the surrounding context to get at the meaning of words that they cannot entirely or rapidly decode. Other accommodations useful to adolescents with reading difficulties include note-takers, taping classroom lectures, using recordings to access texts and other books they have difficulty reading, and the opportunity to take tests in alternate formats, such as short essays or even orally (Shaywitz, 1998).
To a large degree, advances in understanding dyslexia have informed and facilitated studies examining the neurobiological underpinnings of reading and dyslexia. Thus, a range of neurobiological investigations using postmortem brain specimens (Galaburda et al., 1985) and, more recently, brain morphometry (Filipek, 1996) and diffusion tensor MRI (Klingberg et al., 2000) suggest that there are differences in the temporo-parieto-occipital brain regions between dyslexic and nonimpaired readers.
Rather than being limited to examining the brain in an autopsy specimen or measuring the size of brain regions using static morphometric indices based on CT or MRI, functional imaging offers the possibility of examining brain function during performance of a cognitive task. In principle, functional brain imaging is quite simple. When an individual is asked to perform a discrete cognitive task, that task places processing demands on particular neural systems in the brain. Meeting those demands requires activation of neural systems in specific brain regions, and those changes in neural activity are reflected by changes in brain metabolic activity, which in turn are reflected, for example, by changes in cerebral blood flow and in the cerebral utilization of metabolic substrates such as glucose. The term functional imaging has also been applied to the technology of magnetic source imaging using magnetoencephalography, an electrophysiological method with strengths in resolving the temporal sequences of cognitive processes.
Recent findings using fMRI may help reconcile the seemingly contradictory findings of previous imaging studies of dyslexic readers (Shaywitz, B. et al., in press; Brunswick et al., 1999; Helenius et al., 1999; Horwitz et al., 1998; Paulesu et al., 2001; Rumsey et al., 1992, 1997; Salmelin et al., 1996; Shaywitz et al., 1998, submitted; Simos et al., 2000). In addition, some functional brain imaging studies show a relative increase in brain activation in frontal regions and right hemisphere systems in dyslexics compared with nonimpaired readers (Shaywitz, B. et al., in press; Brunswick et al., 1999; Rumsey et al., 1997; Shaywitz et al., 1998, submitted; Georgiewa et al., 1999). The involvement of the posterior region centered about the angular gyrus is of particular interest, since this portion of association cortex is considered pivotal in carrying out those cross-modal integrations necessary for reading, that is, mapping the visual percept of the print onto the phonological structures of the language (Benson, 1994; Black and Behrmann, 1994; Geschwind, 1965). Consistent with this study of developmental dyslexia, a large literature on acquired inability to read (alexia) describes neuroanatomic lesions most prominently centered about the angular gyrus (Damasio and Damasio, 1983; Dejerine, 1891; Friedman et al., 1993).
It should not be surprising that both the acquired and the developmental disorders affecting reading have in common a disruption in the neural systems serving to link the visual representations of the letters to the phonological structures they represent. While reading difficulty is the primary symptom in both acquired alexia and developmental dyslexia, associated symptoms and findings in the two disorders would be expected to differ somewhat, reflecting the differences between an acquired and a developmental disorder. In acquired alexia, a structural lesion resulting from an injury, such as stroke or tumor, disrupts a component of an already functioning neural system, and the lesion may extend to involve other brain regions and systems. In developmental dyslexia, as a result of a constitutionally based functional disruption, the system never develops normally so that the symptoms reflect the emanative effects of an early disruption to the phonological system. In either case, the disruption is within the same neuroanatomic system. A number of studies of young adults with childhood histories of dyslexia indicate that although they may develop some accuracy in reading words, they remain slow, nonautomatic readers (Bruck, 1992; Felton et al., 1990).
The model used to study reading and reading disability has now been extended to the study of mathematics and mathematics disability. Though these studies are still in their infancy, the indications are that within the next decade the cognitive and neurobiological underpinnings of mathematics disability will be elucidated.
The concept of unexpected underachievement remains the central diagnostic criterion for designating a child as LD. Because the definition of LD in EHA/IDEA provided insufficient criteria for identifying eligible children, in 1977 the Department of Education published guidelines for the identification of an unexpected underachievement, settling on an operational definition of a severe discrepancy between achievement and intellectual ability, that is, an IQ-achievement discrepancy. Over time, it has become apparent that the use of the IQ-achievement discrepancy has the effect of delaying identification until the child falls below a predicted level of performance. Waiting for a child to exhibit failure sufficient to signal a significant discrepancy between IQ and achievement level takes time. This type of discrepancy cannot be measured until a child reaches approximately age 9, and by that time the student has been experiencing the frustration of academic failure for two to three years. A substantial body of epidemiological data shows clearly that the majority of children who are poor readers at age 9 continue to have reading difficulties into adulthood (Shaywitz et al., 1999). Thus, a reliance on the IQ-achievement discrepancy, when employed as the principal criterion for the identification of reading disability, possibly harms more children than it helps. Furthermore, good evidence indicates that it is possible to screen children as young as 4-5 years of age and identify those at risk for reading disability, an identification based on poor reading relative to chronological age, that is, poor reading defined solely on the basis of low reading achievement.
The results of several studies indicate that there are no significant differences in cognitive characteristics (other than verbal ability) between children who are poor readers relative only to chronological age (i.e., poor readers defined by low achievement) and children defined as reading disabled on the basis of unexpected underachievement (i.e., on the basis of an IQ-achievement discrepancy) (Fletcher et al., 1994; Stanovich and Siegel, 1994). In addition, neurobiological evidence using sophisticated brain imaging technology supports the cognitive data in indicating similar patterns of brain organization in children defined as having a reading disability on the basis of unexpected underachievement and on the basis of low achievement for chronological age (Shaywitz, B. et al., in press). Important for future policy development, the IQ test results and whether or not a child shows a discrepancy between IQ and reading achievement have little significance for understanding or treating a reading disability.
Disproportionate representation in the mental retardation (MR) category, especially the mild level (MMR), is a long-standing concern in discussions of the participation of minority students in special education (Dunn, 1968; National Research Council [NRC], 1982). Although the numbers of and the degree of disproportionality in minority and nonminority students classified as MR and participating in special education have declined substantially over the past 25 years, the greatest degrees of special education disproportionality continue to occur in this category. Currently, 2.63 percent of all black students receive special education services due to the MR disability, a rate that is 2.35 times the rate for white students. White and American Indian/Alaskan Native students are in MR programs at very close to the same rates, while Hispanic students have slight underrepresentation and Asian/Pacific Islanders have substantial underrepresentation (see Chapter 2). Disproportionate MR representation has been the most controversial and intractable pattern over the past few decades.
Many changes in MMR have occurred since the 1982 NRC report. While MR was the disability category of interest in that report, during the intervening period many of the mild cases have ceased to be identified as mentally retarded in many states (MacMillan et al., 1996d). It is instructive to note the “vacillating prevalence” of MR among schoolchildren in the past half century (Mackie, 1969). Mackie reports that between 1948 and 1966 there was a 400 percent increase in the number of children served in public school programs for students with MR. During the latter phase of that time period, the American Association on Mental Deficiency adopted the Heber (1959, 1961) definition that set the upper IQ cutoff score at -1 SD (roughly IQ 85), leading Clausen (1967) to note that this was the most liberal, inclusive definition ever of the concept of mental deficiency. In the mid-1960s there was no LD category recognized in federal law, and public schools encountering a youngster with severe and chronic low achievement had few options for helping that child—either they classified him or her as MMR, or services were restricted to the interventions available in general education.
The existence of two groups of individuals with MR has long been recognized (Dingman and Tarjan, 1960; Zigler, 1967). One is a more patently disabled group of individuals whose MR more often has a biological basis (referred to as “organic” by some) and whose IQ is commonly very low (i.e., below 50). Zigler (1967) proposed that this group of individuals represents a separate IQ distribution with a mean of approximately 35 and ranges from an untestable level up to an IQ of about 70. Zigler said that the intellectual functioning of this group of mentally retarded children reflected “factors other than the normal polygenic expression” and that these people had an “identifiable physiological defect.” A second group of individuals, referred to as “familial cases of mental retardation,” evidence no organic impairment and are believed by Zigler to represent the lowest portion of the normal curve of intelligence. Predictions derived from these hypotheses generated by Dingman et al. were tested by Mercer (1973b), examining the presence of physical disabilities (e.g., seizures, ambulation, vision, and hearing problems) in individuals clinically identified as MR with IQ scores in the range of about 55-75, i.e., the familial type. She concluded: “Clearly, persons whose IQs are more than 3 standard deviations below the mean of the population suffer from significantly more physical disabilities than persons whose scores fall within the normal curve” (Mercer, 1973b:15).
The two groups of individuals with MR are important to the issue of assessment. Physicians typically diagnose the organic cases (i.e., those with IQs below about 55-60) very early in childhood using clinical and laboratory tests, medical histories, and other evidence employed in medical diagnoses. As described in the discussion of referral in Chapter 6, more severe cases of MR are commonly enrolled in preschool programs and arrive for public school enrollment already classified as mentally retarded. The MMR or familial cases, however, have traditionally arrived for enrollment in school undiagnosed with any disability. Diagnosis as MMR occurs only after chronic and severe achievement problems are found, marked by failure to respond to normative instructional materials and methods and leading ultimately to referral and psychoeducational assessment in an effort to determine (a) whether the child has a disability and (b) a prescription for educational treatment. It is for this second group of children, ultimately classified as MMR, that the role of educational assessment is most relevant.
The dimensions of intellectual functioning and social competence (i.e., adaptive behavior) have been fundamental to most definitions of MR. The relative importance of these two dimensions, however, has varied in the different classification schemes proposed (MacMillan and Reschly, 1996). Definitions of MR adopted by the American Association on Mental Retardation (AAMR) have historically been the most influential in terms of being adopted in federal legislation and state education codes (Frankenberger and Fronzaglio, 1991). Moreover, the various AAMR definitions adopted since that of Heber (1961) reflect modest, but not insignificant, variations of that original definition, which read: “MR refers to subaverage general intellectual functioning which originates during the developmental period and is associated with impairment in adaptive behavior” (Heber, 1961:3). In subsequent revisions of the AAMR definition (Grossman, 1973, 1977, 1983; Luckasson, 1992), the importance of adaptive behavior vis-à-vis intelligence was enhanced, and the cutoff score on tests of intelligence defining “subaverage general functioning” has also varied.
Under Heber (1961), the criterion for subaverage general intellectual functioning was -1 SD (approximately IQ 85); however, it was dropped to -2 SDs by Grossman (1973) (approximately IQ 70). Later, guidelines for employing IQ cutoff scores were adjusted, permitting identification of children with IQ scores up to 75, a change explained as follows: “This particularly applies in schools and similar settings if behavior is impaired and clinically determined to be due to deficits in reasoning and judgment” (Grossman, 1983:11). The most recent AAMR definition (Luckasson, 1992) has continued with “a version” of the IQ 75 upper limit: “a score of approximately 70 or 75 or below” (p. 14). This imprecision has been criticized (MacMillan et al., 1993, 1995) on the basis of the proportion of cases falling between IQ 70 and 75 (even ignoring the standard error of measurement). MacMillan and Reschly (1996) argue that while setting the upper cutoff score is arbitrary, the imprecision reflected in the Luckasson guidelines reflects a lack of awareness of psychometrics. Table 7-1 shows the consequences of these subtle shifts in IQ scores for the proportion of children eligible on the intellectual dimension defining MR alone. Very slight shifts in cutoff scores have rather dramatic consequences in terms of the percentage of the general population eligible. The proportion eligible using IQ 70 and below is only half as large as the proportion eligible using a criterion of IQ 75 and below.
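The sensitivity of eligibility to the cutoff score can be checked with a short normal-curve calculation. The sketch below assumes an IQ scale with mean 100 and SD 15 treated as a continuous normal distribution; the function names are illustrative, and the results may differ slightly from the entries in Table 7-1, which may rest on a different rounding or whole-score convention.

```python
from statistics import NormalDist

# Assumption: IQ is modeled as a continuous normal variable, mean 100, SD 15.
IQ_SCALE = NormalDist(mu=100, sigma=15)


def percent_below(cutoff: float) -> float:
    """Percentage of the modeled population scoring below `cutoff`."""
    return 100 * IQ_SCALE.cdf(cutoff)


def percent_between(low: float, high: float) -> float:
    """Percentage of the modeled population scoring between `low` and `high`."""
    return 100 * (IQ_SCALE.cdf(high) - IQ_SCALE.cdf(low))


if __name__ == "__main__":
    for cutoff in (70, 75, 85):
        print(f"below IQ {cutoff}: {percent_below(cutoff):.2f}%")
    print(f"between IQ 70 and 75: {percent_between(70, 75):.2f}%")
```

Under this model, moving the cutoff from 70 to 75 roughly doubles the eligible percentage, which is the pattern the paragraph above describes.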
TABLE 7-1 Proportion of the Population Falling Below Certain IQ Cutoffs and Falling Within Certain IQ Intervals
IQ | Normal Curve Percentage |
Below 70 | 2.28 |
70 and below | 2.68 |
Below 75 | 4.75 |
75 and below | 5.48 |
IQ Interval | Percentage Within Interval |
56-60 | 0.30 |
61-65 | 0.69 |
66-70 | 1.52 |
71-75 | 2.80 |
The application of more or less stringent cutoff scores clearly influences the degree of overrepresentation of disadvantaged minority children in the MR category. The degree of overrepresentation would be expected to be larger when higher, rather than lower, IQ cutoff scores are employed due to the nature of the distributions of intellectual performance. Reschly and Jipson (1976) studied the effects of different IQ cutoff scores (IQ 70 and IQ 75) on the potential overrepresentation of black, Hispanic, and American Indian children. Greater overrepresentation occurred at IQ 75 and below than at IQ 70 and below. Moreover, the fact that tests of intelligence yield different distributions for different groups results in higher risk of being identified as MR for those groups whose distributions yield a lower mean score. For example, Kaufman and Doppelt (1976) examined the standardization data for the WISC-R and reported for white subjects a mean IQ of 102.26 (SD = 14.04) and a mean IQ of 86.43 (SD = 12.70) for black subjects in the standardization sample. Clearly, on the IQ-test dimension alone, a higher percentage of black students are at risk for being classified as MR.
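The arithmetic behind this point can be illustrated with the standardization means and standard deviations reported by Kaufman and Doppelt (1976). The sketch below is a rough illustration only: it assumes each group's scores are normally distributed, which the source does not state, and the dictionary and function names are hypothetical. It describes score distributions on one test and says nothing about the causes of the difference or about the appropriateness of the cutoff.

```python
from statistics import NormalDist

# Means and SDs as reported in the text for the WISC-R standardization sample.
# Assumption: scores within each group are treated as normally distributed.
GROUP_DISTRIBUTIONS = {
    "white": NormalDist(mu=102.26, sigma=14.04),
    "black": NormalDist(mu=86.43, sigma=12.70),
}


def percent_below(group: str, cutoff: float) -> float:
    """Percentage of the modeled group distribution falling below `cutoff`."""
    return 100 * GROUP_DISTRIBUTIONS[group].cdf(cutoff)


if __name__ == "__main__":
    for cutoff in (70, 75):
        for group in GROUP_DISTRIBUTIONS:
            print(f"{group:>5} subjects below IQ {cutoff}: "
                  f"{percent_below(group, cutoff):.2f}%")
```

The output shows how a fixed cutoff applied to distributions with different means captures different percentages of each group, and how those percentages shift when the cutoff moves from 70 to 75.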
In addition to the influence of the cutoff score adopted to define general intellectual functioning, the type of intellectual measure also influences the rate of eligibility for certain racial/ethnic groups. The Diana (1970) and Guadalupe (1972) consent decrees were directed at reducing the overrepresentation of American Indian and Hispanic students in special education programs for students with MMR. Both consent decrees required adoption of non-English language or performance IQ measures in future evaluations of American Indian and Hispanic students. In the Reschly and Jipson (1976) study, the prevalence rates for Hispanic and Native American children were considerably higher if one used verbal IQ scores to define aptitude. When the nonverbal (i.e., performance IQ) measure was used, the overrepresentation of Hispanic and Native American children was virtually eliminated; however, the overrepresentation of black children was about the same regardless of the type of intellectual measure. MacMillan et al. (1998b) contrasted eligibility decisions that would be reached for a referred sample of children, stratified on the basis of ethnic group (i.e., white, black, Hispanic), using psychometric criteria. Referred Hispanic students scored on average 8 points higher on performance IQ than on verbal IQ using the WISC-III, a difference that was not found for either the white or the black sample of referred students. The eligibility decisions based exclusively on psychometric data (i.e., no clinical or other evidence considered) were then contrasted using full-scale IQ (FSIQ) or performance IQ (PIQ) (Table 7-2). Consistent with the Reschly and Jipson (1976) findings, use of PIQ dramatically alters the percentage of Hispanic students scoring below the IQ cutoff that defines MR. Using PIQ as the estimate of aptitude, 11 fewer Hispanic students scored 75 or below than did so on the FSIQ. Only one Hispanic child qualified as MR on the PIQ who did not qualify on the FSIQ.
To a lesser degree, use of PIQ also reduced by four the number of black students who qualified as MR in comparison to the number qualifying when FSIQ was used. For white students, however, a slightly different pattern emerged. Use of PIQ instead of FSIQ resulted in three children moving out of the MR classification, while four additional students who did not qualify as MR using FSIQ did qualify using PIQ as the estimate of aptitude. Clearly, the use of PIQ would reduce the percentage of black and Hispanic students referred to special education who would qualify as MMR.
TABLE 7-2 Comparison of Classification as MR, LD, and Ineligible Using FSIQ and PIQ to Estimate Aptitude by Ethnic Group
FSIQ Classification | PIQ MR | PIQ Ineligible | PIQ LD | Total | Kappa |
White | | | | | |
MR | 7 | 1 | 2 | 10 | |
Ineligible | 3 | 15 | 6 | 24 | |
LD | 1 | 0 | 20 | 21 | |
Total | 11 | 16 | 28 | 55 | 0.63 |
Black | | | | | |
MR | 10 | 4 | 0 | 14 | |
Ineligible | 0 | 16 | 2 | 18 | |
LD | 0 | 0 | 10 | 10 | |
Total | 10 | 20 | 12 | 42 | 0.78 |
Hispanic | | | | | |
MR | 7 | 7 | 5 | 19 | |
Ineligible | 1 | 6 | 11 | 18 | |
LD | 0 | 1 | 15 | 16 | |
Total | 8 | 14 | 31 | 53 | 0.31 |
There is something of a paradox in this classification exercise. While the use of performance IQ as the estimate of aptitude reduced the number of Hispanic students qualifying as MMR by 11, it also moved 16 Hispanic students into the LD classification (a net increase of 15, from 16 to 31, because one student moved out). Because the aptitude estimate was optimized while the measure of achievement remained constant, the increase in LD cases among Hispanic students more than offset the reduction in the number of children classified as MR.
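The agreement statistics in Table 7-2 can be checked directly from the cell counts. The sketch below assumes that the Kappa column reports Cohen's unweighted kappa, an assumption that the source does not state but that the reported values are consistent with. Rows are FSIQ classifications and columns are PIQ classifications, each in the order MR, Ineligible, LD; the function names are illustrative.

```python
def cohen_kappa(table):
    """Cohen's unweighted kappa for a square table of counts (rows vs. columns)."""
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(len(table))) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1.0 - expected)


def net_change(table):
    """Change in each category count when moving from the row measure to the column measure."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return [c - r for r, c in zip(row_totals, col_totals)]


# Cell counts from Table 7-2; category order is MR, Ineligible, LD.
TABLE_7_2 = {
    "White": [[7, 1, 2], [3, 15, 6], [1, 0, 20]],
    "Black": [[10, 4, 0], [0, 16, 2], [0, 0, 10]],
    "Hispanic": [[7, 7, 5], [1, 6, 11], [0, 1, 15]],
}

for group, table in TABLE_7_2.items():
    kappa = round(cohen_kappa(table), 2)
    print(group, "kappa:", kappa, "net change (MR, Ineligible, LD):", net_change(table))
```

The low kappa for Hispanic students reflects how many of them change classification when the aptitude estimate changes, which is the shift described in the text.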
The construct of intelligence has historically been fundamental to defining MR. As discussed elsewhere in this report, tests of intelligence have very limited curricular validity and, when routinely administered to establish the eligibility of students as MR or LD, add considerably to the cost of assessment. Nevertheless, in the context of MR, “subaverage general intelligence” is a defining feature of the disability and using measures other than tests of intelligence or resistance to treatment as criteria for eligibility raises some perplexing possibilities. For students who are referred for psycho-educational assessment, the charge is to identify those cases whose “failure to thrive” in the best clinical judgment of the individual education program (IEP) committee is “due to low general intelligence” as opposed to competing hypotheses, such as a specific processing problem (i.e., LD) or emotional or behavioral problems that are so severe that they interfere with adaptation or academic achievement (i.e., emotional disturbance). Making that differential diagnosis without using tests of intelligence raises a host of issues. We turn now to the second behavioral dimension defining MR, adaptive behavior.
The second behavioral dimension defining MR, impairments in adaptive behavior or adaptive skills, presents more serious psychometric problems to those conducting the assessment, particularly when applied to those with MMR. Even when the Heber definition was dominant, which would permit approximately 16 percent of the general population having IQ scores of 85 or below to be classified as MR, no more than 1 percent of the general population was identified as MR (Mercer, 1973a, b; Tarjan et al., 1973). The reason for the discrepancy was that the schools and other clinicians never used IQ alone to define MR; two dimensions were always considered in making a diagnosis (impairments in adaptive behavior and subaverage general intellectual functioning). In fact, these two criteria were applied sequentially: (a) impairments in adaptation and then (b) subaverage general intellectual functioning. Only children referred by their classroom teachers were ever evaluated on the intellectual dimension. A huge percentage of those who would have scored below IQ 85 were never referred, and even among those who were, only a small percentage were ever certified as eligible for services. In fact, some of those referred were protected from certification as a result of the psychoeducational assessment provided.
Ashurst and Meyers (1973) also examined data from the Riverside study reported by Mercer (1973a). They examined 269 cases of children referred for severe and persistent academic underachievement. Of interest to the current discussion is how referred cases were deemed eligible or ineligible by school psychologists and then how admissions and dismissal committees arrived at decisions in light of: (a) teacher referral data, and (b) psychologist certification that the child was eligible or ineligible. Five different results were identified and the number of cases fitting a given “result” noted:
Teacher referred, psychologist found child eligible, child placed as MMR (86 children).
Teacher referred, psychologist found child eligible, child not placed (63 children).
Teacher referred, psychologist found child ineligible, child not placed (116 children).
Teacher referred for reason other than academic problems, psychologist found child eligible as MMR, child placed (1 child).
Teacher referred for reason other than academic problems, psychologist found child eligible as MMR, child not placed (3 children).
Of 269 children referred by teachers, only 87 (32 percent) were actually placed. In 116 cases (43 percent), the IQ score secured by the psychologist actually prevented certification, being above the cutoff score for MR. Finally, in 63 cases (23 percent of all referred children), the child was not placed despite having an IQ score permitting eligibility. Of the 153 referred children with IQ scores permitting placement in programs for MMR, fewer than three-fifths (57 percent) were actually placed. Clearly, IQ alone did not preordain placement as MR. These data were collected in the early 1960s, when the more inclusive Heber definition was in effect in the California education code. To quote Mercer (1973b): “Clinicians are apparently assessing more than IQ test scores in making diagnoses” (p. 15). Something akin to adaptive behavior enters into the placement formula, as do numerous contextual factors including, but not limited to, parental opposition, perceived competence of the special education teacher, issues of second language acquisition, and the like.
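The percentages cited for the Ashurst and Meyers (1973) data follow from the five outcome counts listed above. The tally below is a minimal sketch for checking that arithmetic; the tuple layout and variable names are illustrative rather than drawn from the original study.

```python
# Each entry: (referred for academic problems, psychologist found eligible, placed, count)
OUTCOMES = [
    (True, True, True, 86),
    (True, True, False, 63),
    (True, False, False, 116),
    (False, True, True, 1),
    (False, True, False, 3),
]

total_referred = sum(count for _, _, _, count in OUTCOMES)
placed = sum(count for _, _, was_placed, count in OUTCOMES if was_placed)
eligible = sum(count for _, is_eligible, _, count in OUTCOMES if is_eligible)
ineligible = total_referred - eligible


def percent(part, whole):
    """Percentage rounded to the nearest whole number, as reported in the text."""
    return round(100.0 * part / whole)


print("Referred:", total_referred)
print("Placed:", placed, f"({percent(placed, total_referred)} percent of referred)")
print("Ineligible on IQ:", ineligible, f"({percent(ineligible, total_referred)} percent)")
print("Eligible on IQ:", eligible, f"({percent(placed, eligible)} percent of these placed)")
```

The 153 children with scores permitting placement include the four referred for reasons other than academic problems, which is why the count of eligible children who were not placed (66) is slightly larger than the 63 cited for academic referrals alone.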
The inclusion of adaptive behavior as a dimension defining MR has been controversial since it was introduced by Heber (see Clausen, 1967, 1968, 1972; Zigler et al., 1984; Zigler and Hodapp, 1986) because of the subjectivity (i.e., unreliability) it introduces into the diagnostic process. These concerns are particularly salient to the segment of children considered MMR, the category in which overrepresentation is most prominent, because the domains measured by extant scales do not tap the behaviors that prompt referral of cases of MMR. Instead, a ceiling effect is noted. Paradoxically, the segment of children for whom diagnosis is most difficult is the same segment for which the existing scales are least appropriate.
State definitions of MR continue to use the Grossman (1983) definition as a model, opting not to adopt the more current AAMR version (Luckasson, 1992). Denning et al. (2000) summarized existing state definitions and classification practices, reporting that 44 states used the Grossman definition while three used the Luckasson definition. Only one state (Massachusetts) reported that consideration of adaptive behavior was not required in diagnosing MR. However, only 14 states actually listed specific practices that needed to be considered for eligibility. This is consistent with an earlier survey by Frankenberger and Fronzaglio (1991:318), who reported:
Even though states appear to be moving toward agreement on IQ cutoffs, there is little agreement in the states’ methods of identifying deficits in adaptive behavior and academic achievement. In the current study, only 7 states delineated cutoff scores indicative of deficits in adaptive behavior.
In fact, clinical judgment has usually been employed to supplement the information on the severe and chronic achievement problems that prompted the referral in arriving at the conclusion that adaptive behavior is impaired. Garber (1988) described the situation as follows: “Definition may require that both intellectual and adaptive skill levels be ascertained....It is the low IQ scores that cause the label of mental retardation to be applied” (p. 10). Reschly (1992) observed that prior to about 1980, for school-age children and youth who accounted for the majority of detected cases of MR, “low achievement as assessed by standardized measures of achievement along with referral for academic difficulties was sufficient to constitute a deficit in adaptive behavior” (p. 33). Over the past two decades, considerable effort has been devoted to the more precise measurement of adaptive behavior in multiple contexts (Harrison and Robinson, 1995).
Disagreements over the key domains have complicated the use of adaptive behavior in decisions about MR eligibility in schools, as has uncertainty about appropriate cutoff scores to define a deficit in adaptive behavior. Adaptive behavior measures differ in underlying conceptions of adaptive behavior (e.g., the degree to which learning and achievement are important dimensions for children and youth), methods of obtaining information (e.g., third-party respondent vs. direct observation), the key contexts (e.g., home, school, neighborhood), and appropriate respondent (e.g., parent, teacher, peers, or the child himself or herself). A most vexing but enormously important issue is the selection of a cutoff score to define a deficit in adaptive behavior. The modern MR definitions refer to a deficit in adaptive behavior or deficits in adaptive skills. For the intellectual functioning dimension, they include the modifier “significantly sub-average,” which is the basis for the IQ cutoff of approximately 70 to 75. No comparable modifier is applied to adaptive behavior that would provide the basis for a specific, required cutoff score. Consistent with these definitions, the deficit might be more appropriately defined through clinical judgment or a criterion such as 1 SD below the mean rather than the 2 SD criterion applied to the intellectual dimension.
The issues concerning adaptive behavior measurement are more than sterile academic debates. Research in the 1980s showed that MR was essentially eliminated if the adaptive behavior measure focused on nonschool settings, eliminated practical cognitive skills, and used parents as the sole respondents (Heflinger et al., 1987; Kazimour and Reschly, 1981). Recently developed adaptive behavior instruments generally suggest a more moderate view, in which the adaptive behavior cutoff score is somewhat flexible and decisions about the existence of deficits are based on consideration of performance over several domains (Harrison and Oakland, 2000). The evidence to date clearly supports the conclusion that the measurement of adaptive behavior is not as well developed as the measurement of general intellectual functioning.
There is an extensive literature documenting the changes that have occurred in the population of children served by the public schools as MR. It was commonly stated in the 1970s that 75-80 percent of all individuals with MR were at the mild level (IQ 55-70 to 75) and did not display physical or other identifiable signs of biological anomaly. Such statements persisted even later (e.g., Grossman and Tarjan, 1987). However, criticisms and legal challenges to the process whereby children were classified as MMR, coupled with the changes brought about with the enactment of EHA/IDEA, were successful in reducing the number of schoolchildren classified as MMR, largely because of the reluctance on the part of schools to use the MR classification for students in the mild range (MacMillan et al., 1996c). While MR was the disability category accounting for the largest number of children served in special education when President Ford signed EHA into law in 1975, by 1993-1994 there had been a 38 percent decline (a reduction of over 335,000 children) in the number of students so served. In the 1996-1997 school year, 1.16 percent of schoolchildren were classified as MR. During the same period, the number of children served as learning disabled increased by 207 percent (an increase of over 1.5 million children).
Since 1970 the borderline MR subgroup (those with IQ scores between 70 or 75 and 85) has been excluded from the MR category. Moreover, in many states there is reluctance to classify able-bodied students as MR, with the result that the MR population in 2000 in comparison to that of 1970 is more patently disabled. During the 1980s, a number of investigators noted the “change” in the MMR population. For example, MacMillan and Borthwick (1980) described the MMR population as including many children who prior to that time would have been served in programs for students with moderate levels of MR (IQ 40 to 55) (e.g., children with Down syndrome). Epstein et al. (1989) questioned whether the cultural-familial subgroup of MR children, as traditionally defined, is to be found today in MMR classes. Their survey found that 90 percent of the post-EHA/IDEA MMR students they studied needed speech and language assistance and multiple handicaps were frequently evident (convulsive disorders, serious levels of visual impairments, history of significant behavior disorders). Polloway et al. (1986) noted that the younger MMR students “were identified virtually at the initiation of their school careers” (p. 7), a situation that differs markedly from that described by Mercer (1973a, b) earlier, when initial enrollment was in general education and referral came only after failure to keep up academically for three or more years. As Gottlieb (1981) noted, “the category of mild MR appears reserved for the lower end of the mild MR range usually for children having an IQ of about 65 or lower” (p. 124).
The assessment of children who are ultimately classified as MR has changed dramatically since the period addressed in the previous National Research Council report (1982). Increasingly, a greater percentage of the children come to school already classified by the medical profession, rendering the issue of IQ moot. As MacMillan et al. (1996c) found, most children referred and given psychoeducational assessments and who score below IQ 75 are currently classified in many schools as LD, not MR. The discrepancy between who qualifies as MR according to specified criteria and who is administratively labeled MR by the schools is considerable. This explains, in part, the decline in the number of children identified in school as MR—a phenomenon that is on the way to rendering MR a low-incidence disability.
In 1997, 446,835 students between ages 6 and 21 were receiving services under the category of emotional disturbance (ED). Although the theoretical prevalence estimates for students with ED range from 3 to 6 percent of the student population (Brandenberg et al., 1990; Forness et al., 1983; Skiba et al., 1994), enrollment statistics indicate that approximately 1 percent of the school-age population is certified with ED as a primary disability (Forness, 1992b; see Chapter 2). Furthermore, there is substantial variability in ED prevalence rates from state to state, with estimates ranging from 6 per 10,000 in Mississippi to 2 per 100 in Minnesota (see Table 6-1).
The risk of ED classification for black students is 1.56 percent, a rate that is approximately 1.6 times the white rate of approximately 1.0 percent. The ED classification risk is the same for white and American Indian/ Alaskan Native students. The white risk is approximately 1.4 times the Hispanic rate and 3.6 times the Asian/Pacific Islander rate. The ED risk, like the MR risk, is highest for black students, nearly equal for white and American Indian/Alaskan Native students, slightly lower for Hispanic students, and markedly lower for Asian/Pacific Islander students.
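The comparisons above are risk ratios: one group's classification rate divided by a reference group's rate. The sketch below illustrates the arithmetic using the rates given in the text. The Hispanic and Asian/Pacific Islander rates are not reported directly here, so they are back-calculated from the stated ratios and should be read as approximations. The state comparison uses the Mississippi and Minnesota figures cited earlier in this section, and all names in the code are illustrative.

```python
def risk_ratio(group_rate, reference_rate):
    """Ratio of one group's classification rate to a reference group's rate."""
    return group_rate / reference_rate


# Rates in percent. The white and black rates are stated in the text.
WHITE_RATE = 1.0   # "approximately 1.0 percent"
BLACK_RATE = 1.56

# Assumption: back-calculated from "1.4 times" and "3.6 times"; approximate only.
HISPANIC_RATE = WHITE_RATE / 1.4
ASIAN_PACIFIC_ISLANDER_RATE = WHITE_RATE / 3.6

black_to_white = risk_ratio(BLACK_RATE, WHITE_RATE)

# State variability: 6 per 10,000 (Mississippi) versus 2 per 100 (Minnesota).
mississippi_rate = 100.0 * 6 / 10_000   # in percent
minnesota_rate = 100.0 * 2 / 100        # in percent
state_ratio = risk_ratio(minnesota_rate, mississippi_rate)

print(f"Black-to-white risk ratio: {black_to_white:.2f}")
print(f"Implied Hispanic rate: {HISPANIC_RATE:.2f} percent")
print(f"Implied Asian/Pacific Islander rate: {ASIAN_PACIFIC_ISLANDER_RATE:.2f} percent")
print(f"Minnesota-to-Mississippi prevalence ratio: {state_ratio:.0f}")
```

On these figures, the variation in prevalence between the two states is far larger than any of the differences among racial/ethnic groups.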
These findings—underidentification of children and youth with emotional disturbances and overrepresentation of black students in the ED category—suggest that relatively few students with behavior problems are being served under the ED category and that the procedures currently employed for identifying and screening students for possible inclusion in this category require examination. Moreover, the lack of definitional clarity and reactive school practices in addressing emotional and behavioral disorders may, in part, contribute to the varying ED prevalence rates and over-representation of black students.
There are three main perspectives on emotional and behavioral disorders: clinical, empirical, and educational (Hallahan and Kauffman, 1997; Kauffman, 1997). Childhood behavior disorders have been most often conceptualized from a clinical, medical-model perspective. The Diagnostic and Statistical Manual of Mental Disorders 4th ed. (DSM IV; American Psychiatric Association, 1994) has used professional judgment to identify and assign psychiatric diagnoses such as oppositional defiant disorder (ODD), conduct disorder (CD), and antisocial personality disorders (APD). Some researchers contend that there may be a developmental progression from less severe disorders (e.g., ODD) to more severe disorders (e.g., CD) noting that prevalence rates of these disorders decrease as the severity of the disorder increases (Frick et al., 1992; Frick, 1998; Lahey and Loeber, 1994). Yet this clinical classification system suffers from problems of reliability and validity due to the heavy reliance on professional judgment (Gresham, 1985).
Empirical approaches to behavior disorders, in contrast, employ factor analytic procedures for identifying behavior patterns and thereby afford improved reliability and validity relative to clinical classification systems. Examples of such tools include Achenbach’s (1991) Child Behavior Checklist, Quay and Peterson’s (1983) Revised Behavior Problem Checklist, and Gresham and Elliott’s (1990) Social Skills Rating System. These instruments can be used to identify broad-band (e.g., externalizing and internalizing behaviors) and narrow-band syndromes (e.g., aggressive, delinquent behaviors vs. withdrawn, immature behaviors). One dilemma with the use of empirical classification schemes, however, is how to interpret reliable data in which multiple informant perspectives (e.g., parents, teachers) do not converge.
The final perspective is that of education. The federal definition of ED first came into being as part of the Education of the Handicapped Act in 1975 and has not changed substantially in the past 25 years. Congress constructed the federal definition of ED from a study conducted by Eli Bower that identified the following five dimensions of maladaptive behavior as characteristics of ED (Bower, 1960):
inability to learn that cannot be explained by intellectual, sensory, or health factors;
inability to build or maintain satisfactory relationships with peers or teachers;
inappropriate types of behavior or feelings under normal circumstances;
general pervasive mood of unhappiness or depression; and
tendency to develop physical symptoms or fears associated with school problems.
In essence, the federal definition requires at least one of Bower’s five characteristics to adversely affect a student’s academic performance across “a long period of time” and to a “marked degree” (Individuals with Disabilities Education Act, 1997).
Furthermore, the federal definition requires states to exclude students labeled socially maladjusted (SM) from special education eligibility. To date, the five characteristics in the definition of ED lack specificity and represent a variety of behavioral and emotional disorders with different etiologies and implications for interventions. Likewise, the distinction between students who are emotionally disturbed and those who are socially maladjusted is not operationally defined or empirically validated by federal legislatures, educators, or research (Center, 1990; Forness and Knitzer, 1992; Webber, 1992), although some professionals equate social maladjustment, an education term, with conduct disorder, a clinical term (see Forness, 1992a). Moreover, the exclusion of students labeled socially maladjusted appears unwarranted given the similarity between students labeled ED and SM across behavior, social competence, academic, and contextual factors (Council for Children with Behavioral Disorders, 1987; Forness 1992b; Skiba and Grizzle, 1991; Walker et al., 2000).
Finally, the federal definition of ED does not consider important behavioral differences associated with gender, ethnicity, developmental level, or contextual factors in defining and assessing each of these characteristics. Research by Forness (1992b) suggests that the allocation of services to students with conduct problems largely depends on the presence of comorbidity with other disorders (e.g., depression or attention deficit hyperactivity disorder). Findings also indicate that it is not until a specific learning disability is diagnosed that students with conduct problems become eligible for special education services.
Thus, the lack of consistent terminology across clinical, empirical, and education perspectives is problematic. First, lack of definitional uniformity hinders effective communication between professions in the clinical, research, and school settings. This is particularly troublesome given that many of today’s students who have or are at risk for behavior disorders are likely to require services from the educational and mental health systems (Walker et al., 1999). Second, without a clear, reliable definition, design and implementation of identification, assessment, and intervention procedures are at best challenging. From inspection of the prevalence rates for CD (3-6 percent of the school-age population) and ED (less than 1 percent of the school-age population) it is clear that many students who exhibit problem behaviors will not ultimately receive special education services under the label of ED. While some students with specific psychiatric problems (e.g., depression) may require only mental health services and may not necessarily benefit from special education services under the ED label, other students with conduct problems may be going unidentified (false negatives) for ED.
Accordingly, Forness and Knitzer (1992) called for a new, broader definition: emotional and behavioral disorder (E/BD), a term that has been largely adopted by the research community. This new term, proposed initially by the National Mental Health and Special Education Coalition, is defined as follows (Forness and Knitzer, 1992:13):
The term emotional and behavioral disorder means a disability characterized by behavioral or emotional responses in school so different from appropriate age, cultural, or ethnic norms that they adversely affect educational performance. Educational performance includes academic, social, vocational, and personal skills. Such a disability:
is more than a temporary, expected response to stressful events in the environment;
is consistently exhibited in two different settings, at least one of which is school related; and
is unresponsive to direct intervention in general education, or the child’s condition is such that general education interventions would be insufficient.
Emotional and behavioral disorders can co-exist with other disabilities.
This category may include children or youths with schizophrenic disorders, affective disorders, anxiety disorder, or other sustained disorders of conduct or adjustment when they adversely affect educational performance in accordance with section (i).
The benefits of this new definition include (a) addressing disorders of emotion and behavior while recognizing that they may co-occur or occur independently, (b) establishing a school-based definition that acknowledges that disorders demonstrated beyond the school day are also relevant, (c) showing sensitivity to ethnic and cultural differences, (d) acknowledging the importance of prereferral interventions, (e) recognizing that disabilities can co-occur, and (f) eliminating arbitrary exclusions (Hallahan and Kauffman, 1997; Webber and Scheuermann, 1997). Unfortunately, the legal definition guiding eligibility and service delivery is still that of emotional disturbance.
Students with ED, by definition, are characterized by behavioral and academic problems that negatively influence school-, teacher-, and peer-related adjustment (Hersh and Walker, 1983; Walker et al., 1995). The ED label addresses both externalizing (e.g., aggression, delinquency) and internalizing (anxiety, depression, withdrawal) behaviors (Achenbach, 1991). Externalizing behavior, which is characteristic of the majority of students served under the ED label, tends to be more stable over time, less amenable to intervention and therefore faces a worse prognosis for remediation relative to internalizing behavior (Gresham et al., 1999; Hinshaw, 1992a, b). Students with externalizing behavior patterns also tend to function at a lower level in social, cognitive, and academic arenas and are more likely to attract teacher attention in comparison to students with internalizing behaviors (Dodge, 1993; McConaughy and Skiba, 1993). It is important to note that, in addition to the aggressive, coercive behavior patterns typical of these students, students with ED are also characterized by acquisition and performance deficits in academic areas as well as low rates of time academically engaged (Coie and Jacobs, 1993).
Evidence suggests that the coexistence of learning and behavior problems is evident during the preschool years and is predictive of a wide range of negative outcomes, which include academic underachievement, truancy, school dropout, motor vehicle accidents, unemployment, substance abuse, criminality, and welfare receipt (Walker et al., 1995; Walker and Severson, 2001). To prevent these deleterious outcomes, early intervention is essential and has been the focus of recent efforts in the research community (Conduct Problems Prevention Research Group, 1999a, b). In order to serve these students more effectively, intervention needs to occur early in a child’s schooling when he or she is less resistant to intervention efforts (Kazdin, 1987; Walker and Severson, 2001) and when less intensive interventions are more likely to produce the desired changes in a student’s behavioral and academic performance (Lane, 1999).
Yet the focus of intervention efforts has not been empirically validated. Three hypothetical models have described the relationship between externalizing behavior patterns and academic underachievement (Hinshaw, 1992a, b; Lane, 1999; Lane et al., 2001a). The first model suggests that academic underachievement leads to externalizing behavior. Students who lack the skills or motivation to participate in the requisite instructional tasks may act out to escape the task demand. The second model hypothesizes that externalizing behavior problems lead to academic underachievement. According to this model, students who engage in disruptive classroom behaviors do not benefit from participation in essential instructional activities. Over time, this lack of participation may lead to academic underachievement. The final model poses a transactional relationship between these two domains. These models have direct implications for intervention. If the first model is accurate, intervention efforts should target increased academic achievement. If the second model is correct, intervention should focus on decreasing problem behaviors. If the transactional model is accurate, then intervention would need to focus on both domains.
Although the relationship between externalizing behavior patterns and academic underachievement has been explored for more than a quarter of a century (Berger et al., 1975; Hinshaw 1992a, b; Richards et al., 1995; Rutter and Yule, 1970), only a handful of treatment-outcome studies have been conducted to examine the validity of these hypothetical causal models (Ayllon et al., 1975; Ayllon and Roberts, 1974; Coie and Krehbiel, 1984; Lane, 1999; Lane and Wehby, in press). Although few in number, intervention studies conducted to date provide preliminary support for the first causal model: academic underachievement leads to externalizing behaviors (Ayllon et al., 1975; Ayllon and Roberts, 1974; Coie and Krehbiel, 1984; Lane and Wehby, in press). In the studies mentioned, when students experienced academic improvement in either acquisition or performance deficits (Frentz et al., 1991), collateral improvement on behaviors was observed. However, these findings must be interpreted with extreme caution given that interventions have not been conducted systematically across students of varying ages. Clearly, additional treatment outcome research is warranted.
Schools are challenged by the task of identifying, assessing, and educating students with ED; several interrelated issues collectively influence educational outcomes negatively (Lane, 1999). These issues, or challenges, exist at federal, state, and local levels (Council for Children with Behavioral Disorders, 1990; Lane, 1999; Maag and Howell, 1992; McIntyre, 1993) and include reactive school practices in identification, resistance to intervention, current educational practices, and current screening and assessment practices.
Due to a lack of definitional clarity and reactive approaches to addressing problem behaviors, students who begin school with behavior problems typically do not receive services until such time as a disability is diagnosed or significant academic underachievement is apparent (Forness, 1992b). Studies indicate that teacher referral for special education frequently occurs in the early primary grades, but the time delay between first documentation of a problem and first placement for ED services may be five years or more (Duncan et al., 1995; Nishioka, 2001), much like the wait-to-fail model utilized to identify students with LD (Fletcher et al., 1998). Forness and colleagues (1983) suggest that a trimorbidity of social maladjustment, emotional or behavior disorders, and a learning disability appears to be the only way to obtain a label of ED. Until the diagnosis of ED is made, schools often rely on punitive procedures (e.g., office referral, in- and out-of-school suspensions) to control the behavior of these students. Unfortunately, most research suggests that these tactics are ineffectual in meeting the needs of students with ED (Lewis and Daniels, 2000).
This population becomes increasingly resistant to intervention efforts over time (Kazdin, 1987, 1993; Walker and McConnell, 1995). If comprehensive interventions are implemented prior to 3rd grade, it is possible to prevent the development of antisocial behavior, the cornerstone of conduct disorder. However, after approximately 8 years of age, the behavior patterns are relatively stable and intervention efforts move from prevention to remediation (Bullis and Walker, 1994; Kazdin, 1987). Furthermore, interventions implemented after 3rd grade require greater intensity and would be more idiographic in nature, as in functional assessment-based interventions, relative to those interventions implemented earlier in a child’s educational career. While functional assessment-based interventions have been quite successful with students with behavior disorders (Lane et al., 1999), these interventions are often time- and labor-intensive, a fact that necessarily limits the number of students who can receive them (Lane, 1999). Accordingly, proactive efforts, such as early detection and early intervention, are essential in order to better serve these students.
Current educational practices for students with ED have been sharply criticized (Knitzer et al., 1990; Steinberg and Knitzer, 1992) for creating barriers that impede effective educational programming. In particular, barriers pertaining to curricular content, classroom management practices, and services delivery (Peacock Hill Working Group, 1991; Webber and Scheuermann, 1997) have been cited as problematic.
Curricular content. The primary concerns regarding curricular content of ED classrooms range from not addressing both academic and sociobehavioral domains to an overall lack of systematic programming (Wehby et al., 1998). Kauffman (1997) contends that instruction in irrelevant, non-functional skills actually contributes to the development of emotional and behavioral problems. Some researchers voice concern about an absence of a strong academic focus in ED classrooms (Lane and Wehby, in press), whereas others contend that the curricular content too closely parallels general education curriculum, with little attention afforded to the students’ emotional needs (Webber and Scheuermann, 1997). One possible explanation for this lack of attention to students’ emotional needs is the decline in availability of mental health and school-based counseling (Knitzer et al., 1990).
Another concern in the area of service delivery is the tendency to implement ED curriculum and programs that have not been empirically validated. Program and material selection does not seem to be guided by data-driven outcomes (Peacock Hill Working Group, 1991). Instead, it would appear that programs and procedures that produce short-term behavioral changes are sought to address immediate rather than long-term needs (Webber and Scheuermann, 1997). To compound the problem even further, there is a shortage of certified teachers to work with ED students (Wald, 1996). Thus, untrained teachers are left to educate very difficult-to-teach students (Rockwell, 1993). When ED students employ the coercive tactics learned at home (albeit unintentionally) in the school setting with teachers who are ill prepared to manage such behavior patterns (Reid and Patterson, 1989), the result is an aversive series of student-teacher interactions that lead to classroom environments with low rates of praise delivery, positive student recognition, and instruction (Shores et al., 1993; Webber and Scheuermann, 1997; Wehby et al., 1998). Consequently, ED programs often feature a curriculum that is neither empirically validated sufficiently nor comprehensive enough to address the students’ academic and socio-behavioral needs—and, to compound the difficulties further, it is implemented by educators without the proper training.
Classroom management practices. Classroom management practices have been widely criticized for what is referred to as a “curriculum of control” (Knitzer et al., 1990; Zable, 1992). A study conducted by Zable (1992) suggests that this emphasis on control has stemmed from administrative pressure, a mandate to emulate general education curricula, and a lack of options. It would appear that little emphasis is placed on identifying the function of the maladaptive behavior and then teaching appropriate replacement behaviors that meet the same functional need (Mace, 1994). Proactive procedures such as precorrection plans (Walker and McConnell, 1995) and rich praise delivery schedules (Wehby et al., 1998) are not being employed to enhance classroom instruction or to prevent behavior problems from occurring during instruction.
The field of behavior disorders has been influenced by the recent shootings that have occurred in schools across the nation (Walker et al., 1999). This tragic series of events has highlighted the need for proactive approaches to identify and assess troubled youth who may be at risk for committing such atrocities. At first glance, screening and early detection appear to be rather simple; however, most emotional and behavioral disorders of childhood are not so extreme that they are easily detected by the untrained observer (Kauffman, 1997; Webber and Scheuermann, 1997). Even so, the field of emotional and behavioral disorders has made substantial progress over the past 20 years, particularly in the area of early detection and intervention.
Researchers have established the importance of utilizing school-wide screenings to detect students at risk for ED, employing a variety of assessment tools and procedures based on the principle of multioperationalism, and designing and implementing comprehensive interventions that are linked to assessment results (Gresham, 1985; Gresham et al., 2000; Lane, 1999; Walker and McConnell, 1995). Programs and instruments such as the Systematic Screening for Behavior Disorders (Walker and Severson, 1992) and the Student Risk Screening Scale (Drummond, 1994) are now available for use in schools to identify students who may be at risk for emotional and behavioral disturbances.
Over the past 10 years there has been a tremendous increase in the availability of assessment instruments and practices to assess the various domains of emotional and behavior disorders. Some of the more recent advances in assessment include: (a) the use of conditional probabilities methodology (Milich et al., 1987; Pelham et al., 1992); (b) functional assessment methodologies (Horner, 1994; Umbreit, 1995); (c) the notion of resistance to intervention (Gresham, 1991, 2001); (d) direct observation systems, such as the Multiple Option Observation System for Experimental Studies (Tapp et al., 1995); (e) the School Archival Records Search (Walker et al., 1991); (f) curriculum-based assessment (Shinn, 1989); and (g) psychometrically sound rating scales, such as the Child Behavior Checklist (Achenbach and Edelbrock, 1991) and the Social Skills Rating System (Gresham and Elliott, 1990), which can be completed by multiple informants (e.g., parents, teachers, and, in some instances, students).
Federal regulations mandate that assessments be conducted by a multidisciplinary team of qualified specialists, given that the assessment results will not only influence eligibility and placement decisions, but also will help guide instructional programming. However, theory and practice do not always converge. Too often the teams have not embraced the advances in screening and assessment and therefore they do not function as intended. Diagnostic, placement, and curricular decisions are frequently made based on limited, rather subjective information (Kauffman, 1997).
Current assessment practices related to the determination of eligibility for disabilities are heavily influenced by legal requirements, as noted in Chapter 6. These requirements determine the kind of assessment that must be provided to all students considered for special education, including LD, MR, and ED. Compliance with these legal requirements is prompted by professional ethics and federal and state compliance monitoring activities, which typically focus on sample cases of students placed in special education. During these monitoring activities, careful scrutiny of the assessment practices and the domains of behavior examined establishes strong incentives for school district personnel to follow general assessment requirements and specific disability classification criteria, although at least some studies suggest that the criteria are applied loosely, especially in the determination of LD.
The typical assessment battery for nearly all students with disabilities includes the administration of a comprehensive, individually administered test of current intellectual functioning (IQ test), an individually administered general achievement test, classroom observation of student behavior, and one or more behavioral checklists or rating scales typically completed by the teacher or parent. In some regions, various tests of underlying processes are utilized (e.g., visual-motor, auditory processing). This battery is used with virtually all students with disabilities. The only exceptions occur with students with severe or marked sensory disabilities, which may render psychological and educational assessment impossible. Medical specialists typically diagnose students with severe disabilities of these kinds, and special education eligibility determination is not the primary focus of the evaluation.
The relative emphasis placed on the domains above (that is, current intellectual function, achievement, and behavior ratings) depends on the disability that is being considered by the multidisciplinary team. For students considered for the diagnosis of LD, there typically is in-depth consideration of achievement in one or more of the domains identified as problematic in the referral. For example, for a student referred due to low reading achievement, administration of several reading tests and additional formal and informal assessments of reading skills are likely in order to establish more precisely the degree and nature of the reading difficulty. Depending on state classification criteria and local practices, students considered for LD may also receive one or more tests of underlying psychological processes. Currently, measures of phonological processes are nearly always part of an LD evaluation if the referral involves reading concerns. The intellectual ability/achievement discrepancy, in current practice, is the most fundamental part of the LD eligibility determination in most states, virtually necessitating the administration of individual IQ and achievement tests.
Determination of eligibility in the category of MR is similar to that for the LD category in that tests of current intellectual functioning and achievement are nearly always involved. The MR diagnostic construct, as noted earlier, involves the dimensions of current intellectual functioning and adaptive behavior. Intellectual functioning is almost always assessed through the administration of individual IQ tests. The adaptive behavior domain, when it can be assessed formally, typically involves the results of one or more inventories in which the teacher, parent, or both serve as reporters on the child’s adaptive functioning. A general achievement test is almost always used with MR, as are other measures such as teacher- or parent-completed rating scales or checklists. However, the fundamental feature of MR eligibility determination is the IQ score, with confirming or supportive evidence from formal and informal measures of adaptive behavior.
The assessment procedures for ED have the same general characteristics as those for MR and LD. An individually administered IQ test and one or more standardized achievement tests almost always are included in the evaluation for ED eligibility. In addition, the ED evaluation should, and sometimes does, emphasize measures of behaviors across different social contexts, as well as assessment of social skills—including peer relations and interactions with significant adults. Formal rating scales that focus on key behavioral dimensions, such as aggression, attention, hyperactivity, and depression, are nearly always used along with direct observations in relevant settings and interviews with the student, the teacher, and the parents.
Depending on the region, students considered for the category of ED may or may not receive projective instruments, such as Rorschach, human figure drawings, and incomplete sentence techniques. Use of highly subjective projective approaches with dubious technical characteristics is more common in the states on the East and West coasts of the United States (Hosp and Reschly, 2002a). Although IQ and achievement tests are typically used with an ED assessment, the fundamental eligibility determination rests typically on reviews of behavioral incidents, social skills measures, and behavior/personality ratings completed by various respondents, who may include teachers, parents, and the student.
Because eligibility to receive services as a gifted or talented student is not regulated by federal statutes, the process is usually guided by state-level policies that range from law, to rule, to guidelines, to administrative code. In some states, identification of gifted students is not mandated at all. Existing policies on identification stem from widely varying definitions of giftedness and include widely disparate requirements. In some states, local school districts are not required to use the state definition or state guidelines and recommended identification processes. This results in widely different proportions of identified students. In the report of the Council of State Directors of Programs for the Gifted (1999), the percentage of total students identified in those states reporting this statistic ranged from 0.22 percent in Nevada to 22.9 percent in Maryland. In Massachusetts only 14 percent of local education agencies identify gifted students.
The age at which identification of gifted students begins is also determined at the state level. At least two states report that identification begins as early as pre-K (Council of State Directors of Programs for the Gifted, 1999) while 16 states simply recommend prekindergarten screening (Coleman and Gallagher, 1992). In some states, policies do not mandate identification until grade 4, and in some states the onset of the process is left to local discretion (Council of State Directors of Programs for the Gifted, 1999). The later the identification and screening process occurs, the less likely a student from a minority or low-income population is to be identified using criteria that rely heavily on academic achievement on standardized assessments. As Chapters 2 and 3 suggest, the pattern of lower achievement on standardized assessments of black, Hispanic, and Native American students is at least partially established by the beginning of kindergarten. By 4th grade, the percentages of whites scoring in the advanced range on the National Assessment of Educational Progress on reading, math, science, and writing were from two to five times as large as those for the under-represented minorities, with 0 percent of black and Hispanic students scoring in the advanced range in math, science, and writing (Donahue et al., 2001).
The discrepancies among definitions, policies, and implementation of policy at the local school district level result in considerable variation from school to school in the creation of a pool of identified gifted and talented students. Even when a particular definition has been adopted—for example, outstanding academic performance—the subjective judgment of what represents outstanding performance is influenced by the normative performance of students in a given school or school district.
Many of the suggested practices in the literature on identification of gifted and talented students mirror those suggested for the identification of students with disabilities. There is widespread agreement on several principles: assessment tools must validly and reliably measure the construct of giftedness, separate and appropriate strategies should be used to identify different aspects of giftedness, multiple criteria should be used for identification, and the criteria should include some that are appropriate for underserved populations (Callahan et al., 1995). Current identification practice, however, does not widely adhere to these principles.
Perhaps the greatest challenge in determining the validity and reliability of an instrument to measure the construct of giftedness is defining the construct itself. There are some who believe that academic giftedness can be captured in the measure of general intellectual function, often referred to as “g,” that underlies all adaptive behavior (Sternberg, 1999; Jensen, 1998). Jensen (1998) explains: “the g factor reflects individual differences in information processing as manifested in functions such as attending, selecting, searching, internalizing, deciding, discriminating, generalizing, learning, remembering, and using incoming and past-acquired information to solve problems and cope with the exigencies of the environment” (p. 117).
The hypothesis of a unitary intelligence factor is generally supported with evidence that g underlies performance across a broad range of tests. However, this interpretation is challenged by those who argue that separate dimensions of intelligence are identifiable, and students who demonstrate exceptionality on one dimension often are unexceptional on others (e.g., Gagne, 1985; Gardner, 1983; Stanley, 1984). Sternberg (1997) considers that analytic, creative, and practical intelligence are three different, largely uncorrelated dimensions that are expressed as different abilities both inside and outside the classroom. Gardner (1999) argues for eight separate intelligences, although there are no predictive empirical data to support his argument. Benbow and Minor (1990) studied extremely precocious 13-year-old students and found that mathematical and verbal giftedness were entirely distinct. Moreover, within-individual discrepancies are typically much greater for high-ability than low-ability students (Detterman and Daniel, 1989).
As with disability determination, aligning assessment with the construct being measured provides just two legs of a table. The third leg required for functionality is alignment of the program or intervention. It does little good to broaden the definition of giftedness to include creativity, leadership, or musical ability if the program a school has to offer gifted students is advanced mathematics. Yet Callahan et al. (1995) found schools using intelligence tests to assess creativity and musical aptitude.
In their survey of school districts regarding identification of gifted and talented students, Callahan et al. (1995) found that despite contemporary understandings, most school divisions subscribed to the original federal definition found in the Marland report (U.S. Department of Health, Education, and Welfare, 1972). The construct of general intellectual ability in the federal definition of giftedness was the most frequently used construct guiding identification.
In many localities, the identification process begins with a call to teachers simply to nominate all students they believe to be gifted. In other school divisions, consideration for gifted services may begin by asking teachers to complete a checklist or rating scale on all students in the class or only on those judged to be gifted. In most cases, these nomination forms or checklists are based on a set of characteristics rather than on specific assessments of educational need. Widespread use of teacher judgment has been identified as a potential explanation for the disproportionate representation of minority students (other than Asians) as gifted and talented (e.g., Ford, 1995, 1996). Some scholars argue that teacher nominations are compromised by the generally low expectations that teachers hold for culturally and linguistically diverse learners (Clasen, 1994; Dusek and Joseph, 1983; Jones, 1988; McCarty et al., 1991) and by their inability to recognize characteristics of giftedness when exhibited in nontraditional behaviors of minority children (Bermudez and Rakow, 1990). As discussed in Chapter 6, empirical findings regarding teacher bias in natural settings are inconclusive.
It is not uncommon for schools to also collect nominations from parents, but parents of Hispanic and black students tend to refer their children at lower rates than white parents (Colangelo, 1985; Scott et al., 1992; Woods and Achey, 1990). The potential significance of teacher and parent involvement was suggested by the results of a program designed to increase minority participation in gifted programs that was launched some time ago in Greensboro, North Carolina (Woods and Achey, 1990). Although the program study was done in 1986-1989, its findings are noteworthy. Identification for the gifted program in grades 2 through 5 relied on a combination of standardized test scores and parent, teacher, peer, or self-nomination. Once nominated, a student was eligible for up to three rounds of aptitude and achievement testing: the first two were group evaluations, and the third was an individual evaluation. After testing, parents were notified of the test results. A qualifying score would admit the child to the program. If the child scored below the cutoff point, the parent or a school committee could request retesting. Students were required to reach a cutoff score that combined achievement test scores, aptitude test scores, and (with a much lower weight) scholastic performance. The highest scores from the three rounds were used to determine eligibility.
The standards for determining giftedness were not altered in the program that targeted minority students. Rather, when a minority student was identified as at or above the 85th percentile on the school-wide standardized tests, the three-step evaluation was begun; no nomination was required. Two professionals were assigned to the program to administer individual tests, monitor test scores, track data, and ensure follow-through for the targeted students. Parents were notified after each round, but testing proceeded through the full battery unless there was a specific parent request to discontinue. Without altering standards for entry, the number of minority students in the gifted program increased by 181 percent, from 99 to 278 students. Minority students’ share of the gifted program increased from 13.2 to 27.5 percent. Only 15 percent of the minority students ultimately identified were identified on the first round of testing.
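The enrollment figures reported for the Greensboro program can be checked with a short calculation. The sketch below uses only the four numbers given in the text; the implied total program sizes are an inference from those numbers (and from rounded percentages), not figures reported by Woods and Achey, and the variable names are chosen here for illustration.

```python
# Figures reported in the text for the Greensboro program.
before, after = 99, 278                  # minority students in the gifted program
share_before, share_after = 13.2, 27.5   # minority share of the program, in percent

# Percent increase relative to the starting count.
pct_increase = (after - before) / before * 100
print(f"Increase in minority enrollment: {pct_increase:.0f} percent")

# Implied total program enrollment: an inference from the reported shares,
# not a figure given in the source.
total_before = before / (share_before / 100)
total_after = after / (share_after / 100)
print(f"Implied program size: about {total_before:.0f}, then about {total_after:.0f}")
```

The first result matches the reported 181 percent. If the reported shares are taken at face value, total enrollment in the program also appears to have grown, which would suggest that the higher minority share did not come from displacing other students; that reading rests on the rounded percentages and should be treated as approximate.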
In some school systems, the referral pool comes from reviewing group-administered tests and selecting those who score above some predetermined score. Archambault et al. (1993) report that 79 percent of teachers in a national survey claimed that achievement tests are used to identify the gifted in their schools, 72 percent use IQ tests, and 70 percent use teacher nomination. Not surprisingly, intelligence tests were the most frequently cited tests for measuring the construct of general intellectual ability, with general reliance on group tests. Individual tests were most often used only for further data gathering in borderline cases.
School divisions with specific provisions for identifying gifted students from minority populations most often relied on traditional measures to accomplish this goal. Often the school districts listed individual intelligence tests as the vehicle for most effective identification of minority students. Screening by reviewing test scores is sometimes carried out in combination with teacher nominations, parent nominations, and/or peer nominations and sometimes as the sole source used in creating a screening pool.
The next step in the decision process resulting in classification as gifted is sometimes based on a single score derived from the tests or the teacher nominations used in the screening process. In other cases, identification may entail the collection of a specified range of data, including scores derived from group or individually administered ability or achievement tests, creativity tests, teacher ratings, portfolio reviews, or interview data. While 34 state policy statements recommended the use of multiple criteria for identification (Coleman and Gallagher, 1992), a survey of schools in 50 states found only very limited applications of this principle in practice (Patton et al., 1990).
The data that are collected in the assessment process may be reviewed by a team of educators, or students may be identified by entering their scores on a matrix that assigns arbitrary weights to particular scores or ratings; a prescribed number of the highest-scoring students, or all students meeting a preassigned total score, are then assigned to the gifted program. The use of such matrices has been criticized for presenting the illusion of being more culturally unbiased while, in truth, using a procedure that still gives greatest weight to the scores with most variability, namely test scores (Callahan and McIntire, 1994). In other identification procedures used in schools, students whose scores or other characteristics meet a set of prescribed criteria on the indicators are selected. Finally, a case study approach may be used, in which the data are used to describe educational needs and to assign program and curricular modifications.
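The criticism that a matrix gives greatest weight to the scores with most variability can be illustrated with a small sketch. Everything in it is hypothetical: the five students, the three indicators, and their values are invented for illustration and do not come from Callahan and McIntire (1994) or from any real matrix, which would typically convert scores to point bands rather than add raw values. The comparison with standardized scores is included only to make the effect visible; it is not a procedure the source proposes.

```python
from statistics import mean, pstdev

# Hypothetical data; none of these values come from the source.
students = {
    "A": {"test": 130, "teacher_rating": 2, "portfolio": 2},
    "B": {"test": 118, "teacher_rating": 5, "portfolio": 5},
    "C": {"test": 125, "teacher_rating": 3, "portfolio": 4},
    "D": {"test": 110, "teacher_rating": 5, "portfolio": 4},
    "E": {"test": 122, "teacher_rating": 4, "portfolio": 3},
}
indicators = ["test", "teacher_rating", "portfolio"]


def rank(scores):
    """Return student labels ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Nominally "equal" weights: add the three indicators as they stand.
raw_total = {s: sum(v[i] for i in indicators) for s, v in students.items()}
test_only = {s: v["test"] for s, v in students.items()}

# Spread of each indicator across the five students.
values = {i: [v[i] for v in students.values()] for i in indicators}
spread = {i: pstdev(values[i]) for i in indicators}
center = {i: mean(values[i]) for i in indicators}

# Same indicators, each rescaled to a common spread before adding.
z_total = {
    s: sum((v[i] - center[i]) / spread[i] for i in indicators)
    for s, v in students.items()
}

print("Spread by indicator:", {i: round(sd, 2) for i, sd in spread.items()})
print("Ranking by test alone:     ", rank(test_only))
print("Ranking by raw total:      ", rank(raw_total))
print("Ranking by rescaled total: ", rank(z_total))
```

With these invented values, the ranking by raw total is identical to the ranking by test score alone, because the test score varies far more than the two ratings. Once the indicators are put on a common scale, the ratings change the order.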
The underrepresentation of American Indian/Alaskan Native, black, and Hispanic students in gifted and talented programs was reviewed in Chapter 2, using data from the periodic surveys of school districts conducted by the Office for Civil Rights. Concern with disproportionality is reflected in the federal Jacob K. Javits Gifted and Talented Students Education Act of 1988, which gave highest priority to “the identification of gifted and talented students who may not be identified through traditional assessment methods (including economically disadvantaged individuals, individuals of limited English proficiency, and individuals with handicaps)” (p. 238). It is also reflected in court cases (e.g., Coalition to Save Our Children v. State Board of Education, 1995): racial discrimination has been the focus of suits brought against local school districts, often as a component of more general charges of discrimination within a school division (Karnes and Marquardt, 2000).
Coleman and Gallagher (1992) reported that 38 state policies make some reference to issues of identifying gifted students from “culturally diverse populations, economically disadvantaged students and disabled students” (p. 11). In some states, there are specific guidelines for selecting tests or carrying out the identification process to help schools identify greater numbers of minority students; in other states, specific instruments are recommended (e.g., the Raven’s Progressive Matrices Test, the Matrix Analogies Test, the Torrance Test of Creative Thinking). These tests emphasize reasoning ability or “fluid” intelligence and de-emphasize information acquisition or “crystallized knowledge,” which is likely to be more culturally specific.
The Raven’s Progressive Matrices, for example, assess nonverbal, abstract reasoning by having students select which pattern pieces fit best into an overall array or matrix. While the cultural neutrality of the abstract patterns is appealing, the test’s usefulness for gifted and talented identification has not been fully tested. One study found that scores on the test were not related to school performance (Mills et al., 1993), and it does less well at predicting academic achievement than most intelligence tests or specific ability measures (Baska, 1986; Raven, 1990). This does not suggest that the Raven’s is less able to identify exceptional ability; it may be that the students who score well are exceptional in respects not well tapped by school programs. The validity of these alternative methods and their effects on disproportionality are largely unknown, although nonminority and high-income students tend to perform better than their minority, low-income counterparts (Mills et al., 1993).
Noting that state policies have not resulted in uniform adoption of procedures effective in increasing identification of low-income or minority students, Coleman and Gallagher (1992) investigated the factors that inhibited the adoption and implementation at the local level of more flexible and permissive identification policies. They found two major constraints to implementation. The first was a fear that increased numbers of identified students would not be accompanied by an increase in financial resources, and the second was a fear of legal suits that would be filed by parents whose children might have higher test scores but were not selected for the programs (reverse discrimination suits).
While the literature is replete with suggestions for increasing the numbers of black, Hispanic, and American Indian/Alaskan Native students, it is much more limited in documenting the success of alternative strategies in recruiting and retaining such students in gifted and talented programs. However, several innovative efforts have been documented.
One model focused on interactive staff development using core attributes of intellectual giftedness, corresponding observable behaviors (as they might be manifest in low-income and minority populations), and group decision making using multiple assessment tools. It was successful in generating greatly increased numbers of teacher nominations and subsequent identification as gifted (Frasier et al., 1995). A complex system (described by the authors as labor-intensive and time-consuming) using classroom observation, multicultural curriculum-based enrichment activities, standardized assessments, portfolio assessments, teacher nominations for screening and a dynamic assessment tool, literature-based performance assessment, standardized tests, and child interviews demonstrated that academically gifted students could be found “even in the most beleaguered schools” (Borland and Wright, 1994:170). A comprehensive screening of kindergarten children in an urban environment increased the identified 1st grade students in that school division from 0.2 percent to 2 percent (Feiring et al., 1997).
All the approaches noted above focused on identification and traditional conceptions of giftedness. Other nontraditional strategies with promise for identification are based on alternative conceptions of intellectual ability and include studies of the effects of curricular adaptations on identified students. One strategy, based on adopting an alternative conception of giftedness derived from Howard Gardner’s model of intellectual functioning and employing a set of performance assessment tasks, provided evidence that minority or economically disadvantaged students selected using this model during kindergarten and provided with systematic curricular intervention were more likely to be selected for programs for the gifted in 3rd grade (Callahan et al., 1995). Students at the high school level identified using Sternberg’s triarchic conception of intelligence and students who were instructed using strategies that matched their patterns of identified areas of strength performed better than students who were mismatched across a broad range of assessments (Sternberg et al., 1996).
A theme that runs through this chapter and, indeed, through the entire report warrants repeating here: addressing disproportion is far more complex than changing the participation numbers by adopting assessment tools that will identify a different racial/ethnic mix of students. The goal must be to better serve the educational needs of all students. Success in that endeavor will depend first on the alignment of program interventions to the educational needs of students, and only then on crafting better assessment tools and procedures. While the tools must be valid, reliable, and culturally unbiased, they must also effectively identify those students who need and can profit from the interventions made available at the school. Certainly as the needs of atypical learners are better understood, the interventions we design may, and should, change. Assessment practices must then evolve to serve the purpose of linking student need to program intervention.
The research base that highlights the challenge of designing and administering assessments for students from very different cultures and socioeconomic backgrounds suggests, however, that persistent attention to the ability of the assessment tool to reliably identify educational need is warranted. In the next chapter we look at the major challenges to current assessment practices in this regard, and at alternatives that in the committee’s view would better serve the end of linking educational need to special and gifted program interventions.