With this chapter, we turn to a detailed consideration of one of the key concepts in this report: scale. The chapter has two goals. The first is to put forth a multidimensional framework for conceptualizing scale, adapted from Coburn (2003), that informs the chapters that follow. This framework articulates four dimensions of scale: spread, depth, sustainability, and shift in ownership. As discussed in Chapter 1 and elaborated more extensively below, most assessments of scale focus primarily on the spread of the innovation, yet considerations of the depth of implementation of the innovation, its sustainability, and shifts in ownership are critical to making sense of why an innovation may (or may not) successfully scale (Coburn, 2003).
The second goal of this chapter is to discuss, in broad strokes, the affordances and challenges of approaches to scaling innovations for which there is empirical evidence of impact, with respect to the four dimensions of scale. The committee discusses (a) pathways to scaling an existing innovation in new contexts in which implementation is tightly prescribed (i.e., a focus on fidelity of implementation), and (b) pathways where principled adaptation of the innovation is expected and desired as it is implemented in new contexts (i.e., a focus on integrity of implementation). The committee then discusses (c) approaches in which researchers and educators (and sometimes youth, family, and/or community members) collaboratively design innovations and their implementation, with attention to scale and sustainability. The committee offers a discussion of the affordances and limitations of each approach.
Some notes on terminology support this discussion. As described in Chapter 1, each innovation involves interested and affected parties, among which the committee distinguishes: designers (those who design the innovation); enactors (those who directly enact the innovation); enablers (those who support the enactors, through funding, providing leadership and/or professional development, etc.); and beneficiaries (those whose interests the innovation is meant to directly serve). As discussed further below, some persons play more than one role in the initial design, enactment, and/or scaling of an innovation. The committee refers to the potential of an innovation to be “scaled” as its “scalability.” In what follows, “scalability” will be distinguished from whether “scale” (or particular dimensions of scale) has been “achieved” in relation to a given innovation. Of course, what counts as “achieving” scale is always relative to the intended goals of an innovation, the intended beneficiaries, and the context within which it is enacted, as well as a given time period.
In addition, although the committee decided that it is important to conceptualize sustainability as a dimension of scale, throughout the report, the language of “scale and sustainability” is used. This choice of language is deliberate. One reason is that the language of “sustained implementation” is in the committee’s task statement. The second reason is that, in the everyday language of policy and practice, assessments of whether an innovation has “scaled” are often made absent attention to whether the innovation is sustained. Bringing attention to “scale and sustainability” is intended to underscore the importance of considering not just the spread, or reach, of an innovation, but also its lifespan, especially given trends of educator turnover in educational contexts.
The chapter begins with a discussion of the four dimensions of scale, as adapted from Coburn, that inform our analysis in the chapters that follow. We then turn to three approaches to scaling up—fidelity of implementation, principled adaptation, and collaborative design and implementation—and examine the affordances and challenges of each. Throughout this review of implementation approaches, we also discuss a common assessment sequence and how it unfolds when applied to each approach.
The term “scale” means many things to many people. People who design and implement innovations, including educators and researchers, as well as policymakers and members of the science, technology, engineering, and mathematics (STEM) professions, conceptualize scale in diverse ways, and use various criteria to assess whether, and in what ways, an innovation has “scaled” (e.g., Morel et al., 2019). Most discussions of scale focus solely on increasing the number of enactors, beneficiaries, and/or contexts
in which an innovation is enacted (Coburn, 2003). As Coburn (2003) cautioned, however, focusing only on increasing numbers “says nothing about the nature of the change envisioned or the degree to which schools and teachers have the knowledge and authority to continue to grow the [innovation] over time” (p. 4). Moreover, focusing on numbers alone in an assessment of scale often precludes researchers and educators from attending to critical factors that may explain whether an innovation scales up successfully, and why or why not.
Adapting Coburn’s (2003) framework for conceptualizing scale along four “interrelated dimensions” (p. 3) has allowed the committee to bring a fuller sense of scale to the issues above. In addition to the spread of the innovation, the committee views the three other dimensions articulated by Coburn—depth of change to enactors’ current beliefs, knowledge, and practice, the sustainability of the innovation, and ownership of the innovation—as critical dimensions to consider in efforts to “scale” an innovation. Not all efforts to scale an innovation will explicitly attend to each of these dimensions, nor will they give equal weight or priority to the dimensions on which they focus. As detailed in Chapter 8, conceptualizing scale along these dimensions has implications for the identification of critical conditions that enable or act as barriers to the scalability of an innovation and how to assess or measure the extent to which an innovation has “scaled.”
In conceptualizing these dimensions, Coburn (2003) focused on the scaling of classroom instructional reforms.1 As such, implementation and scaling of an innovation was primarily conceived of in terms of classrooms, as embedded in schools, as embedded in school districts. The focus of this committee is not limited to classroom instructional innovations; we consider Pre-K–12 classroom, school, and district innovations, as well as innovations focused on afterschool and community settings, and innovations intended to scale at the state level and across regions. Spread, depth, sustainability, and ownership are important to consider, in designing for and assessing scale in these alternative contexts as well.
Typically, in the literature, “spread” refers to the expansion or diffusion of an innovation to more contexts and/or larger numbers of participants. As noted above, spread is the most common way of conceptualizing scale. Traditionally, scale was identified with spread, expressed as a number—the number of sites (schools, districts) to which the innovation has expanded (geographic spread) or the number of people (teachers, students) affected (demographic spread; Coburn, 2003, p. 7). Demographic spread might also reflect increasing diversity of the affected population, even in the absence of geographic spread.
___________________
1 The committee recognizes that not all instructional reforms are innovations; however, a number are designed in ways that would meet the definition of “innovation” provided in Chapter 1. The committee strives to delineate these concepts as much as possible.
In setting goals for the spread of an innovation, and when assessing spread, it is critical to consider the specific populations and communities for which an innovation is designed and appropriate. For instance, Advancing Indigenous People in STEM (AISES) is a nonprofit Indigenous organization “founded in 1977 by American Indian scientists, engineers, and educators” that aims to substantially increase “the representation of Indigenous peoples in STEM studies and careers.”2 It provides, among other supports, culturally relevant curriculum programs for Pre-K–12 Indigenous students in computer science and coding, and mentorship and networking opportunities for Indigenous college and university students. AISES organizes its activities via regionally based chapters across the United States and North America, and is guided by eight “Advisory Councils,” which include Tribal Nations Advisory Councils, a Council of Elders, as well as Councils of Indigenous Peoples in Industry. In relation to spread, then, AISES focuses on reaching a particular population, whereas other STEM-focused organizations might focus on different or broader populations. As such, goals for spread, and how progress toward them is assessed, will necessarily vary.
Of course, an exclusive focus on numbers of sites and/or people says nothing about the nature of the innovation (what is spread), whether the “what” is implemented in a superficial or deep manner, or whether the innovation is sustained, especially as enablers and enactors come and go. These critical issues are the focus of dimensions of scale discussed below: depth, sustainability, and ownership.
A second, critical dimension of scale concerns the nature of the change to enactors’ current practice that is intended or implied by the innovation, and the depth of the change that is implemented. The implementation and scaling of some innovations aim at integrating minor changes into educators’ and learners’ existing practice, whereas others entail substantial change to educators’ and learners’ usual practice. As Coburn (2003) argues, being clear about the nature and quality of the intended change matters for both supporting and assessing the scaling of the innovation.
Innovations requiring minor changes to educators’ current practices might include, for example, the provision of a new platform students can use to practice math skills. Implementing the platform may require some
technology “know-how” but it likely does not entail substantial change to teachers’ current beliefs about students, or assumptions about teaching and learning. For example, a classroom teacher may decide to incorporate an online tool for math homework assignments and formative assessments that is compatible with curriculum already in use and can be easily integrated with the school’s existing learning management system (e.g., ASSISTments3).
In contrast, implementing innovations that “[go] beyond surface structures or procedures” typically necessitates “deep change” to educators’ assumptions about teaching, learning, and subject matter, norms of interaction between teachers and students, and pedagogical principles (Coburn, 2003, p. 4; see also Brown, 1992). For example, consider the implementation of a new curriculum designed to engage students in using science and engineering concepts to make sense of scientific phenomena. In many contexts, for many science teachers, learning to engage students in phenomena-focused learning requires substantial shifts in how they currently teach, and it requires new content knowledge as well as pedagogical knowledge regarding the integration of scientific, mathematical, and engineering concepts (Reiser et al., 2017). It also likely requires new forms of collaboration among science teachers and teacher educators (Reiser et al., 2017).
The committee considers depth in terms of the extent to which the innovation is intended to create, or entails, substantial shifts in what Elmore (1996) terms the “core of educational practice” (p. 2). When assessing depth, it is important to distinguish the depth of intended change from the depth of achieved change. For example, in cases where the intended change requires substantial shifts in educators’ current assumptions, knowledge, and practices, how “deep” is the implementation? In enacting the innovation, have educators shifted longstanding beliefs about their learners? Have they deepened their knowledge of and practice in eliciting and responding to students’ current ideas and reasoning, in order to advance the learning of each student? Or have educators enacted features of the innovation in superficial ways, where, for example, materials or “sentence stems” may be visible in instruction, but the focus of discourse remains on “the right format” rather than the substance of what students are reasoning about?
Coburn (2003) anticipated that “some may argue that [. . .] components of depth are more appropriate for principle-based reforms” than those that are more narrowly focused on discrete aspects of teaching or learning (p. 5). Coburn cautioned, however, that “most [innovations], even those that are not explicitly principle-based, ‘carry’ sets of ideas about what constitutes appropriate [teaching and learning],” for example, about the “nature of subject matter, valued student outcomes, how students
learn” and so forth (p. 5). Therefore, even in cases in which the intended shift in enactors’ current practices appears minor, it is important to attend to the depth with which the innovation is enacted when assessing scale. Attention to depth, no matter the nature of the intended change, highlights how critical it is to consider not just whether new materials, a new pedagogy, or new tools are present in a setting, but how they are being used as the innovation is implemented and scaled.
Attention to depth also has important implications for the study of scaling STEM innovations. Although studies of scale often include some assessment regarding the implementation of innovations, oftentimes those assessments focus on surface-level indications of adoption (e.g., number of users, presence of materials, time spent using the materials; Coburn, 2003). Conceptualizing depth as critical to understanding and assessing scale implies a need to both develop and use measures that attend to educators’ assumptions about STEM teaching and learning as well as students’ capabilities, content-specific knowledge, and developing forms of practice. As Coburn (2003) argues, “The increased emphasis on depth as a key element of scale calls into question the degree to which [. . .] implementation can be assessed using survey methods alone” (p. 5).
A third, interrelated dimension of scale concerns the sustainability of innovations, that is, whether the innovation endures over time in the original and new contexts when the initial circumstances, such as funding, run their course (Coburn et al., 2012; Datnow, Hubbard, & Mehan, 2002; Scheirer, 2005). Coburn (2003) underscored, “Most discussions address issues of sustainability and scale separately, obscuring the way that scale, in fact, depends upon sustainability” (p. 6). Sustaining an innovation requires attention to spread and depth of change in the practice of both those who have been there for a long period of time and those who are new, while being responsive to changes in the contexts in which the innovation is enacted (McLaughlin & Mitra, 2001).
Gersten and colleagues (2000) bring attention to how considerations regarding the sustainability of innovations will vary, depending, in part, on the nature of change implied by the innovation. For example, for innovations that target substantial changes in the instructional core, teachers’ willingness to consider new pedagogical approaches is a critical consideration for sustainability. In contrast, innovations that focus on structural changes to schooling (e.g., changes in how often courses meet) may require specific considerations related to the school culture, as well as district and state policies.
Although sustainability of innovations is fundamental to scale, few studies explicitly address it in their conceptualizations or research designs to examine scale (Coburn et al., 2012; Gersten, Chard, & Baker, 2000; Hargreaves & Goodson, 2006). Most studies related to scale-up efforts of innovations focus on the first few years of implementation and fail to attend to their sustainability after the implementation period, when start-up resources and supports have officially ended (Coburn, 2003). Part of the challenge is that sustainability is about what occurs after the work on the design and initial implementation of an innovation has been completed. Often, research funding for that work has ended, and researchers move on to a different set of problems (Fishman et al., 2011). Few studies have attempted to interrogate the sustainability of a program after initial scale-up efforts (though see Zoblotsky et al., 2017, for a three-year follow-up study of the LASER program described in Chapter 3).
Whereas many studies fail to attend to the sustainability of innovations, there is growing attention among funders and enactors of various STEM education innovations to what happens after the initial funding for the innovations ends. For example, Fishman and colleagues (2011) followed up with teachers after their participation in a randomized trial in which they were introduced to a technology-infused mathematics curriculum. Via a teacher survey, Fishman et al. explored factors related to the continued use of the curriculum in ways that were congruent with the developers’ intent. Their findings revealed disparities in terms of the settings in which the use of the curriculum was sustained. Namely, the higher the socioeconomic status of student populations, and the greater the students’ performance with respect to conceptually rich mathematics prior to using these materials, the more likely teachers were to continue using the materials. Teachers’ perceptions of coherence, or “how well the professional development matched the teacher’s goals for professional development, the existing reform ideas within the school, and whether the professional development was followed up with activities that built upon what was already learned,” were also found to be related to sustained use of materials (p. 341). These findings suggest that sustaining innovations requires ongoing attention to enactors’ assumptions and current practices (i.e., depth of change), issues of equity in implementation, and how an innovation can be coherently integrated into the broader system within which it is intended to work.
There is ample evidence showing that sustaining innovations is challenging due to various factors such as competing priorities in schools and districts, limited resources, and teachers’ social relations and turnover (e.g., Bryk et al., 2010; Gersten, Chard, & Baker, 2000; Klingner et al., 1999). As stated by Datnow and colleagues (2002), “Forces at the state, in districts, design teams, the school and classrooms all interact to shape the longevity of reform” (p. 135). Based on five years of research in schools that engaged in one of three theory-based innovations, McLaughlin and Mitra (2001) identified five essential factors that shape sustainability. These include (a) access to sufficient resources, (b) knowledge of the first principles of the innovation, (c) a supportive community of practice, (d) a knowledgeable and supportive principal, and (e) a compatible district context. All in all, these studies demonstrate how the sustainability of innovations may be a central challenge in bringing innovations to scale. We return to these points in Chapter 8.
A fourth dimension of scale concerns the broadening and deepening of ownership of the innovation. As an innovation is implemented and scaled in a particular context, there is typically a set of enablers who are tasked with guiding the implementation effort. Oftentimes, the enablers of the innovation are distinct from the original designers. For example, district leaders may be tasked with implementing a new STEM curriculum that was designed and tested in distant districts. In some cases, the initial designers or “authors” of the innovation may be centrally involved in implementation; this is especially true in cases where the innovation was designed by members of the community itself or in cases in which local educators and researchers co-designed the innovation in the local context. In either case, a critical component of taking an innovation to scale entails creating conditions to deepen, expand, and/or shift knowledge of and authority for the innovation. Broad, local ownership among enactors supports deep implementation and sustainability of an innovation. Enactors can anticipate challenges to implementation and inform contextually specific decision making regarding what would support deep implementation. Broad ownership among enactors can also result in a professional community that, in turn, invites and supports new “users” (McLaughlin & Mitra, 2001; Rogers et al., 2009).
Coburn (2003) cautioned that discussions of ownership often focus on “buy-in” rather than “a shift in knowledge of and authority for the reform” (p. 8), especially in relation to the initial adoption of an innovation such as instructional materials or a technology. Instead of focusing on “teachers’ buy-in,” it is more productive, long term, to foster teachers’ ownership of the instructional sequences to enable them to take increasing responsibility for improving the sequences, based on enactment in their classrooms (Cobb & Jackson, 2015). It is rare to find explicit discussion of shift in ownership in research on the implementation and scaling of innovations. Yet, there are studies that provide insights into what this transition in ownership might entail. For example, McLaughlin and Mitra (2001) analyzed the sustainability and spread (and challenges therein) of three classroom innovation
efforts that were initially designed by university researchers; each of the innovations targeted deep change in classroom practice.4 Given that the innovations were initiated “from the outside,” questions of ownership were critical, particularly once the funding and university-based support ended. In one of the projects, attention to the transfer of ownership was an explicit focus of the last year of the funded project. University researchers engaged in multiple intentional conversations with school and district leadership to discuss and plan how leaders and teachers could continue to deepen an understanding of the principles of the reform absent the presence of the university team; their goal was that “teachers [could] develop the confidence to make their own decisions related to project activities” (p. 320).
As another example, consider the design and scaling of the National Science Foundation (NSF)-funded River City “multi-user virtual environment curriculum designed to enhance engagement and learning in middle school science” (Clarke & Dede, 2009, p. 354).5 Drawing on design-based research methods, Clarke and Dede (2009) detailed how the research team intentionally designed for spread, depth of implementation, sustainability, and shift in ownership. Specific to shift in ownership, Clarke and Dede underscored the importance of strong relationships with teachers, whom they involved as “co-evaluators” and “co-designers” throughout their multi-year design, implementation, and scaling efforts (p. 362). For example, teachers who had implemented River City in a prior phase supported the onboarding of new teachers in new sites. In addition, the research team continued to elicit feedback from implementers, new and old, to inform changes and elaborations to the curriculum.
As part of designing for sustainability and shift in ownership, Dede (2006) suggests the evolution of the innovation as a key dimension of scale. Evolution refers to
when the adopters of an innovation revise it and adapt it in such a way that it is influential in reshaping the thinking of its designers. This in turn creates a community of practice between adopters and designers whereby the innovation evolves. [. . .] Evolution is more than providing teachers with ownership; it is incorporating their ownership into the evolution of the curriculum. Evolution is really a product of depth, spread, and shift. (Clarke & Dede, 2009, p. 354)
As an example of evolution, Clarke and Dede detail how their team’s thinking about the design and facilitation of professional development evolved, as ownership of the innovation shifted from a “train the trainer model” to monthly webinars focused on “just in time training.” Writing primarily from the perspective of researchers designing innovations for scalability, Clarke and Dede (2009; see also Dede, 2006) proposed the addition of “evolution” of the innovation as a fifth dimension to Coburn’s (2003) four dimensions of scale (spread, depth, sustainability, ownership). Given the broad scope of innovations, and the varied origins of their development, within the committee’s charge, the committee elected to treat attention to the evolution of the innovation as an important aspect of fostering sustainability and shift in ownership.
___________________
4 The Jasper project focused on mathematics; Fostering Communities of Learners addressed literacy, discourse, and metacognitive reasoning as applied to many different topics; and CSILE could involve various content areas. The degree to which these innovations focused on STEM therefore varied.
These four dimensions of scale—spread, depth, sustainability, and ownership—are not independent (Coburn, 2003). Indeed, they are intertwined in complex ways. If the innovation involves superficial or minor changes to enactors’ current practice, its adoption and assimilation by practitioners may be relatively easy to achieve. That said, superficial change is inherently fragile. If, on the other hand, the innovation entails substantial change to enactors’ current practices, then deep implementation and the expansion of ownership among enactors pose a formidable challenge that demands time and substantial professional resources. It is a change of culture. But, once achieved, it provides a foundation for sustainability, at least in the particular context for which the innovation was designed.
In Elmore’s (1996) view, depth is the enemy of spread, at least of geographic spread to new contexts. Elmore writes: “The closer an innovation gets to the core of schooling, the less likely it is that it will influence teaching and learning on a large scale” (p. 4). As elaborated in Chapter 7, this is because sustaining innovations that entail substantial changes to “business as usual” entails concerted and coordinated changes in the broader educational system (e.g., leadership, accountability relations). As Coburn (2003) stated, “[T]he more ambitious [an innovation], the more challenging it may be to simultaneously achieve spread, sustainability, and depth” (p. 9).
Just as there are multiple meanings of “scale,” there are various ways by which enactors and researchers go about “scaling” or “scaling up” an innovation (e.g., Elmore, 1996; Morel et al., 2019). “Scaling an innovation” typically refers to how various actors plan for, work to achieve, and assess the implementation of innovations in broader contexts and/or with broader populations than those for which the innovation was initially designed and enacted (Dede, 2006). In some cases, the innovation’s designers are researchers who
may or may not be members of the communities in which the innovation is being implemented and scaled. Further, researchers who study the scaling of the innovation may or may not be the initial designers of the innovation. More recently, there has been growing attention to the value of educators (and sometimes youth, families, and community members) and researchers co-designing innovations in local contexts, with intentional consideration of implementation, scale, and sustainability as part of the innovation’s initial design (e.g., Bryk et al., 2015; Penuel et al., 2011; Peurach et al., 2022; Roschelle, Mazziotti, & Means, 2021).
In what follows, the committee presents an overview, in broad strokes, of three approaches to scaling innovations: (a) fidelity of implementation, in which implementation of the innovation is tightly prescribed; (b) principled adaptation, in which adaptation of the innovation is expected and desired (i.e., a focus on integrity of implementation); and (c) collaborative design, in which researchers and educators collaboratively design innovations and their implementation, with deliberate attention to scale and sustainability. (Co-design approaches tend to reflect adaptation approaches to scaling.) Throughout, the committee discusses the affordances and challenges of the different approaches, with respect to concerns for spread, depth, sustainability, and ownership.
Two points are useful to keep in mind. First, approaches to scaling an innovation “may shift over time within the lifecycle of an innovation” (Morel et al., 2019, p. 370). For example, as the innovation is spread to new populations and new contexts, the various interested and affected parties shift, which often necessitates changes in who is involved in the scaling process and how it is approached. Second, this discussion of approaches to scaling innovations is not exhaustive. For example, Morel and colleagues (2019) describe reinvention as another approach to scaling, in which researchers and designers “expect that innovations undergo radical transformation” as they are used in various contexts, for example, open-source digital media platforms (p. 372). The committee did not report on reinvention, given the difficulty of finding empirical evidence of the impact of STEM innovations reflecting such an approach to scale.
One approach to scaling an innovation reflects the assumption that, to claim an innovation has successfully scaled, enactors must implement the innovation in tightly prescribed ways, in contexts other than those in which it was initially designed and shown to be effective, in service of a specific set of learning goals for the beneficiaries. Morel and colleagues (2019) described this approach to scale as replication: “an innovation is considered at scale if it
is widespread, implemented with fidelity, and produces expected outcomes” (p. 371).
The concept of fidelity of implementation originated in public health and has been used in educational evaluation and intervention research for four decades or so (Gage, MacSuga-Gage, & Detrich, 2020; O’Donnell, 2008). Gage and colleagues (2020) described “fidelity of implementation” through an analogy from the medical field:
A patient is diagnosed with strep throat and his or her doctor prescribes an antibiotic. The instructions inform the patient to take one pill twice daily for 10 days. The patient takes both doses of antibiotic on the first day, but only one on the second and third days. Remembering again, the patient takes two doses on the fourth and fifth days, but then stops taking the antibiotic. By the end of 10 days, the patient returns to the doctor, complaining that the medicine did not work. Upon review, the doctor discovers that the patient only took 8 of the 20 prescribed antibiotic pills, or, put differently, the patient implemented the intervention with 40% fidelity. The reason for the patient’s lack of improvement was not that the intervention failed but that it was not implemented as prescribed. (p. 1)
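To make the analogy’s arithmetic concrete, the sketch below computes dosage fidelity as the proportion of prescribed doses actually taken. This is a minimal, hypothetical illustration; the variable names and day-by-day list are our assumptions, not a measure from Gage and colleagues (2020):

```python
# Hypothetical sketch: dosage fidelity as the share of prescribed doses taken.
prescribed_doses = 20  # 1 pill, twice daily, for 10 days

# The patient's actual pattern, day by day, per the analogy above.
doses_taken_by_day = [2, 1, 1, 2, 2, 0, 0, 0, 0, 0]

doses_taken = sum(doses_taken_by_day)      # 8 of the 20 prescribed pills
fidelity = doses_taken / prescribed_doses  # 0.40

print(f"Dosage fidelity: {fidelity:.0%}")  # -> Dosage fidelity: 40%
```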
“Fidelity” has been defined in different ways, and it is multidimensional (Gage, MacSuga-Gage, & Detrich, 2020; van Dijk, Lane, & Gage, 2023). Drawing on earlier work by Dane and Schneider (1998), van Dijk and colleagues (2023) defined key features of fidelity of implementation, which include (a) adherence (i.e., the degree to which all elements of an innovation were implemented); (b) dosage (i.e., time and/or frequency of use); (c) quality (i.e., how well the aspects of the innovation were delivered); (d) differentiation (i.e., how the innovation is distinct from another condition); and (e) responsiveness (i.e., how students respond to the intervention). Attention to each of these features is not essential for an intervention to be considered implemented with fidelity. Rather, decisions about which features to attend to in conceptualizing and assessing fidelity of implementation need to be made in relation to an explicit program theory of change and improvement (O’Donnell, 2008).
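As a rough illustration of this multidimensional view, the sketch below represents the five features as a record in which unassessed features are left empty, so that a program’s theory of change determines which features are actually measured. The record type, 0–1 scales, and example values are hypothetical, not an instrument from van Dijk and colleagues (2023):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FidelityProfile:
    """Fidelity features named by van Dijk et al. (2023); None = not assessed."""
    adherence: Optional[float] = None        # degree to which all elements were implemented
    dosage: Optional[float] = None           # time and/or frequency of use
    quality: Optional[float] = None          # how well aspects of the innovation were delivered
    differentiation: Optional[float] = None  # distinctness from another condition
    responsiveness: Optional[float] = None   # how students respond to the intervention

    def assessed(self) -> dict:
        # Report only the features this program's theory of change attends to.
        return {name: value for name, value in vars(self).items() if value is not None}

# A hypothetical program whose theory of change emphasizes adherence and dosage only:
profile = FidelityProfile(adherence=0.85, dosage=0.60)
print(profile.assessed())  # {'adherence': 0.85, 'dosage': 0.6}
```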
By and large, federal agencies have encouraged studies of scaling that aim for “tight” implementation of an innovation in increasingly varied contexts, sequenced as follows (Roschelle, Mazziotti, & Means, 2021). Pilot studies focus on the “feasibility of an intervention that has [. . .] been implemented” and evaluated “under highly controlled conditions,” and aim to answer the question, “Is the theory undergirding the intervention effective?” (Thomas et al., 2018, p. 319). Efficacy studies “[assess] the extent to which a successfully piloted intervention produces the desired outcomes under less controlled conditions,” but where the designer is still “actively involved in implementing and evaluating the program” (Thomas et al., 2018, p. 319; see also O’Donnell, 2008). As described by Thomas and colleagues (2018), the guiding question is
“Can the intervention work outside of the laboratory or a few carefully selected sites under conditions that enable high-quality implementation?” (p. 319). Once an innovation has been shown to result in the desired outcomes, a next stage is to study the effectiveness of the innovation in a wider range of settings, where the designer is less involved in the implementation. The guiding question of effectiveness studies is: “Does the program work under conditions that approximate those under which the intervention would be delivered on a broader scale?” (Thomas et al., 2018, pp. 319–320). Assuming the implementation of the innovation continues to result in the desired outcomes, scale-up studies focus on the implementation of the innovation in new contexts, typically with increasing numbers of and heterogeneity of participants and/or sites (e.g., at a state level; Roschelle, Mazziotti, & Means, 2021; Thomas et al., 2018). See Box 4-1 for an example of an innovation that was tested and evaluated via a sequence of pilot, efficacy, effectiveness, and scale-up studies.
Conceptualizing and assessing fidelity of implementation is emphasized in each of these types of studies. When paired with an explicit theory of change, attention to fidelity of implementation can help researchers identify faults in the innovation as designed (i.e., in the theory of change and its assumptions), and how variation in implementation impacts outcomes, for better and worse. Especially for innovations that target deep change to the instructional core, however, the relationship between “fidelity” and outcomes is often not clear in efficacy, effectiveness, and scale-up studies (O’Donnell, 2008; Penuel & Fishman, 2012). Teaching and learning take place within complex systems (e.g., classroom, school, district), and it is often difficult to “distinguish between the effects caused by the materials and the effects caused by the teachers’ interactions with the materials” (O’Donnell, 2008, p. 44).
Moreover, it is easier to assess fidelity of implementation in relation to highly prescribed innovations, in that the boundaries of what counts as “low” and “high” implementation, and its impact on student outcomes, can, in theory, be distinguished. Yet, highly prescribed innovations are often much harder to “fit” into existing classroom, school, and district environments.
Highly scripted, packaged programs provide a means to control implementation—which is ideal for teasing apart causality—but these can lead to an entire intervention being discarded when it does not fit well into the school environment. This creates an inherent tension between implementation and usefulness. The interventions most implementable with fidelity are heavily scripted and require specific supports, yet these requirements may not be feasible or desirable in many school environments (Coburn, 2003). (NASEM, 2022, pp. 67–68)
In addition, as highly prescribed innovations that have originated from “outside” the system are implemented and spread, attention to local resources and expertise in a system is often minimal—which makes ownership and sustainability difficult. Further, the expectation of fidelity of implementation, especially in the earlier stages with limited contexts and participants, means that “implementation is highly monitored so that high fidelity is achieved” (NASEM, 2022, p. 78). As a result, challenges of implementation that are important for deep implementation of the innovation in additional contexts, as well as sustainability, “are not discovered until later studies with larger samples (Farley-Ripple et al., 2018; Finnigan & Daly, 2016)” (NASEM, 2022, p. 79).

BOX 4-1
The Pre-K Mathematics supplementary mathematics curriculum for Pre-K children, “especially those from families experiencing economic hardship,” is an example of an innovation that was tested and evaluated via a sequence of pilot, efficacy, effectiveness, and scale-up studies (Thomas et al., 2018, p. 328). Across the sequence of studies, the curriculum was implemented in increasingly diverse contexts, with positive effects on mathematics achievement (What Works Clearinghouse, 2023). The scale-up study took place across the state of California, “in public pre-K and Head Start programs in urban, suburban, and rural areas with large proportions of low-income families from diverse racial/ethnic backgrounds” (p. 330). Although the curriculum remained fairly stable across the studies, as reported in Thomas et al. (2018), the curriculum designers intentionally sought to learn what they might improve about the curriculum from each of the studies. For example, the designers learned that some activities took too long for the allocated time period, so “they altered the curriculum schedule” accordingly (p. 330). The designers predicted “that the quality of program delivery is likely to suffer at larger scales, due to less developer control over the implementation of the intervention” (Thomas et al., 2018, p. 331). Across the sequence of studies, in some sites, the curriculum’s project staff “trained teachers and coached them,” whereas in other sites they used a “train-the-trainer” model (p. 331). In the Statewide Scale-Up Study, fidelity of implementation was assessed as follows: a “local trainer” observed the teacher leading a small-group math activity and provided feedback about “any departures from fidelity” (p. 337). In addition, the trainer used a classroom observation tool to “record the type of mathematical content, number of children present, and the duration of the activity” (p. 337).
Overall, the Statewide Scale-Up Study resulted in positive impacts on children’s mathematics achievement that were consistent with those found in earlier studies; and the researchers found that “these effects did not demonstrably differ by the racial/ethnic background or pretest performance of children, or by the urbanicity of the settings” (p. 349). The researchers, however, also found that “scale-up is associated with smaller effect sizes—as studies got larger, more heterogeneous, less controlled, and used less well-aligned outcome measures, effect sizes tended to decrease” (p. 349). As a result, it is valuable to attend to implementation as part of the initial design work.
As Penuel and Fishman (2012) argue, “an overemphasis on fidelity means giving less consideration to the ways that curriculum developers and professional development leaders could focus their efforts on helping teachers make productive adaptations of materials by being responsive to students” (p. 284).
As indicated above, until recently, federal agencies’ grant funding structures for the design, study, and scaling of STEM teaching and learning innovations have by and large reflected a phased logic, from pilot to efficacy to effectiveness to scale-up, paired with a focus on “tight” fidelity of implementation throughout the phases (cf. Penuel & Fishman, 2012). The National Academies’ (2022) report, The Future of Education Research at IES: Advancing an Equity-Oriented Science, however, argues for the need to revise this logic. As discussed in subsequent chapters, the report calls for researchers to focus on “adaptation” of the intervention in the “heterogeneous environments that an intervention may be implemented within” (NASEM, 2022, p. 79). Careful study of adaptation in heterogeneous environments from the start will allow researchers to “determin[e] barriers and facilitators (Tabak et al., 2012) and effective implementation strategies” that can support the scaling and sustainability of the intervention in additional contexts (NASEM, 2022, p. 79). Accordingly, the report calls for the Institute of Education Sciences (IES), in particular, to shift from “Design and Development” studies to “Development and Adaptation” studies, and from “Efficacy” (i.e., replication) and “Effectiveness” studies to “Impact and Heterogeneity” studies. These suggested shifts reflect the value of treating the implementation of innovations, in relation to student learning outcomes, as worthy of study in its own right:
[. . . T]he success of interventions is driven in large part by their implementation. It is also clear that understanding implementation needs to go beyond simply determining if a given intervention is implemented with fidelity. Rather, there is increasing recognition that the process of implementation itself is worthy of study if education research is to provide sufficient guidance on how to improve student outcomes. (NASEM, 2022, p. 27, emphasis added)
As indicated in the National Academies’ (2022) call for a focus on adaptation of interventions, a second approach to scaling an innovation not only tolerates but also expects that there will be principled adaptations to the innovation as it is scaled, that is, as it is used and modified by new enactors, in new contexts. Scholars have described this approach to implementation and scale as “mutual adaptation,” whereby enactors modify their current practices in order to implement the innovation, while also making adaptations to the innovation to make it “useable in their context”
(Russell et al., 2020, p. 156). Typically, designers articulate the desired learning goals associated with the implementation of the innovation as well as some core principles, both of which set expectations for the enactment of the innovation. But enactors are expected, and even encouraged, to adapt the innovation in response to their local contexts and needs (Means & Penuel, 2005; Penuel et al., 2011; Roschelle, Mazziotti, & Means, 2021). From this perspective, “[l]ocal actors know their context and can use this knowledge to effectively adapt innovations. Local conditions cannot be ‘designed away,’ but are key to successful outcomes” (Morel et al., 2019, p. 371).
From a mutual adaptation perspective, researchers tend to focus on what LeMahieu (2011) termed “integrity of implementation,” rather than fidelity of implementation.6 LeMahieu described the difference as follows: fidelity of implementation suggests that enactors should “do exactly what [the designers] say to do,” whereas integrity of implementation suggests that enactors should “do what matters most and works best while accommodating local needs and circumstances.”7 From an integrity of implementation perspective, as an innovation is scaled, data are generated and analyzed to examine the ways in which core principles associated with the innovation are maintained or perhaps evolved, whether the intended learning goals are met, and why or why not. Determinations need to be made regarding which adaptations preserve the integrity of the innovation, which do not, and why.
Within adaptation approaches, there are differences in who is involved in the design process, and how much they are involved (e.g., how much enactors are part of the design itself). In some cases, like the Tennessee Math Coaching Project discussed in Box 4-2, the initial designers were primarily researchers and professional development providers; they generated the coaching model, and iterated upon it based on prior research on the impact of earlier iterations of the model on the quality of coaching and teaching in a limited set of districts (Stein et al., 2022), prior to scaling it across numerous districts in Tennessee (Russell et al., 2020). In other cases, as elaborated in the co-design section below, educators (the enactors), and beneficiaries (e.g., students, families) are more integrally involved in the design of the innovation from the start (Morel et al., 2019).
Innovations that invite principled, mutual adaptation, such as the example discussed in Box 4-2, can support deep and sustained implementation, given that enactors and enablers adapt the innovation in response to the needs, resources, and challenges of the specific context in which it is being implemented. Moreover, providing agency to local enactors and enablers to make principled adaptations to the innovation can enable a shift in ownership of the innovation.
___________________
6 LeMahieu (2011) distinguishes “fidelity of implementation” from “integrity of implementation.” As indicated in O’Donnell (2008), however, researchers investigating fidelity of implementation sometimes treat the term “integrity” as synonymous with “fidelity.”
7 See https://www.carnegiefoundation.org/blog/what-we-need-in-education-is-more-integrity-and-less-fidelity-of-implementation/

BOX 4-2
There is increasing evidence that content-focused coaching in mathematics can support instructional improvement; however, “attempts to spread and scale coaching programs have resulted in variable outcomes, in part because coaching is highly context dependent” (Russell et al., 2020, p. 150). The Tennessee Math Coaching Project deliberately set out to study the mutual adaptation of a proven coaching model as it was implemented by 32 coaches in 21 school districts across the state (Russell et al., 2020). The core principles of the innovation (derived from earlier empirical studies of coaching that resulted in improvements to teaching and student learning) included specific “coaching practices” and a “coaching routine” in which coaches were expected to press and support teachers to identify specific mathematical goals, and to engage in deep and specific discussion of teaching in relation to artifacts of student learning. As Russell and colleagues (2020) described, as the model was spread to schools and districts across the state, they intended for enactors (in this case, coaches) and enablers (in this case, school and district leaders) to adapt the model in relation to “diverse organizational contexts” (p. 157). For example, the designers and researchers anticipated adaptations in relation to school- and district-level variation regarding the “selection and training of coaches,” how coaches’ work is organized in schools (e.g., how much time they work with teachers, whether they are asked to perform additional duties), coaches’ relationships with school principals (e.g., whether principals “endorse [. . .] coaches as sources of expertise”), and how school and district leaders “[frame] the purpose and goals of coaching” (p. 157). The designers and researchers also anticipated adaptations with respect to what they termed “diverse relational contexts,” or how coaches adapted “their work with teachers based on their impression of the teacher’s openness to coaching” (p. 157).
Adopting methods associated with continuous improvement and improvement science (e.g., Bryk et al., 2015), the researchers organized iterative cycles of inquiry in which they collected and analyzed quantitative and qualitative data to assess the coaches’ enactment of the coaching practices and routines, in relation to teaching and learning outcomes, and to identify how the coaches adapted the model, why, and to what effect (Russell et al., 2020, p. 176). In doing so, they identified “which adaptations preserve (or violate) the integrity of the coaching model” (p. 176; emphasis added). Further, they used this knowledge to inform the professional learning of the coaches, so that the coaches deepened their understanding of the critical elements of the coaching model and were better equipped to anticipate challenges that would compromise the effectiveness of the model. On the basis of the cycles of inquiry, the team was in a position to offer interested states a comprehensive and empirically tested model for supporting the implementation of high-quality mathematics coaching at the scale of a state. The model includes “evidence-based coaching practices” that lead to the improvement of teaching; “an approach to train coaches to enact the coaching framework”; and “guidance for schools, districts, and states on how to organize and support a coaching program” (p. 150).
Yet, there are tensions and challenges in implementing and scaling innovations characterized by principled and mutual adaptation. One tension concerns specifying and communicating the underlying principles of the innovation such that others can make sense of them deeply enough to engage in adaptation without losing sight of the underlying assumptions and logic of the innovation (Brown & Campione, 1996). Kirschner and Polman (2013) addressed this tension directly in their respective designs of innovations: for Kirschner and colleagues, methods of supporting “critical civic inquiry” (CCI) among youth and teachers, tied to disciplinary content, including science; and for Polman and colleagues, the teaching of science journalism (SciJourn) to support high school students’ scientific literacy. In both cases, the innovations were intended to be adapted in principled ways in new contexts and sustained. To facilitate principled adaptation, the teams developed what they term “signature tools,” which “instantiate certain core goals of the intervention while also acting as ‘thinking devices’ that actors can adopt, adapt, modify, and borrow in locally appropriate ways” (p. 232). In the case of CCI, the team “formulated parameters for CCI projects that would provide consistency while also being flexible enough to accommodate local adaptation” (p. 228). CCI projects are expected to “focus on a problem experienced by students at the school, selected by students, and examined through the lens of educational equity” (p. 228; emphasis in original). The team therefore developed two signature tools to enable the CCI projects to maintain a focus on educational equity and school-based action, that is, CCI’s core principles, knowing that the focus of CCI projects would necessarily vary in local contexts. In the case of SciJourn, the team developed “science literacy standards” and “standards for article writing,” which teachers used to anchor various science journalism activities that fit with their local contexts. In both cases, consistent with Clarke and Dede’s (2009) emphasis on evolution of the innovation, the teams deliberately set out to learn from the adaptations that the enactors made, in service of improving their innovations.
A second, related challenge to adaptation concerns the “looseness” regarding what counts as core principles to adhere to in adaptation. As an innovation that has “proven” effective in one context is adapted in another, there is always a risk that it might be altered in such a way that it no longer results in the desired outcomes. Without careful study of adaptations, as well as the impact of the innovation on the intended outcomes for beneficiaries, it is difficult to identify what makes for “lethal mutations” (Haertel, personal communication, 1984, as cited in Brown &
Campione, 1996, p. 292), therefore making it difficult to inform others’ implementation efforts in new contexts. The committee returns to this issue in Chapters 7 and 8.
Over the past two decades, there has been increasing attention, as well as federal and philanthropic support, for researchers, educators, and sometimes youth and/or families, to co-design, implement, and evaluate innovations in local contexts, in response to “problems, needs, and opportunities” (Russell & Penuel, 2022, p. 1). To be clear, not all cases in which researchers and educators collaboratively design innovations attend to issues of implementation, scale, and sustainability. In this section, the committee focuses on co-design approaches that squarely focus on implementation, scale, and sustainability. In particular, the committee features design-based implementation research (DBIR) and networked improvement communities (NICs), given the evidence base.8 Co-design approaches focused on scale tend to reflect commitments to principled adaptation, as described above. A hallmark of DBIR and NICs is engagement in cycles of “continuous improvement”; that is, collaborators articulate theories of improvement and engage in cycles of inquiry in which “design and evaluation activities are tightly coordinated with one another, so that evidence from tests of innovations plays an integral role in informing design” and the evolving theory (Russell & Penuel, 2022, p. 5).
In addition, collaborators engaged in DBIR and NICs tend to take a “systemic perspective” (p. 5), meaning partners intentionally work to understand the broader system that has led to the problems they are working to address, and they understand that successfully addressing the problems will likely require concomitant changes in aspects of the system. In some cases, the collaboration results in a product that can be implemented and adapted in additional contexts. In other cases, however, the goal is not necessarily to produce a packaged product or model that can be implemented in other contexts, but rather to generate knowledge about ways to address a persistent problem of practice at scale, which can inform others’ efforts to locally address a similar problem (Roschelle, Mazziotti, & Means, 2021).
DBIR is a kind of design research that involves the intentional design, implementation, and adaptation of innovations as they go to scale (Fishman & Penuel, 2018; Penuel & Fishman, 2012). Different from other forms of design research, DBIR focuses, from the start, on concerns of implementation and sustainability, including whether the innovation can be “adapted successfully to meet the needs of diverse learners across diverse settings, in both formal and informal education” (Fishman & Penuel, 2018, p. 393). Unlike conventional research focused on implementation, in DBIR, researchers do not merely observe implementation; instead, they actively seek to design for robust implementation, collect and analyze data on the implementation of the innovation, and use the analysis to improve and generate knowledge about what makes for productive implementation of the innovation. The projects often seek to address questions such as “What works when, how, and for whom?” “How do we improve this reform strategy to make it more sustainable?” and “What capacities does the system need to continue to improve?” (Penuel et al., 2011, p. 335). The dynamic, mutually engaged relationship among designers and enactors also provides new opportunities for recognizing and addressing issues of equity and responsiveness to local cultural contexts. Four key principles of DBIR include (a) a focus on persistent problems of practice from multiple stakeholders’ perspectives; (b) a commitment to iterative, collaborative design; (c) a concern with developing theory and knowledge related to both classroom learning and implementation through systematic inquiry; and (d) a concern with developing capacity for sustaining change in systems (Penuel et al., 2011).
___________________
8 It should be noted that this discussion was not meant to be exhaustive.
DBIR “challenges educational researchers and practitioners to transcend traditional research/practice barriers to facilitate the design of educational interventions that are effective, sustainable, and scalable” (p. 136). Underlying this call is an emphasis on research-practice partnerships, which reconfigures the roles of researchers and practitioners in design and implementation of innovations that can be scaled. This approach requires two-way and recursive relationships between research and practice (see Coburn & Stein, 2010), in contrast to “research and development as a linear process that leads from design by researchers to scale up by practitioners” (Penuel et al., 2011, p. 138). Collaboratively designed innovations are more likely to be usable since they are rooted in problems of practice and the needs of practitioners (Fishman et al., 2013). DBIR projects also typically include practitioners who work at multiple levels of a system (classroom, school, district), as well as community members, in order to ensure that community- and system-level concerns are attended
to in the design of the innovation (Fishman et al., 2013). To understand these dynamics at a participatory and community-based level, designs can also center “processes of partnering as a primary object of analysis” when considering collaborations and power dynamics—especially when designing across multiple stakeholders including families, elders, teachers, and researchers (Bang & Vossoughi, 2016, p. 175). Box 4-3 provides an example of this approach.
Associated with continuous improvement research,9 NICs offer a structure for practitioners, researchers, and designers to investigate pressing problems of practice and to design strategies for addressing them at scale in diverse local contexts (Roschelle, Mazziotti, & Means, 2021; Russell et al., 2017). The four essential characteristics of NICs are as follows (Bryk et al., 2015; Russell et al., 2017): they are (a) focused on a well-specified common aim; (b) guided by a deep understanding of the problem and of the system that produces it; (c) disciplined by the methods of improvement research to develop, test, and refine interventions; and (d) organized to accelerate the spread of these interventions into varied educational contexts.
The use of the term “networked” to describe these communities is deliberate. Most pressing educational problems span multiple levels of a system (e.g., classroom, school, district) and often multiple sectors (e.g., business and education, health and education). As such, addressing a persistent problem at some scale likely entails engaging multiple role groups and requires diverse forms of knowledge and expertise (Russell et al., 2017). Networks have the benefit of being able to engage diverse participants and to spread knowledge across people, systems, and geographies that are often siloed. Moreover, innovations can be tested simultaneously in the multiple organizations participating in a NIC, thereby accelerating learning about the innovation in relation to context-specific resources and constraints. NICs in education have grown in number over the past decade or so, spurred by training and funding from private foundations as well as federal agencies, including NSF and IES (Feygin et al., 2020; see Box 4-4).
___________________
9 The IES Regional Educational Laboratory Program provides a toolkit for others looking to engage in continuous improvement: https://ies.ed.gov/ncee/rel/Products/Region/northeast/Publication/4005
BOX 4-3
Design-Based Implementation Research in Practice: The Inquiry Hub Partnership

University of Colorado Boulder researchers, along with the nonprofit research organization BSCS Science Learning, engaged in a long-term partnership (Inquiry Hub) with educators in Denver Public Schools to design, test, and improve biology curriculum materials. Other partners included Northwestern University’s Next Gen Storylines team and teachers in other states. The materials were “intended to help students gain a grasp of how to use science and engineering practices to explain phenomena that students find interesting and solve problems students perceive as relevant to themselves and their communities” (Penuel, 2019, p. 661). Importantly, the vision of science teaching and learning undergirding the materials represented a substantial shift from “business as usual.”
It is not surprising, then, that, as described by Penuel (2019), early in the classroom testing process, researchers found that the district’s observation, evaluation, and feedback system was at odds with the instructional improvement efforts. Rather than treat the evaluation system as “outside the scope” of their work, from a design-based implementation research perspective, figuring out how to support teachers to navigate the system became a critical aspect of the partnership work. As Penuel (2019) described: “Our district partners insisted that we engage with the [evaluation] system and its processes rather than attempt to interfere with it. If we did create such a buffer, [. . .] we would only be creating a potential problem for sustaining implementation over the long haul” (p. 666). In response, the partnership created guides for observers regarding what they should expect to see as teachers implemented the new biology curriculum materials. The guides were not enough, however, to mitigate the challenges teachers faced as they were observed; so, as a next step, the partnership created a routine by which the district science specialist could conduct formal observations that reflected the intent of the new materials and their underlying vision of teaching and learning.
In addition to challenges with observation and feedback routines, researchers learned, based on classroom observations and student surveys, that teachers needed support to implement the materials in ways that advanced equity. The partnership designed a set of strategies (e.g., professional learning opportunities, routines for regularly eliciting and analyzing student feedback, generation of culturally relevant phenomena for students to investigate) to strengthen educators’ commitments and practice in service of creating more inviting, inclusive science classrooms. In addition, the district “developed a cadre of leaders inside and outside the classroom who can recognize and support teachers in creating more inclusive cultures,” which has enabled sustainability of the ongoing work (p. 670). As illustrated by the Inquiry Hub example, the attention to implementation, scale, and sustainability as part of the ongoing design work contrasts with “design research that develops and tests an innovation with no plan for how to support its implementation after the research ends” (p. 670). In this approach to scale, researchers and educators co-design an innovation that can fit better into an existing system—while also keeping an eye on the need to redesign aspects of the system (e.g., system-wide observation and feedback) to support underlying principles of the innovation.
Co-design efforts that explicitly focus on implementation and scale, such as the examples in Boxes 4-3 and 4-4, are poised to enable deep and sustained implementation in local contexts. This is because enactors (and sometimes beneficiaries) bring critical expertise regarding their local contexts and communities, which helps teams anticipate particular resources, opportunities, and challenges and alter the design and implementation in response. While an initial sense of ownership is built into co-design efforts, when the innovation spreads beyond those involved in the co-design, new enactors may not feel ownership of the innovation or see it as solving a problem they face. It is therefore necessary to continue to broaden and deepen ownership among enactors and enablers not initially included in the co-design efforts (Bryk et al., 2015). More generally, while there tends to be research on the impact of co-designed innovations on the intended outcomes, there is limited research on what happens when a co-designed product is implemented at scale in a context outside the one in which it was co-designed (Coburn & Penuel, 2016).
There are examples of co-design efforts that deliberately operate at a large scale (i.e., multiple districts, regions, etc.) from the start of the project, with explicit attention to enabling the spread of tools and knowledge focused on deep shifts to the instructional core (e.g., Cobb et al., 2018; Donovan, Wigdor, & Snow, 2003; Edelson et al., 2021). The development of the OpenSciEd instructional materials is a prime example (see also Seeding Innovations in Appendix C). As described in Edelson et al. (2021), OpenSciEd was “launched in 2017 [. . .] as a collaboration of material developers, educational researchers, classroom educators, and educational leaders” (p. 780). Initially, it designed middle school science curriculum materials, paired with professional development materials, aimed at supporting deep shifts in science teaching and learning. OpenSciEd has since gone on to create and field test curriculum in elementary and high school science as well, following co-design processes similar to those used for the middle school materials.10
Another affordance of co-design efforts is that they deliberately build in attention to equity and power relations as part of design and improvement processes (e.g., Peurach et al., 2022). Researchers and practitioners negotiate the problems of practice around which innovations are designed, and efforts are made to ensure that the people most affected by the innovation are “at the table” as it is being designed, trialed, and improved. Recently, there have been significant new efforts to co-create culturally equitable and responsive curriculum resources and programs through collaborative networks that include families and community members as active partners. For example, the Learning in Places Collaborative,11 with funding support from NSF, has been developing and studying new models for Pre-K–12 STEM learning that engage children through investigations in local outdoor settings and include Indigenous ways of knowing that position people as a part of the natural world, rather than apart from it (Lees & Bang, 2022). Learning in Places also mobilizes knowledge, perspectives, and experiences shared by family and intergenerational community members and frames learning in the context of ethical decisions that affect the community in the present and the future. There is also ongoing work in early education spaces addressing similar issues (e.g., Dominguez et al., 2023; Kamdar et al., 2024; Lewis Presser et al., 2017; McWayne et al., 2021, 2022; Presser et al., 2019).
___________________
10 See https://digitalpromise.org/initiative/openscied-research-community/resources/
BOX 4-4
A Networked Improvement Community for Ambitious and Equitable Science Teaching

Between 2012 and 2017, the University of Washington Ambitious Science Teaching research team partnered with a team of educators in a moderately sized district to support secondary science teachers’ development of ambitious and equitable teaching practices, with a specific focus on supporting emergent bilingual students, as the district implemented the Next Generation Science Standards (Thompson et al., 2019). The initial research-practice partnership (RPP) included two schools; however, after the first year of the partnership, district leaders asked that the work be expanded to all schools. The RPP thus initiated a NIC (Bryk, Gomez, & Grunow, 2011) made up of each of the eight high school science professional learning communities (PLCs). All the district’s secondary science teachers participated in the NIC, alongside science research-practice and emergent bilingual instructional coaches. This “all-comers” approach differs from most NICs, in which participants “opt in” and which thus often include some but not all members of a particular role group in a setting (Thompson et al., 2019).
With financial support from the National Science Foundation and the district, the NIC established (a) “Studios,” which were full-day, job-embedded opportunities for PLCs to try out specific teaching practices in classrooms, analyze the impact of teaching on student learning, and set goals for future teaching; (b) “district science coaches who provided one-on-one coaching and facilitated Studios”; (c) “data meetings (in between Studios) where [PLCs] iterated on practices in relation to classroom data” and where teams would ask, “Which practices work? For whom? And under what conditions?”; (d) network-wide meetings where PLCs would share common challenges and what they were learning across sites; and (e) “instructional walks” where principals would observe instruction with coaches (Thompson et al., 2019, p. 4). In year four, the RPP established a teacher leader cadre to “drive school-based improvement work and support sustainability” (p. 4). Grant funding ended in year four, and the district “assumed responsibility for funding Studios and the science coaching positions” (p. 4).
Researchers conducted a retrospective analysis of how each of the PLCs launched its instructional improvement work, and how knowledge and tools varied and spread across the network, based on observational data of Studios and classroom instruction, as well as teachers’ reflections and annual network surveys (Thompson et al., 2019). The researchers identified three patterns: (a) Local Practice Development, (b) Spread and Local Adaptation, and (c) Integrating New Practices. “PLCs with a common aim began with drafting and testing practices and tools (Local Practice Development)” (p. 18). In PLCs that struggled to initially develop a common aim and vision for improving science instruction, coaches and researchers intentionally shared practices and tools from other PLCs. As PLC teams tested these practices and tools from other PLCs, they began to identify localized problems of practice to focus on (Spread and Local Adaptation). Lastly, “[s]ome PLCs kept their baseline practice as a focus of inquiry, yet through examining data from students and developing theories of how students learned, they became dissatisfied with aspects of the practice and developed or adopted other network practices to address identified limitations (Integrating New Practices)” (p. 19). By identifying these launch patterns, researchers made visible the “messy middle” in RPPs (Penuel, 2019) where partners negotiate what and how to learn together and build structures to support their joint work.
One challenge specific to co-design scaling approaches is that the current structure of the education system and its expectations—especially for teachers’ work—is at odds with the time and space it takes to engage in ongoing cycles of design, implementation, and improvement of new practices, materials, and so forth (Cohen & Mehta, 2017). Teachers who participate in collaborative design and research projects are often self-selected and often engage in this work on top of their teaching. That said, there are examples, like the Ambitious Science Teaching NIC (Thompson et al., 2019) discussed in Box 4-4, in which researchers and system leaders collaborate to revise systems of professional learning and evaluation to enable broad participation.
Additionally, engaging in collaborative design, implementation, and scaling efforts requires forms of expertise and collaboration beyond what is typical of STEM innovation design and research (Cohen-Vogel, Harrison, & Cohen-Vogel, 2022). It entails, for example, knowledge of systems, policy, leadership, and implementation alongside knowledge of STEM teaching and learning. Moreover, engaging in such efforts requires that “researchers develop new ways of working with practitioners that prioritize the development of trust, take schools’ and districts’ current improvement goals and strategies as a primary point of reference, and are sensitive to schools’ and districts’ capacities and constraints” (Cobb et al., 2013, p. 344). Similar issues with respect to building trust, creating time and space for full collaboration, and navigating alignment with existing goals, priorities, and routines arise for co-design projects that invite the participation of non-system actors, such as families and community members.
Throughout this chapter, the committee has demonstrated the importance of conceptualizing scale as multidimensional, especially when an innovation aims to engender fundamental changes to STEM teaching and learning. We introduced Coburn’s framework for thinking about scale, which includes four dimensions: spread, depth, sustainability, and shifts in ownership. Most research on and assessments of scale focus on spread, or increasing the number of enactors, beneficiaries, and/or contexts in which an innovation is enacted. The committee cautions against a blunt focus on spread for two reasons. First, some innovations are specifically tailored for particular populations and/or places. Second, a focus on spread alone does not help educators or researchers know whether an innovation is resulting in the desired improvements and for whom, whether the innovation is sustained as enactors change, and why or why not. In addition to spread, it is critical to attend to how the innovation is implemented, that is, the extent to which enactors adopt the innovation superficially or at some depth. Enacting an innovation with some depth typically entails substantial changes to enactors’ beliefs about teaching, learning, and students, as well as to enactors’ content knowledge and practice. Moreover, it is critical to attend to the sustainability and ownership of the innovation, that is, whether the innovation endures over time in the original and new contexts when initial circumstances, such as funding, change, and whether knowledge, authority, and ownership of the innovation are broadened and deepened over time.
Assessments of scale often focus on surface-level indications of adoption (e.g., number of users, presence of materials, time spent using the materials), absent attention to depth, sustainability, and ownership. An important area for research on the scale of STEM innovations is the development of measures, or ways of assessing, these additional dimensions of scale, especially when innovations target substantial changes to the instructional core. Further, equity concerns are central in assessments of scale. For example, for whom is the innovation “working” and “not working,” and why? Is there evidence that teachers view their students, particularly those who have not been served well in the current system, as capable and deserving of robust STEM instruction? Moreover, most studies of scaling up innovations focus on the first few years of implementation, without attention to what happens after additional resources and supports have officially ended. Incentives to both plan for and study issues of depth, sustainability, and ownership after initial implementation of an innovation are necessary, coupled with explicit attention to equity concerns.
Federal funding agencies have historically prioritized sequential studies of scaling innovations (pilot studies, efficacy studies, effectiveness studies, scale-up studies), whereby the innovation is implemented in tightly prescribed ways, in increasingly heterogeneous sites and/or populations, and in service of a specified set of outcomes. “Fidelity of implementation” is prioritized. When paired with an explicit theory of change, fidelity of implementation can support researchers in identifying faults in the innovation as designed (i.e., in the theory of change and its assumptions) and in understanding how variation in implementation affects outcomes, for better or worse. Especially in cases in which the innovation targets changes to the instructional core, however, it is often difficult to tease apart issues with the innovation itself from aspects of the implementation context that negatively (or positively) affect implementation of the innovation. Moreover, a focus on fidelity of implementation often results in minimal attention to how educators might modify an innovation to make it work better in their context.
In response, more recent research on implementing STEM innovations at scale points to the value of approaches that invite principled, mutual adaptation. It is incumbent upon designers to identify and communicate the core principles underlying the innovation. Then, as the innovation is scaled, researchers (and sometimes partner educators) generate and analyze data to examine the ways in which the core principles associated with the innovation are maintained (what is sometimes referred to as “integrity of implementation”) or evolve, whether the intended learning goals are met, and why or why not. Researchers deliberately study and learn from the adaptations enactors make as they implement the innovation, both to revise the innovation and to inform enactors in other contexts about which features might support or hinder implementation and about strategies for mitigating challenges.
Consistent with the principles of mutual adaptation approaches to scaling, there is growing evidence of the value of researchers and educators in communities collaboratively designing, implementing, and scaling innovations. In collaborative approaches (e.g., design-based implementation research, networked improvement communities) that explicitly take a systemic perspective, collaborators work to understand how various aspects of their local educational system (e.g., school leaders’ expectations, hiring, time for teachers to collaborate) impact the challenge they are addressing, and they design and revise strategies to address those systemic challenges. An affordance of this approach is that concerns for implementation are part and parcel of the design of the innovation.
On the whole, there is growing evidence that approaches to scaling that invite principled, mutual adaptation, and those in which innovations are collaboratively designed, implemented, and scaled by researchers and educators, can enable deep and sustained implementation of STEM innovations. More documentation is needed of the adaptations that enactors and enablers make and why, and of the impact of those adaptations on teaching and learning outcomes, to support implementation of promising innovations in new contexts. Studies of scaling must also attend to and document how sustainability and shifts in ownership are enabled—and, if they are not, why not. Attention is likewise needed to developing the capacity of STEM researchers who have the requisite knowledge and mentored experience to engage in research on STEM innovation, implementation, and scale. Further, time and resources are needed to support teachers (and leaders) to partner with researchers to design, implement, and study innovations.
REFERENCES

Bang, M., & Vossoughi, S. (2016). Participatory design research and educational justice: Studying learning and relations within social change making. Cognition and Instruction, 34(3), 173–193.
Brown, A. L. (1992). Design experiments: Theoretical and methodological challenges in creating complex interventions in classroom settings. Journal of the Learning Sciences, 2(2), 141–178.
Brown, A. L., & Campione, J. C. (1996). Psychological theory and the design of innovative learning environments: on procedures, principles, and systems. In L. Schauble & R. Glaser (Eds.), Innovations in learning: New environments for education (pp. 289–325). Lawrence Erlbaum Associates.
Bryk, A. S., Gomez, L. M., & Grunow, A. (2011). Getting ideas into action: Building networked improvement communities in education. In M. Hallinan (Ed.), Frontiers in sociology of education (pp. 127–162). Springer.
Bryk, A. S., Gomez, L. M., Grunow, A., & LeMahieu, P. G. (2015). Learning to improve: How America’s schools can get better at getting better. Harvard Education Press.
Bryk, A. S., Sebring, P. B., Allensworth, E., Luppescu, S., & Easton, J. Q. (2010). Organizing schools for improvement: Lessons from Chicago. University of Chicago Press.
Clarke, J., & Dede, C. (2009). Design for scalability: A case study of the River City curriculum. Journal of Science Education and Technology, 18(4), 353–365. https://www.jstor.org/stable/20627713
Cobb, P., & Jackson, K. (2015). Supporting teachers’ use of research-based instructional sequences. ZDM, 47, 1027–1038.
Cobb, P., Jackson, K., Henrick, E., Smith, T., & MIST team. (2018). Systems for instructional improvement: Creating coherence from the classroom to the district office. Harvard Education Press.
Cobb, P., Jackson, K., Smith, T., Sorum, M., & Henrick, E. (2013). Design research with educational systems: Investigating and supporting improvements in the quality of mathematics teaching and learning at scale. In B. J. Fishman, W. R. Penuel, A.-R. Allen, & B. H. Cheng (Eds.), Design based implementation research: Theories, methods, and exemplars. National Society for the Study of Education Yearbook (Vol. 112, Issue 2, pp. 320–349). Teachers College.
Coburn, C. E. (2003). Rethinking scale: Moving beyond numbers to deep and lasting change. Educational Researcher, 32(6), 3–12.
Coburn, C. E., & Penuel, W. R. (2016). Research-practice partnerships in education: Outcomes, dynamics, and open questions. Educational Researcher, 45(1), 48–54. https://doi.org/10.3102/0013189X16631750
Coburn, C. E., Russell, J. L., Kaufman, J. H., & Stein, M. K. (2012). Supporting sustainability: Teachers’ advice networks and ambitious instructional reform. American Journal of Education, 119(1), 137–182.
Coburn, C. E., & Stein, M. K. (Eds.). (2010). Research and practice in education: Building alliances, bridging the divide. Rowman & Littlefield.
Cohen, D. K., & Mehta, J. D. (2017). Why reform sometimes succeeds: Understanding the conditions that produce reforms that last. American Educational Research Journal, 54(4), 644–690.
Cohen-Vogel, L., Harrison, C., & Cohen-Vogel, D. (2022). On teams: Exploring variation in the social organization of improvement research in education. In D. J. Peurach, J. L. Russell, L. Cohen-Vogel, & W. R. Penuel (Eds.), The foundational handbook on improvement research in education (pp. 325–346). Rowman & Littlefield.
Dane, A. V., & Schneider, B. H. (1998). Program integrity in primary and early secondary prevention: Are implementation effects out of control? Clinical Psychology Review, 18(1), 23–45. https://doi.org/10.1016/S0272-7358(97)00043-3
Datnow, A., Hubbard, L., & Mehan, H. (2002). Extending educational reform. Taylor & Francis.
Dede, C. (2006). Evolving innovations beyond ideal settings to challenging contexts of practice. In R. K. Sawyer (Ed.), The Cambridge handbook of the learning sciences (pp. 551–566). Cambridge University Press.
Dominguez, X., Vidiksis, R., Leones, T., Kamdar, D., Presser, A. L., Bueno, M., & Orr, J. (2023). Integrating science, mathematics, and engineering: Linking home and school learning for young learners. Digital Promise.
Donovan, M. S., Wigdor, A. K., & Snow, C. E. (Eds.). (2003). Strategic Education Research Partnership. National Research Council.
Edelson, D. C., Reiser, B. J., McNeill, K. L., Mohan, A., Novak, M., Mohan, L., Affolter, R., McGill, T. A. W., Bracey, Z. B., Noll, J. D., Kowalski, S. M., Novak, M., Lo, A. S., Landel, C., Krumm, A., Penuel, W. R., Van Horne, K., González-Howard, M., & Suárez, E. (2021). Developing research-based instructional materials to support large-scale transformation of science teaching and learning: The approach of the OpenSciEd middle school program. Journal of Science Teacher Education, 32(7), 780–804. https://doi.org/10.1080/1046560X.2021.1877457
Elmore, R. F. (1996). Getting to scale with good educational practice. Harvard Educational Review, 66(1), 1–26.
Farley-Ripple, E., May, H., Karpyn, A., Tilley, K., & McDonough, K. (2018). Rethinking connections between research and practice in education: A conceptual framework. Educational Researcher, 47(4), 235–245.
Feygin, A., Nolan, L., Hickling, A., & Friedman, L. (2020). Evidence for networked improvement communities: A systematic review of the literature. American Institutes for Research.
Finnigan, K. S., & Daly, A. J. (2016). Why we need to think systemically in educational policy and reform. In A. J. Daly & K. S. Finnigan (Eds.), Thinking and acting systemically: Improving school districts under pressure (pp. 1–9). American Educational Research Association.
Fishman, B. J., & Penuel, W. R. (2018). Design-based implementation research. In F. Fischer, C. E. Hmelo-Silver, S. R. Goldman, & P. Reimann (Eds.), International handbook of the learning sciences (pp. 393–400). Routledge.
Fishman, B. J., Penuel, W. R., Allen, A.-R., Cheng, B. H., & Sabelli, N. (2013). Design-based implementation research: An emerging model for transforming the relationship of research and practice. In B. J. Fishman, W. R. Penuel, A.-R. Allen, & B. H. Cheng (Eds.), Design-based implementation research: Theories, methods, and exemplars (Vol. 112, Issue 2, pp. 136–156). National Society for the Study of Education.
Fishman, B. J., Penuel, W. R., Hegedus, S., & Roschelle, J. (2011). What happens when the research ends? Factors related to the sustainability of a technology-infused mathematics curriculum. Journal of Computers in Mathematics and Science Teaching, 30(4), 329–353.
Gage, N., MacSuga-Gage, A., & Detrich, R. (2020). Fidelity of implementation in educational research and practice. The Wing Institute. https://www.winginstitute.org/systems-program-fidelity
Gersten, R., Chard, D., & Baker, S. (2000). Factors enhancing sustained use of research-based instructional practices. Journal of Learning Disabilities, 33(5), 445–456.
Hargreaves, A., & Goodson, I. (2006). Educational change over time? The sustainability and nonsustainability of three decades of secondary school change and continuity. Educational Administration Quarterly, 42(1), 3–41.
Kamdar, D., Leones, T., Vidiksis, R., & Dominguez, X. (2024). Shining light on preschool science investigations: Exploring shadows and strengthening visual spatial skills. Science and Children, 61(3), 20–24.
Kirschner, B., & Polman, J. L. (2013). Adaptation by design: A context-sensitive, dialogic approach to interventions. In B. J. Fishman, W. R. Penuel, A.-R. Allen, & B. H. Cheng (Eds.), Design based implementation research: Theories, methods, and exemplars. National Society for the Study of Education (Vol. 112, Issue 2, pp. 215–236). Teachers College.
Klingner, J. K., Vaughn, S., Tejero Hughes, M., & Arguelles, M. E. (1999). Sustaining research-based practices in reading: A 3-year follow-up. Remedial and Special Education, 20(5), 263–287.
Lees, A., & Bang, M. (2022). We’re not migrating yet: Engaging children’s geographies and learning with lands and waters. Occasional Paper Series, (48), 33–47. https://doi.org/10.58295/2375-3668.1454
LeMahieu, P. G. (2011). What we need in education is more integrity (and less fidelity) of implementation. Carnegie Foundation for the Advancement of Teaching. https://www.carnegiefoundation.org/blog/what-we-need-in-education-is-more-integrity-and-less-fidelity-of-implementation/
Lewis Presser, A. E., Kamdar, D., Vidiksis, R., Goldstein, M., & Dominguez, X. (2017). Growing plants and minds: Using digital tools to support preschool science learning. Science and Children, 55, 41–47.
McLaughlin, M.W., & Mitra, D. (2001). Theory-based change and change-based theory: Going deeper, going broader. Journal of Educational Change, 2(4), 301–323.
McWayne, C. M., Greenfield, D., Zan, B., Mistry, J., & Ochoa, W. (2021). A comprehensive professional development approach for supporting science, technology, and engineering curriculum in preschool: Connecting contexts for dual language learners. In S. T. Vorkapić & J. LoCasale-Crouch (Eds.), Supporting children’s well-being during the early childhood transition to school (pp. 222–253). IGI Global.
McWayne, C. M., Zan, B., Ochoa, W., Greenfield, D., & Mistry, J. (2022). Head Start teachers act their way into new ways of thinking: Science and engineering practices in preschool classrooms. Science Education. https://doi.org/10.1002/sce.21714
Means, B., & Penuel, W. R. (2005). Scaling up technology-based educational innovations. In C. Dede, J. P. Honan, & L. C. Peters (Eds.), Scaling up success: Lessons from technology-based educational improvement (pp. 176–197). Jossey-Bass.
Morel, R. P., Coburn, C. E., Catterson, A. K., & Higgs, J. (2019). The multiple meanings of scale: Implications for researchers and practitioners. Educational Researcher, 48(6), 369–377. https://doi.org/10.3102/0013189X19860531
National Academies of Sciences, Engineering, and Medicine. (2022). The future of education research at IES: Advancing an equity-oriented science. The National Academies Press.
O’Donnell, C. L. (2008). Defining, conceptualizing, and measuring fidelity of implementation and its relationship to outcomes in K–12 curriculum intervention research. Review of Educational Research, 78(1), 33–84. https://doi.org/10.3102/0034654307313793
Penuel, W. R. (2019). Infrastructuring as a practice of design-based research for supporting and studying equitable implementation and sustainability of innovations. Journal of the Learning Sciences, 28(4–5), 659–677. https://doi.org/10.1080/10508406.2018.1552151
Penuel, W. R., & Fishman, B. J. (2012). Large-scale science education intervention research we can use. Journal of Research in Science Teaching, 49(3), 281–304.
Penuel, W. R., Fishman, B. J., Cheng, B. H., & Sabelli, N. (2011). Organizing research and development at the intersection of learning, implementation, and design. Educational Researcher, 40(7), 331–337.
Peurach, D. J., Russell, J., Cohen-Vogel, L., & Penuel, W. R. (Eds.). (2022). The foundational handbook on improvement research in education. Rowman & Littlefield.
Presser, A. L., Dominguez, X., Goldstein, M., Vidiksis, R., & Kamdar, D. (2019). Ramp It UP! Science and Children, 56(7), 30–37.
Reiser, B. J., Michaels, S., Moon, J., Bell, T., Dyer, E., Edwards, K. D., McGill, T. A. W., Novak, M., & Park, A. (2017). Scaling up three-dimensional science learning through teacher-led study groups across a state. Journal of Teacher Education, 68(3), 280–298. https://doi.org/10.1177/0022487117699598
Rogers, R., Kramer, M. A., Mosley, M., & Literacy for Social Justice Teacher Research Group. (2009). Designing socially just learning communities: Critical literacy education across the lifespan. Routledge.
Roschelle, J., Mazziotti, C., & Means, B. (2021). Scaling up design of inquiry environments. In C. Chinn & R. G. Duncan (Eds.), International handbook of inquiry and learning. Routledge.
Russell, J. L., & Penuel, W. R. (2022). Introducing improvement research in education. In D. J. Peurach, Russell, J. L., Cohen-Vogel, L. & Penuel, W. R. (Eds.), The foundational handbook on improvement research in education (pp. 1–20). Rowman & Littlefield.
Russell, J., Bryk, A. S., Dolle, J. R., Gomez, L. M., LeMahieu, P., & Grunow, A. (2017). A framework for the initiation of networked improvement communities. Teachers College Record, 119, 1–36.
Russell, J. L., Correnti, R., Stein, M. K., Bill, V., Hannan, M., Schwartz, N., Booker, L. N., Pratt, N. R., & Matthis, C. (2020). Learning from adaptation to support instructional improvement at scale: Understanding coach adaptation in the TN Mathematics Coaching Project. American Educational Research Journal, 57(1), 148–187. https://doi.org/10.3102/0002831219854050
Scheirer, M. A. (2005). Is sustainability possible? A review and commentary on empirical studies of program sustainability. American Journal of Evaluation, 26(3), 320–347.
Stein, M. K., Russell, J. L., Bill, V., Correnti, R., & Speranzo, L. (2022). Coach learning to help teachers learn to enact conceptually rich, student-focused mathematics lessons. Journal of Mathematics Teacher Education, 25(3), 321–346. https://doi.org/10.1007/s10857-021-09492-6
Tabak, R. G., Khoong, E. C., Chambers, D. A., & Brownson, R. C. (2012). Bridging research and practice: Models for dissemination and implementation research. American Journal of Preventive Medicine, 43(3), 337–350.
Thomas, J., Cook, T. D., Klein, A., Starkey, P., & DeFlorio, L. (2018). The sequential scale-up of an evidence-based intervention: A case study. Evaluation Review, 42(3), 318–357. https://doi.org/10.1177/0193841X18786818
Thompson, J., Richards, J., Shim, S.-Y., Lohwasser, K., Von Esch, K. S., Chew, C., Sjoberg, B., & Morris, A. (2019). Launching networked PLCs: Footholds into creating and improving knowledge of ambitious and equitable teaching practices in an RPP. AERA Open, 5(3), 1–22. https://doi.org/10.1177/2332858419875718
van Dijk, W., Lane, H. B., & Gage, N. A. (2023). How do intervention studies measure the relation between implementation fidelity and students’ reading outcomes? A systematic review. Elementary School Journal, 124(1), 56–84. https://doi.org/10.1086/725672
What Works Clearinghouse. (2023). Pre-K mathematics. Institute of Education Sciences, U.S. Department of Education. https://ies.ed.gov/ncee/wwc/Intervention/425
Zoblotsky, T., Bertz, C., Gallagher, B., & Alberg, M. (2017). The LASER model: A systemic and sustainable approach for achieving high standards in science education: SSEC i3 validation final report of confirmatory and exploratory analyses. The University of Memphis, Center for Research in Educational Policy. Summative report prepared for the Smithsonian Science Education Center.