The final step of the roadway safety management process is evaluation. There are three basic levels of post-implementation evaluations: project, countermeasure, and program level. In general, the objective of evaluations is to determine how a particular project, group of projects, or policy has affected one or more performance measures. Agencies use evaluation results to inform future funding and policy decisions. For instance, if evaluations show that certain programs or strategies are consistently effective, then agencies may choose to continue those programs and implement similar strategies at additional locations. If an evaluation shows that a project is not meeting expectations, then an agency may address the situation accordingly (e.g., remove the countermeasure or install supplemental countermeasures) (Atlanta Regional Commission 2022).
Project-level evaluations become the foundation for countermeasure- and program-level analysis. Countermeasure evaluations provide updated values of countermeasure effectiveness (e.g., crash modification factors) and help identify situations where the countermeasure is more (or less) effective (e.g., bridges or tunnels, rural or urban areas, high- or low-volume roads). Program evaluations can include an entire portfolio of projects or individual subprograms that focus on specific emphasis areas (e.g., bridges or tunnels, trucks and buses, oversize loads) (Atlanta Regional Commission 2022).
The following subsections describe project tracking and the fundamental concepts for evaluating projects, countermeasures, and programs, including measures of effectiveness, methods, and data to track before, during, and after implementation. Refer to FHWA’s Highway Safety Improvement Program Evaluation Guide for further details on countermeasure and program evaluations and the related templates to support evaluations (Gross 2017).
Project tracking is the basis of all evaluations. For each project, agencies should document the specific countermeasure(s) implemented, specific locations treated, implementation period (begin and end dates), and final project costs. Documented project costs should include preliminary engineering, right-of-way, and construction costs as applicable based on the final cost to complete the project (Atlanta Regional Commission 2022).
Tracking individual projects supports project-level evaluations and subsequently countermeasure- and program-level evaluations (Atlanta Regional Commission 2022). As such, there is a need to link each project with specific countermeasures, programs, and subprograms. As part of project tracking, the agency should note the applicable program(s) that funded the project as well as any emphasis areas targeted by the project (e.g., on-bridge strikes, under-bridge strikes with overhead structure, under-bridge strikes with pier or support, tunnel strikes).
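To make these tracking fields concrete, the following is a minimal sketch of a project-tracking record in Python; the class and field names are illustrative, not a prescribed schema.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ProjectRecord:
    """One tracked project; field names are illustrative, not a prescribed schema."""
    project_id: str
    countermeasures: list[str]           # specific countermeasure(s) implemented
    locations: list[str]                 # specific locations treated
    construction_start: date             # implementation period begin date
    construction_end: date               # implementation period end date
    cost_preliminary_engineering: float  # final preliminary engineering cost
    cost_right_of_way: float             # final right-of-way cost
    cost_construction: float             # final construction cost
    funding_programs: list[str]          # program(s) that funded the project
    emphasis_areas: list[str]            # e.g., "under-bridge strike with pier or support"

    @property
    def total_cost(self) -> float:
        """Final project cost across all phases."""
        return (self.cost_preliminary_engineering
                + self.cost_right_of_way
                + self.cost_construction)
```

Recording the funding programs and emphasis areas on each record is what later allows individual projects to be rolled up into countermeasure- and program-level evaluations.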
Project-level evaluations focus on individual projects and serve as the basis for more aggregate evaluations (countermeasure- and program-level). The following are general considerations related to performance measures, methodology, and data requirements for project evaluations.
Safety effectiveness evaluations typically rely on crash-based performance measures that capture the localized safety impacts at the project location. Crash-based performance measures typically include site-specific changes in crashes, injuries, and fatalities. In addition, it is useful to evaluate target and correctable crashes (e.g., BrTS crashes as opposed to total crashes), particularly if the project targets specific crash types or crash contributing factors. For instance, roadway weather information systems target weather-related crashes, so it would be useful to evaluate the change in rain-, ice-, and snow-related crashes in addition to other common safety performance measures (e.g., total and fatal plus injury crashes) (Atlanta Regional Commission 2022).
While crash-based measures are preferred for safety-focused evaluations, there is the potential to use other performance measures to assess the operational effectiveness of completed projects. Operational performance measures may include operating speed, driver compliance, and driver response. Further, it may be useful to assess the economic performance of a project through measures such as the observed benefit-cost ratio; however, economic measures are susceptible to influence from a few severe crashes, particularly when using simple before-after methods to evaluate the change in crashes. For example, just one fatal crash in the before or after period can have a substantial impact on the estimated benefit.
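To illustrate that sensitivity, the short sketch below compares a simple before-after benefit estimate with and without a single fatal crash in the after period; the unit crash costs are placeholder values chosen for illustration, not official figures.

```python
# Illustrative only: unit crash costs below are placeholders, not official values.
UNIT_COST = {"fatal": 12_000_000, "injury": 200_000, "pdo": 12_000}

def simple_annual_benefit(before: dict, after: dict, years: float) -> float:
    """Annualized reduction in crash cost from a simple before-after comparison."""
    def total_cost(counts: dict) -> float:
        return sum(UNIT_COST[sev] * counts.get(sev, 0) for sev in UNIT_COST)
    return (total_cost(before) - total_cost(after)) / years

before = {"injury": 6, "pdo": 10}
print(simple_annual_benefit(before, {"injury": 3, "pdo": 9}, 3))              # ~204,000
print(simple_annual_benefit(before, {"injury": 3, "pdo": 9, "fatal": 1}, 3))  # ~-3,796,000
# A single fatal crash in the after period swings the estimate by UNIT_COST["fatal"]/3.
```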
At the project level, the intent is to determine whether the project achieved its objective and addressed the crashes and/or risk factors that were the impetus of the project (Atlanta Regional Commission 2022). This section focuses on crash-based performance measures, but the same methods can generally be applied using noncrash-based performance measures as well. For crash-based project-level evaluations, the simple before-after method and the test of proportions are appropriate. While more rigorous methods can produce more reliable results, that level of rigor is generally not necessary at the project level (Atlanta Regional Commission 2022).
The simple before-after method compares the performance measure before and after the implementation of the project, typically excluding the year(s) of construction. Project-level evaluations should use 12-month increments to avoid seasonal impacts, and it is preferable to use the same duration for the before and after periods (e.g., 3 years before and 3 years after). If the durations of the before and after periods differ, it is important to normalize the analysis on a per-year basis. Note this method does not account for changes in traffic volume or other factors from the before to the after period, which can impact the results (Gross 2017).
Table 14 provides an example of a crash-based project-level evaluation using the simple before-after method. In this example, the study period spans 7 years: 3 years before implementation, 3 years after implementation, and 1 year of implementation that is excluded from the analysis. For this example, the sample data indicate a 6.7% increase in total crashes and a 60% reduction in target crashes (Gross 2017).
The test-of-proportions method compares the proportion of target crashes to total crashes before and after implementation (Gross 2017). For example, an agency may compare the proportion of weather-related crashes before and after installation of a roadway weather information system at a bridge.
Table 14. Example crash-based project-level evaluation using simple before-after method.
| Crash Category | Crashes Before Implementation (3-year period) | Crashes During Implementation (1-year period) | Crashes After Implementation (3-year period) | Change (Before − After) | % Change |
|---|---|---|---|---|---|
| Total crashes | 15 | Excluded | 16 | −1 | 6.7% increase |
| Target crashes | 5 | Excluded | 2 | 3 | 60.0% reduction |
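The arithmetic behind Table 14 is straightforward; the sketch below reproduces it and normalizes crash counts to per-year rates so that unequal before and after durations remain comparable.

```python
def percent_change(before: int, after: int,
                   years_before: float, years_after: float) -> float:
    """Percent change in crash frequency from a simple before-after comparison,
    normalized per year so unequal durations remain comparable.
    Positive values indicate an increase; negative values indicate a reduction."""
    rate_before = before / years_before
    rate_after = after / years_after
    return 100.0 * (rate_after - rate_before) / rate_before

# Table 14 values (3 years before and after; construction year excluded):
print(percent_change(15, 16, 3, 3))  #  +6.7 -> 6.7% increase in total crashes
print(percent_change(5, 2, 3, 3))    # -60.0 -> 60.0% reduction in target crashes
```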
Table 15. Example crash-based project-level evaluation using test-of-proportions method.
| Total Crashes Before (3-year period) | Target Crashes Before (3-year period) | Proportion of Target to Total Crashes Before | Crashes During Implementation (1-year period) | Total Crashes After (3-year period) | Target Crashes After (3-year period) | Proportion of Target to Total Crashes After |
|---|---|---|---|---|---|---|
| 18 | 12 | 0.67 | Excluded | 9 | 3 | 0.33 |
The test-of-proportions method is particularly useful, and generally more appropriate than the simple before-after method, when traffic volume changes from the before to the after period. Similar to the simple before-after method, evaluations should use 12-month increments to avoid seasonal impacts, and it is preferable to include at least 3 years before and after implementation in the study period. Unlike the simple before-after method, the test of proportions is not affected by different durations of the before and after periods.
Table 15 provides an example of a crash-based project-level evaluation using the test-of-proportions method. Similar to the prior example, the study period spans 7 years: 3 years before implementation, 3 years after implementation, and the excluded implementation year (Gross 2017). The target crashes will depend on the project of interest. In this example, the proportion of target to total crashes is 0.67 before implementation and 0.33 after implementation. The result is a difference of −0.33 from the before to the after period, indicating a 50% reduction in the proportion of target crashes.
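The sketch below reproduces the Table 15 computation and adds a pooled two-sample z statistic, one common way to test whether the change in proportions is statistically significant; note that with samples this small, the normal approximation is rough and the result should be read cautiously.

```python
from math import sqrt

def proportion_change(total_before: int, target_before: int,
                      total_after: int, target_after: int):
    """Change in the proportion of target to total crashes (Table 15 layout),
    with a pooled two-sample z statistic for the difference in proportions."""
    p_before = target_before / total_before
    p_after = target_after / total_after
    diff = p_after - p_before
    relative = diff / p_before  # relative change in the proportion
    # Pooled proportion for the z statistic; with samples this small the
    # normal approximation is rough, so read the result cautiously.
    p_pool = (target_before + target_after) / (total_before + total_after)
    z = diff / sqrt(p_pool * (1 - p_pool) * (1 / total_before + 1 / total_after))
    return p_before, p_after, diff, relative, z

# Table 15 values: 12 of 18 crashes were target crashes before; 3 of 9 after.
print(proportion_change(18, 12, 9, 3))
# -> roughly (0.67, 0.33, -0.33, -0.50, -1.64): a 50% reduction in the proportion
```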
For simple before-after, crash-based project-level evaluations, it is desirable to use a minimum of 3 full years of before data and 3 full years of after data (Atlanta Regional Commission 2022). More years of data may be needed to better understand long-term averages. This is particularly true for projects that target rare or seemingly random crash types (e.g., a BrTS). While a longer study period generally provides a larger sample of crashes for analysis, it also increases the chances of other changes over time (e.g., roadway or operational changes, driver behavior, or vehicle fleet) (Gross 2017). As such, agencies should balance the study period with the potential for other changes over time.
Countermeasure-level evaluations focus on the effectiveness of similar projects, often with the intent to develop a crash modification factor (CMF). Countermeasure-level evaluations inform future decisions, particularly when estimating the expected benefits of other similar
proposed projects. As evaluations demonstrate the effectiveness of countermeasures, there is an opportunity for agencies to integrate those measures in planning and design policies rather than implementing them on an as-needed basis. The following are general considerations related to performance measures, methodology, and data requirements for countermeasure evaluations.
If the intent is to develop a CMF, agencies should use crash-based measures for countermeasure-level evaluations. Crash-based measures include the frequency and severity of crashes. It is often useful to evaluate the countermeasure effects separately for fatal and injury (FI) crashes and property damage only (PDO) crashes. This supports future benefit-cost analyses where the net benefits are based on CMFs by severity. There is also the option to evaluate the effect of countermeasures by crash type. This can be useful to assess programs or countermeasures that target specific crash types.
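As a simple illustration of how severity-specific CMFs feed a benefit estimate, the sketch below computes predicted annual crash-cost savings; the CMFs and unit costs shown are hypothetical placeholders.

```python
def annual_benefit(expected: dict, cmf: dict, unit_cost: dict) -> float:
    """Predicted annual crash-cost savings using severity-specific CMFs:
    sum over severities of (1 - CMF) x expected crashes x unit crash cost."""
    return sum((1 - cmf[sev]) * expected[sev] * unit_cost[sev] for sev in expected)

print(annual_benefit(
    expected={"FI": 2.0, "PDO": 6.0},          # expected crashes per year at the site
    cmf={"FI": 0.75, "PDO": 0.90},             # hypothetical severity-specific CMFs
    unit_cost={"FI": 500_000, "PDO": 12_000},  # illustrative unit crash costs
))  # -> 0.25*2*500,000 + 0.10*6*12,000 = 257,200 per year
```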
Crash-based before-after studies are generally appropriate to evaluate countermeasures and develop CMFs. If the intent is to develop a CMF for use in future decisions, then agencies should use more reliable methods such as the Empirical Bayes before-after method or before-after with comparison group method for countermeasure-level evaluations. As shown in Table 16, the more reliable methods account for potential sources of bias, such as regression-to-the-mean, changes in traffic volume, the nonlinear relationship between crashes and traffic volume, and other changes over time (Gross 2017).
In before-after studies, some change occurs during the study period (i.e., projects are implemented). The simple methods focus on the safety performance at the treated locations over time. The more reliable methods incorporate information from a comparison or reference group (i.e., untreated locations) to adjust for other changes over time that affect safety performance. As shown in Table 16, the Empirical Bayes before-after method is the more reliable method for developing quality CMFs because it can properly account for potential sources of bias (Gross 2017). Refer to FHWA’s A Guide to Developing Quality CMFs (Gross, Persaud, and Lyon 2010) and FHWA’s Highway Safety Improvement Program (HSIP) Evaluation Guide (Gross 2017) for more information on the simple, comparison group, and Empirical Bayes before-after methods, including equations and templates to implement the methods.
Table 16. Overview of before-after methods.
| Method | Accounts for Regression-to-the-Mean | Accounts for Changes in Traffic Volume | Accounts for Nonlinear Relationship Between Crashes and Traffic Volume | Accounts for Other Changes Over Time |
|---|---|---|---|---|
| Simple | — | — | — | — |
| Simple with linear traffic volume correction | — | • | — | — |
| Comparison group | — | • | — | • |
| Empirical Bayes | • | • | • | • |
Note: — = does not account for the given issue; • = accounts for the given issue.
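For orientation, the following is a minimal single-site sketch of the Empirical Bayes before-after computation, assuming an SPF prediction and overdispersion parameter are available; a full countermeasure evaluation aggregates results across many sites, and the equations and templates in the HSIP Evaluation Guide (Gross 2017) remain the authoritative reference.

```python
def eb_cmf(obs_before: float, obs_after: float,
           pred_before: float, pred_after: float, k: float) -> float:
    """Single-site Empirical Bayes before-after estimate of a CMF.

    obs_*  : observed crash counts in the before/after periods
    pred_* : SPF-predicted crashes for the same periods (this is what accounts
             for traffic volume and its nonlinear relationship with crashes)
    k      : overdispersion parameter of the SPF
    Values below 1.0 indicate a crash reduction.
    """
    w = 1.0 / (1.0 + k * pred_before)                  # EB weight on the SPF prediction
    eb_before = w * pred_before + (1 - w) * obs_before
    r = pred_after / pred_before                       # carries volume/duration changes forward
    expected_after = eb_before * r                     # expected crashes had no treatment occurred
    var_expected = expected_after * r * (1 - w)
    # Bias-corrected ratio of observed to expected crashes in the after period:
    return (obs_after / expected_after) / (1 + var_expected / expected_after**2)

# Hypothetical single-site inputs, for illustration only:
print(f"Estimated CMF: {eb_cmf(15, 9, 10.0, 11.0, 0.4):.2f}")  # ~0.55
```

The ratio of predicted after-period to before-period crashes is what adjusts the expectation for changes in traffic volume and period duration, which is why the SPF inputs matter.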
The primary data requirements for countermeasure evaluations include crash data for the before and after periods at each treated site, the corresponding traffic volumes, implementation details from project tracking (countermeasures, locations, and dates), and, for the comparison group and Empirical Bayes methods, comparable data for similar untreated sites.
To increase the reliability of countermeasure evaluations, there is a need to include multiple similar projects in the analysis rather than a single project (Gross 2017). While countermeasure-level evaluations could be based on a few sites, this will typically result in a large standard error and lower confidence in the result. When grouping multiple projects, there is a need to consider the consistency among projects (e.g., strategies and site characteristics) and the potential for different effects under different conditions. Different combinations of countermeasures and site characteristics along with variations in vehicles and driver behavior can result in different countermeasure effects.
Sample size is another consideration in data collection and countermeasure evaluation. The sample size necessary to obtain statistically significant results depends on the desired level of confidence and the magnitude of the countermeasure effect. As the desired level of confidence increases (e.g., from 90% to 95%), so does the minimum sample size. Similarly, a larger sample is necessary to detect smaller changes in safety (e.g., a larger sample is necessary to detect a 10% change in crashes compared to the sample needed to detect a 40% change in crashes).
Table 17 presents minimum sample sizes for a before-after with comparison group study; these can serve as conservative sample size estimates for an Empirical Bayes study. The table shows the minimum number of crashes needed to detect different levels of effectiveness (i.e., the best guess at how effective the countermeasure will be) at common levels of significance. For example, if the expected level of effect is a 20% reduction in crashes and the desired level of significance is 0.10 (90% confidence), then the minimum number of crashes is 193 (Gross 2017). These estimates assume the number of comparison sites equals the number of treated sites and the durations of the before and after periods are equal. This indicates a minimum sample of 193 crashes in both the before and after periods for both the treatment and comparison groups. For scenarios not listed in the table, refer to the spreadsheet template in FHWA's Highway Safety Improvement Program (HSIP) Evaluation Guide (Gross 2017).
Table 17. Minimum sample size (number of crashes) for before-after with comparison group method.
| Expected Level of Effect (% Change) | 0.10 Level of Significance (90% Confidence) | 0.05 Level of Significance (95% Confidence) |
|---|---|---|
| 10 | 1,155 | 1,858 |
| 20 | 193 | 279 |
| 30 | 67 | 95 |
| 40 | 29 | 41 |
Do not use linear interpolation or extrapolation to estimate sample sizes for other levels of significance or levels of effect from the numbers in the table because the trends are nonlinear (Gross 2017).
Once the minimum sample size is established in terms of the number of crashes, it is possible to determine the number of years needed to accumulate the minimum crashes given the sample of projects available for countermeasure evaluation. In general, the study period should include at least 3 years before and after implementation and exclude the implementation period from the analysis. It is possible to use different durations for the before and after periods (e.g., 5 years before and 3 years after); however, each period should represent 12-month increments to avoid seasonal bias. For some countermeasures, it may be difficult to collect the minimum sample size due to a low number of crash occurrences. Increasing the duration of the study period is one option to increase the number of crashes for analysis; however, this can introduce bias if other changes occur over time, as discussed previously. When other options to increase the sample size are unavailable or undesirable, an alternative is to accept a lower level of significance (e.g., 0.15 or 0.20) as opposed to the typical value of 0.05 or 0.10 as an interim step (Gross 2017).
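As a simple planning aid, the sketch below looks up the Table 17 minimum and converts it to an approximate number of before-period years given the combined annual crash count across the available treated sites; it deliberately raises an error rather than interpolating for scenarios not in the table.

```python
# Minimum crash counts from Table 17 (before-after with comparison group),
# keyed by (expected % change, level of significance).
MIN_CRASHES = {
    (10, 0.10): 1155, (10, 0.05): 1858,
    (20, 0.10): 193,  (20, 0.05): 279,
    (30, 0.10): 67,   (30, 0.05): 95,
    (40, 0.10): 29,   (40, 0.05): 41,
}

def years_needed(effect_pct: int, significance: float, crashes_per_year: float) -> float:
    """Approximate years of before-period data needed to accumulate the
    Table 17 minimum, given the combined annual crash count across all
    treated sites available for the evaluation."""
    key = (effect_pct, significance)
    if key not in MIN_CRASHES:
        # The trends are nonlinear, so interpolation is not valid; use the
        # spreadsheet template in the HSIP Evaluation Guide for other scenarios.
        raise ValueError("Scenario not tabulated; do not interpolate Table 17.")
    return MIN_CRASHES[key] / crashes_per_year

# Example: 20% expected effect at 90% confidence, 25 treated sites averaging
# 2.5 crashes per site per year -> 193 / 62.5, or about 3.1 years.
print(years_needed(20, 0.10, 25 * 2.5))
```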
Program-level evaluations focus on the overall program and subprograms with the ultimate goal of improving the efficiency of projects, programs, and policies. These types of evaluations are particularly useful for strategies or programs that are not specific to a single location (e.g., statewide permitting policy on specialized loads). The following are general considerations related to performance measures, methodology, and data requirements for program evaluations.
Crash-based program-level evaluations measure effectiveness by changes in the frequency, severity, and rate of crashes at the system level. This can occur at the national level, statewide level, program or subprogram level, or jurisdiction level (region or county). In general, it is appropriate to include all projects associated with a given program for the program-level evaluation. Similar to countermeasure-level evaluations, it is not appropriate to exclude certain sites or countermeasures because the results are not favorable. This can bias the program-level evaluation results and lead to misinformed future investment decisions.
There are various measures to assess program effectiveness, including the number of BrTS crashes and the rate of BrTS crashes per 100 million truck-miles traveled. Agencies can compare
these basic measures over time and compare against factors such as the number of related projects and level of investment (i.e., dollars spent). Figure 53 shows an example comparison of statewide BrTS crashes over time versus the level of investment in BrTS mitigation strategies.
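The rate computation itself is simple arithmetic, as the sketch below shows with hypothetical statewide figures.

```python
def brts_rate(brts_crashes: int, truck_vmt: float) -> float:
    """BrTS crash rate per 100 million truck-miles traveled, where truck_vmt
    is the truck vehicle-miles traveled over the same period and system."""
    return brts_crashes / truck_vmt * 100_000_000

# Hypothetical statewide figures, for illustration only:
print(brts_rate(brts_crashes=42, truck_vmt=8_500_000_000))  # ~0.49 per 100M truck-miles
```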
The primary data requirements for program evaluations include the performance measure of interest (e.g., BrTS crashes by severity) and general information for each project or countermeasure within the program. Project- and countermeasure-level details could include the applicable program(s) that funded the activity, applicable focus areas (e.g., on-bridge strikes, under-bridge strikes with overhead structure, under-bridge strikes with pier or support, tunnel strikes), and final costs to complete the project.