Leveraging Artificial Intelligence and Big Data to Enhance Safety Analysis: A Guide (2025)

Chapter: 2 Traditional Safety Evaluations

Suggested Citation: "2 Traditional Safety Evaluations." National Academies of Sciences, Engineering, and Medicine. 2025. Leveraging Artificial Intelligence and Big Data to Enhance Safety Analysis: A Guide. Washington, DC: The National Academies Press. doi: 10.17226/29098.

CHAPTER 2
Traditional Safety Evaluations

Overview of Traditional Safety Evaluations

Understanding the causal factors of roadway crashes is critical for safety improvement. Over the past few decades, researchers have developed various models to quantify the relationship between crash frequency or severity and contributing factors, and so to gain knowledge about how crashes occur. Although such safety analyses have improved with time, most existing safety analysis methods rely on collision databases with well-known limitations (e.g., timeliness, completeness, and accuracy). Further, these methods are inherently reactive: they depend on the accrual of fatality and injury records over years to identify unsafe locations (Zheng et al. 2021). Real-time influences attributable to weather, driving behavior, infrastructure condition, and their interactions have not been fully accounted for in traditional safety analysis.

Traditional Datasets

Traditional data sources for traffic safety analysis usually include crash reports, roadway geometry characteristics, traffic characteristics, and weather conditions. Typical features of crash data include time, location, severity, and type (Li et al. 2020). These crash records are linked with roadway geometry data via crash location and with weather conditions via crash occurrence time. Typical traffic characteristics in crash databases include annual average daily traffic (AADT) and its derivatives. More comprehensive datasets, such as those available from the Highway Safety Information System hosted by the U.S. DOT, leverage more detailed police reports and can include information on involved vehicles, occupants, and pedestrians, but they are limited to certain states.
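
The linkage described above (roadway data by crash location, weather by crash time) can be sketched as a pair of lookups. This is a minimal illustration, not a description of any agency's system: the `segment_id` key, the hourly weather table, and all field names are assumptions introduced for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Crash:
    crash_id: str
    segment_id: str        # hypothetical location key shared with the roadway inventory
    occurred_at: datetime  # crash occurrence time
    severity: str          # KABCO letter


def link_crashes(crashes, segments, weather_by_hour):
    """Attach roadway attributes (by location) and weather (by time) to each crash."""
    linked = []
    for crash in crashes:
        # Truncate the crash time to the hour to match the hourly weather table.
        hour = crash.occurred_at.replace(minute=0, second=0, microsecond=0)
        linked.append({
            "crash_id": crash.crash_id,
            "severity": crash.severity,
            "roadway": segments.get(crash.segment_id),  # None when no segment matches
            "weather": weather_by_hour.get(hour),       # None when no observation exists
        })
    return linked


# Hypothetical sample data.
crashes = [Crash("C1", "S10", datetime(2023, 1, 5, 17, 42), "B")]
segments = {"S10": {"aadt": 12000, "lanes": 2, "speed_limit_mph": 45}}
weather_by_hour = {datetime(2023, 1, 5, 17): {"precipitation": "snow"}}
linked = link_crashes(crashes, segments, weather_by_hour)
```

Unmatched records are returned with `None` rather than dropped, so gaps in completeness stay visible to the analyst.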

Traditional datasets may include, but are not limited to, the following:

Crash reports
- Location, date, and time
- Injury severities. Severity is typically defined by the “KABCO” scale, which includes:
  - K (Killed): Fatal injuries leading to death within 30 days of the crash.
  - A (Incapacitating): Severe injuries that prevent normal activities and may require hospitalization, such as broken limbs or severe cuts.
  - B (Non-Incapacitating): Visible injuries that do not prevent normal activities, such as bruises or minor cuts.
  - C (Possible Injury): Any potential injury reported or claimed that is not visible, such as whiplash or soreness.
  - O (No Injury): No injuries are reported or apparent.
- Lighting conditions (night/day, dawn/dusk, or dark with streetlights)
- Intersection traffic control (traffic signals, stop signs, yield signs, uncontrolled)
- Human factors (age, impairment, distraction, seat belt use, occupant information)
- Collision type (run off the road, head-on, angle, rear-end, etc.)
- Vehicle make and model
- Crash contributors (failure to control speed, speeding, distraction, failure to yield right-of-way, etc.)
- Pavement condition (wet, snow, ice)
Roadway geometrics and attributes
- Median type and width
- Shoulder type (earth, gravel, paved) and width
- Surface type
- Striping condition (poor, fair, good)
- Number of lanes and widths
- Functional classification (interstate, principal arterial, major collector, etc.)
- AADT
- Truck percentage
- Speed limit
- Adjacent land use
- Grade
- Curvature direction and degree
Weather data
- Temperature
- Precipitation (rain, snow, sleet)
- Wind speed/gust
Aerial imagery
Demographics, census, and socioeconomic data
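
As a small illustration of how the KABCO scale listed above might be represented in code, the sketch below encodes the five levels and tallies a list of severity codes. Treating K and A together as "fatal and serious injury" is an assumption made for the example, as are the function names.

```python
from enum import Enum


class Kabco(Enum):
    K = "Killed"
    A = "Incapacitating"
    B = "Non-Incapacitating"
    C = "Possible Injury"
    O = "No Injury"


def parse_severity(code: str) -> Kabco:
    """Convert a raw severity letter to a Kabco member (raises KeyError if unknown)."""
    return Kabco[code.strip().upper()]


def is_fatal_or_serious(code: str) -> bool:
    # Assumption: K and A together stand for "fatal and serious injury" crashes.
    return parse_severity(code) in (Kabco.K, Kabco.A)


def severity_counts(codes):
    """Tally crashes by KABCO level, keeping zero counts for unused levels."""
    counts = {level.name: 0 for level in Kabco}
    for code in codes:
        counts[parse_severity(code).name] += 1
    return counts


# Hypothetical severity codes as they might appear in crash reports.
counts = severity_counts(["O", "b", "K", "O", "A", "C"])
```

Rejecting unknown letters with an error, rather than silently mapping them to "No Injury", keeps data-quality problems from being hidden in the tally.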

International Road Assessment Programme Data Elements

The International Road Assessment Programme (iRAP) is the internationally recognized protocol to evaluate safety and performance planning. The iRAP Star Ratings are one of the five iRAP protocols designed to collect road attributes on a particular road segment (iRAP 2022). The Star Ratings protocol is intended to assess infrastructure-related risk based on crash modification factors, considering the likelihood and severity of individual user crashes with different infrastructure features. Roads where the probability of a serious traffic collision with a fatal outcome is very high are rated with 1 star, while roads where the probability of a fatal crash is near zero are rated with 5 stars (Brkić et al. 2022). The assessments are made without considering crash history data.

The iRAP system includes roadway data in the following categories:

Roadside. Roadside objects and their distances to the road define the risk.
Midblock. A road infrastructure feature that separates the two opposing traffic flows for both divided and undivided carriageways. The feature can provide a level or running surface free from defects that may adversely affect the vehicleʼs path.
Intersection. The presence and type of intersection define the risk. This depends on the quality of the intersection design (merge lane with adequate length, safe deflection angles, presence of signs and proper markings, proper sight distance, and presence of facilities for pedestrians and cyclists) and the presence of warning signs and markings.
Vulnerable road user (VRU) facilities. The presence of purpose-built pedestrian crossing facilities on the inspected roadway segment and on the intersecting side road, which may also include purpose-built facilities for bicyclists.

The iRAP system limitations include:

Human behavior. The iRAP system cannot fully account for the unpredictable nature of human behavior and interactions on the road that can affect the accuracy of safety assessments.
Subjectivity in local adjustments. While iRAP provides structured assessments, a well-rounded assessment for the implementation of safety improvements requires local expertise on crash history, introducing subjectivity based on varying levels of expertise and resources.
Other road safety factors. The focus of iRAP on infrastructure and speed management may limit the focus on all aspects of road safety, such as vehicle safety standards, driver education, and law enforcement.
Data constraints. The effectiveness of iRAP systems can be limited by the quality and availability of data and the technological capabilities of varying practitioners for data collection and analysis.
Resource limitations. The cost and resource requirements for implementing iRAP recommendations can be significant, especially in areas with limited resources.

Traditional Analysis Techniques

Traditional safety analysis is dependent on the expertise, tools, and resources of transportation organizations. Many agencies, especially at the local and regional levels, lack the resources to conduct a rigorous analysis of existing conditions and project evaluation.

Traditional safety analysis, focused primarily on crash history (with some additional datasets such as roadway inventory and traffic volume), falls under several primary categories. These include, but are not limited to, hot spot analysis, systemic analysis, project/corridor analysis, and the use of surrogate measures.

Network Screening for Hot Spots

Traditional network screening techniques often rely on site analysis to identify locations for potential safety improvement investments. These techniques focus on specific locations, often called hot spots, that are selected primarily on the basis of a history of fatal and serious injury crashes. However, evidence indicates that fatal and serious injury crashes are widely distributed across roadway systems, and very few individual locations (especially in rural areas and on non-state highways) exhibit a history of multiple fatal and serious injury crash events. If transportation agencies limit safety investments to high-crash locations (hot spots), it is difficult for them to meet jurisdiction-wide safety performance goals (Preston et al. 2013).
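
A minimal sketch of frequency-based screening shows why hot spots alone may cover only a small share of severe crashes. The site IDs, the threshold of two severe crashes, and the sample data are hypothetical and chosen only to illustrate the point made above.

```python
from collections import Counter


def screen_hot_spots(severe_crash_sites, min_severe=2):
    """Return (site, count) pairs with at least `min_severe` severe crashes, highest first."""
    counts = Counter(severe_crash_sites)
    hot = [(site, n) for site, n in counts.most_common() if n >= min_severe]
    return hot, counts


# Hypothetical data: one entry per fatal or serious injury crash, keyed by site ID.
severe_crash_sites = ["S1", "S2", "S3", "S3", "S4", "S5", "S6", "S7"]

hot_spots, counts = screen_hot_spots(severe_crash_sites)

# Share of all severe crashes that occurred at the screened hot spots.
covered = sum(n for _, n in hot_spots)
share_covered = covered / len(severe_crash_sites)
```

In this invented sample only one of seven sites qualifies as a hot spot, and it accounts for a quarter of the severe crashes; the rest are spread across sites with a single event each.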

Systemic Approach to Safety

Beyond hot spot analysis, it is important to proactively identify the potential for future crashes by incorporating a systemic approach in the safety management process. This involves widely implementing safety improvements based on high-risk roadway features, identified along defined segment lengths, that are correlated with specific severe crash types or their contributing circumstances. This approach allows for more comprehensive safety planning and implementation by considering future risk as well as crash history. It is particularly useful when coupled with low-cost safety improvements that can be installed at many locations. The systemic approach to safety remains a data-driven process that involves analytical techniques and is still based on crash history data. This helps transportation organizations understand the relationship between high-severity crash events and contributing factors and helps identify sites for potential safety improvement. Additionally, the recommended projects stemming from systemic analysis tend to be smaller in scope and complexity than those at hot spots. The systemic approach supplements traditional site analysis and provides a more comprehensive and “proactive” approach to preventing the most severe crashes (Preston et al. 2013).
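
One way to picture systemic screening is as a risk-factor tally over roadway attributes, applied to every site regardless of its own crash record. The sketch below is illustrative only: the three risk factors, their thresholds, and the attribute names are hypothetical, and in practice the factors would be derived from analysis of severe crash patterns.

```python
def systemic_risk_score(site, risk_factors):
    """Count how many risk-factor tests a site's roadway attributes satisfy."""
    return sum(1 for test in risk_factors.values() if test(site))


# Hypothetical risk factors; a real study would derive these from severe-crash data.
risk_factors = {
    "sharp_curve": lambda s: s["curve_degrees"] >= 10,
    "narrow_shoulder": lambda s: s["shoulder_width_ft"] < 2,
    "high_speed": lambda s: s["speed_limit_mph"] >= 55,
}

# Hypothetical roadway inventory records keyed by site ID.
sites = {
    "R1": {"curve_degrees": 12, "shoulder_width_ft": 1, "speed_limit_mph": 55},
    "R2": {"curve_degrees": 3, "shoulder_width_ft": 6, "speed_limit_mph": 45},
    "R3": {"curve_degrees": 11, "shoulder_width_ft": 4, "speed_limit_mph": 55},
}

# Rank sites by number of risk factors present, most first.
ranked = sorted(
    sites,
    key=lambda site_id: systemic_risk_score(sites[site_id], risk_factors),
    reverse=True,
)
```

Because the score depends only on inventory attributes, a site with no recorded crashes can still rank highly, which is the sense in which the approach is proactive.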

Project/Corridor Analysis

Project or corridor analysis extends the scope of traditional hot spot analysis to larger segments of roadways or corridors, which can include miles of highways and several intersections. This method identifies and addresses safety issues over extended areas, providing a holistic view of the safety performance across a broader network. It utilizes crash prediction models similar to those used in hot spot analysis, as detailed in Part C of the Highway Safety Manual (HSM). By integrating various datasets, such as crash history, roadway inventory, and traffic volume, corridor analysis helps to understand the underlying causes of crashes and identify appropriate countermeasures. The resulting safety improvement plans target the entire corridor, focusing on systemic enhancements that improve safety across the corridor rather than at isolated points.

Use of Surrogate Measures

In addition to crash data, surrogate measures can be used to evaluate the safety of projects. Examples of surrogate measures include conflict analysis and the use of star ratings in the iRAP models.

Conflict analysis: Conflict analysis studies near-miss incidents or traffic conflicts in which there is a high probability of a crash occurring if no evasive action is taken. Tools such as video surveillance, automated conflict detection systems, and simulation models identify and analyze conflicts, helping to address potential safety issues before they result in crashes.
Star ratings in iRAP models: The star rating system used by iRAP to assess the safety of road infrastructure is based on specific safety features. Roads are evaluated based on factors such as lane width, shoulder design, roadside hazards, and pedestrian facilities and are assigned a star rating from 1 (least safe) to 5 (most safe). These ratings help identify high-risk road segments and prioritize them for safety improvements. iRAP models provide a standardized way to evaluate and compare road safety across different regions and countries, facilitating global road safety improvements.

Surrogate measures are particularly useful in areas with sparse crash data and allow for proactive safety assessments and interventions, helping to prevent crashes before they occur. They complement traditional crash data analysis, offering a more comprehensive understanding of road safety issues.
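
As an illustration of conflict analysis, the sketch below computes time to collision (TTC), a commonly used surrogate indicator, for a following vehicle closing on a leader. The source does not name a specific indicator, so the choice of TTC, the 1.5-second threshold, and the function names are assumptions made for the example.

```python
import math


def time_to_collision(gap_m, follower_speed_mps, leader_speed_mps):
    """TTC in seconds for a follower closing on a leader; infinite if not closing."""
    closing_speed = follower_speed_mps - leader_speed_mps
    if closing_speed <= 0:
        return math.inf  # the gap is constant or growing, so no collision course
    return gap_m / closing_speed


def is_conflict(ttc_s, threshold_s=1.5):
    """Flag an interaction as a conflict when TTC falls below an assumed threshold."""
    return ttc_s < threshold_s


# Hypothetical trajectory sample: 12 m gap, follower at 20 m/s, leader at 10 m/s.
ttc = time_to_collision(gap_m=12.0, follower_speed_mps=20.0, leader_speed_mps=10.0)
flagged = is_conflict(ttc)
```

Here the closing speed is 10 m/s over a 12 m gap, giving a TTC of 1.2 s, which falls below the assumed threshold and would be counted as a conflict.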

Analysis Challenges

Although traditional safety analyses have provided safety insights, they are not without challenges and limitations. Statistical models commonly used in safety studies are often based on a theory that is not perfectly reflected by traditional safety datasets (including crash data). Missing and inconsistent data, as well as variations in unobserved effects due to economic conditions, sociodemographics, societal norms, and vehicle characteristics, can all amplify the potential bias in traditional statistical models (Mannering et al. 2020). Although endogeneity models (Bhat et al. 2014) and heterogeneity models (Mannering et al. 2016) have been developed to extend traditional safety models by using advanced statistical and econometric methods, the capability of these models to handle large-scale, consistently collected data across the entire road network still needs to be improved. Traditional statistical safety models such as those used in the first edition of the HSM (2010) utilize limited data for maximum applicability. However, these methods carry several inherent biases that arise from limited and missing data, inconsistent measurement, and variations in unobserved effects.

Limitations of traditional safety analysis techniques include:

Reliance on historical crash data. Traditional methods often prioritize sites for safety improvements solely based on historical crash data, which focus on crash frequency and severity. This approach may overlook sites with lower traffic volumes but higher relative risk. It can also be biased by short-term random fluctuations in crash counts: a site selected because of an unusually high count tends to return toward its long-term average in later periods, a phenomenon known as regression to the mean.
Reactive nature. The crash-based approach is inherently reactive, relying on long periods of observational data to identify and address safety issues. This means that measures are typically taken after crashes have already occurred, rather than proactively identifying and mitigating potential safety concerns before they lead to crashes.
Volume bias. Methods that prioritize sites based on crash frequency can inadvertently favor high-volume sites, as these are more likely to have a higher number of crashes due to the amount of traffic. Even when crash rates (crashes per vehicle mile traveled) are used to account for traffic volume, this can disadvantage low-volume sites with a high relative risk.
Complexity of safety problems at high-crash sites. High-crash sites, often identified through these traditional methods, may require complex and expensive reconstructions for significant safety improvements. The complexity of this challenge stems from various considerations: (1) limited understanding of causality of factors, (2) geographic needs in investment (e.g., urban versus rural), and (3) user needs (vulnerable versus other traveling populations). This challenges the efficient allocation of safety funds to achieve the most substantial impact.
Linear assumptions. Traditional safety analysis methods, such as crash rates, often assume a linear relationship between crash frequency and traffic volume, a simplification that does not hold across different road types. While the HSM promotes the use of safety performance functions (SPFs) to capture the nonlinear nature of these relationships, challenges remain. These include issues with data quality and availability, the need for context-specific variables, the complexity of developing and implementing SPFs, evolving roadway conditions, and the integration of SPFs with other methods. Despite advancements, these limitations highlight the need for ongoing refinement to achieve the most accurate and effective safety analysis.
Limited scope. While the HSM provides a comprehensive framework for evaluating road safety quantitatively, its effectiveness is contingent on the availability of detailed data and the ability to accurately model the complex interactions involved in traffic crashes. It also requires substantial technical expertise to implement effectively.
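
Several of the limitations above can be made concrete with a short sketch: ranking by crash frequency versus crash rate, a nonlinear SPF, and an empirical Bayes adjustment that tempers regression to the mean. The SPF form exp(a) x AADT^b x length and the weight 1 / (1 + k x predicted) follow the general shape used with HSM-style methods, but the coefficients `a` and `b`, the overdispersion value, and the two sample sites are hypothetical.

```python
import math


def crash_rate_per_mvmt(crashes, aadt, length_mi, years):
    """Crashes per million vehicle miles traveled."""
    vmt = aadt * 365 * years * length_mi
    return crashes * 1_000_000 / vmt


def spf_predicted(aadt, length_mi, years, a=-7.0, b=0.8):
    """Generic SPF: exp(a) * AADT^b * length, summed over years (hypothetical a, b)."""
    return math.exp(a) * aadt ** b * length_mi * years


def empirical_bayes(observed, predicted, overdispersion_k):
    """Weighted average of predicted and observed crashes; weight = 1 / (1 + k * predicted)."""
    weight = 1.0 / (1.0 + overdispersion_k * predicted)
    return weight * predicted + (1.0 - weight) * observed


# Hypothetical one-mile sites observed for three years.
sites = {
    "high_volume": {"crashes": 30, "aadt": 40000},
    "low_volume": {"crashes": 6, "aadt": 2000},
}
for site in sites.values():
    site["rate"] = crash_rate_per_mvmt(site["crashes"], site["aadt"], 1.0, 3)
    site["predicted"] = spf_predicted(site["aadt"], 1.0, 3)
    site["eb_expected"] = empirical_bayes(site["crashes"], site["predicted"], 0.5)

# With b < 1, doubling AADT less than doubles the predicted crashes.
doubling_ratio = spf_predicted(20000, 1.0, 1) / spf_predicted(10000, 1.0, 1)
```

In this invented sample the high-volume site leads on frequency (30 versus 6 crashes) while the low-volume site leads on rate (about 2.74 versus 0.68 crashes per million vehicle miles), so the two rankings disagree. The empirical Bayes estimate always falls between the SPF prediction and the observed count, pulling unusually high or low observations toward what similar sites would be expected to experience.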

When to Consider Advanced AI/ML Tools for Analysis

When introducing new tools and datasets, their applicability must be addressed so practitioners can select the appropriate tools and analysis methods. These new tools, methods, and datasets also create opportunities to discover new research questions. Depending on the safety analysis question, traditional methods and tools may be sufficient. For example, applying the HSM SPFs with traditional datasets such as crash history, traffic volume, and roadway inventory elements can address many analysis needs. But the HSM is reliant on a complete set of these data, which in some cases are not available to an agency practitioner or researcher. Filling such data gaps is one potential application of AI/ML tools for analysis.

Table 1 introduces a few examples of safety-related research topics and the applicability of emerging analysis methods, tools, and datasets. It includes a relative level of effort for each and a short description of the potential benefits of using AI/ML tools and/or analyzing big data.

Table 1. Applicability, level of effort, and benefits of using AI/ML and big data.
Research Topic	Data Type	Dataset	AI/ML Applicability	Big Data Applicability	LOE with AI/ML, big data	LOE without AI/ML, big data	Benefits
Expected predicted future crash risk (HSM method)	Crash history	State crash repository	N/A	N/A	N/A	N/A	Base inventory for all analyses
	Traffic volume	Turning movement counts	N/A	Yes: connected vehicle outputs	N/A	N/A	ID safety risks by movement at intersections
	Roadway inventory (streetlights, signs)	ODOT digital video log	Yes: object ID		Low	High	Expanded inventory
	Roadway inventory, attributes (lane marking, lane width)	LiDAR	Yes	Yes: point data	Low	High	Expanded inventory
Proactive safety needs ID*	Conflicts (unreported crashes, near-misses)	Video/LiDAR analytics	Yes: conflict ID	Yes: trajectories	Low	N/A	ID safety risks before crashes occur
Intersection vehicle-pedestrian safety risk	Turning vehicle speeds, trajectories	Video/LiDAR analytics	N/A	Yes: object trajectory/speed	Medium	N/A	ID safety risks by movement at intersections
Road user behaviors	Acceleration/deceleration, speed, heading, seat belt use	Connected vehicle outputs	N/A	Yes: connected vehicle outputs	Medium	N/A	Behavioral safety; ID proactive hot spots
Road surface maintenance needs	Road surface conditions	Edge devices	Yes: pattern recognition	Yes	Low	High	Improved condition inventory
ID: identification; LOE: level of effort