Understanding the causal factors of roadway crashes is critical for safety improvement. Over the past few decades, researchers have developed various models to quantify the relationship between crash frequency/severity and contributing factors to gain knowledge about how crashes occur. Although such safety analyses have improved with time, most existing safety analysis methods rely on collision databases with well-known limitations (e.g., timeliness, completeness, and accuracy). Further, these methods are inherently reactive: they require fatality and injury records to accrue over years before unsafe locations can be identified (Zheng et al. 2021). Real-time influences attributable to weather, driving behavior, infrastructure condition, and their interactions have not been fully accounted for in traditional safety analysis.
Traditional data sources for traffic safety analysis usually include crash reports, roadway geometry characteristics, traffic characteristics, and weather conditions. Typical features of crash data include time, location, severity, and type (Li et al. 2020). These crash records are linked with roadway geometry data via crash location and with weather conditions via crash occurrence time. Typical traffic characteristics in crash databases include annual average daily traffic (AADT) and its derivatives. More comprehensive datasets, such as those available from the Highway Safety Information System hosted by the U.S. DOT, leverage more detailed police reports and can include information on involved vehicles, occupants, and pedestrians, but they are limited to certain states.
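The linkage described above (roadway geometry by crash location, weather by crash time) can be sketched as follows. This is a minimal illustration only: the field names, segment identifiers, and hourly weather keys are hypothetical and do not come from any particular crash database.

```python
from datetime import datetime

# Hypothetical crash records; field names are illustrative assumptions.
crashes = [
    {"crash_id": 1, "segment_id": "S1", "time": datetime(2023, 1, 5, 7, 40), "severity": "injury"},
    {"crash_id": 2, "segment_id": "S2", "time": datetime(2023, 1, 5, 18, 10), "severity": "fatal"},
]

# Hypothetical roadway inventory keyed by segment (the crash location).
segments = {
    "S1": {"lane_width_ft": 11, "aadt": 8200},
    "S2": {"lane_width_ft": 12, "aadt": 15400},
}

# Hypothetical hourly weather observations keyed by the start of the hour.
weather = {
    datetime(2023, 1, 5, 7): "rain",
    datetime(2023, 1, 5, 18): "clear",
}

def link_crash(crash, segments, weather):
    """Attach roadway attributes (by location) and weather (by hour) to one crash."""
    hour = crash["time"].replace(minute=0, second=0, microsecond=0)
    linked = dict(crash)
    linked.update(segments.get(crash["segment_id"], {}))
    linked["weather"] = weather.get(hour)  # None when no observation exists
    return linked

linked_crashes = [link_crash(c, segments, weather) for c in crashes]
```

A crash with no matching segment or weather observation simply keeps its original fields, which makes the completeness gaps noted earlier visible in the linked output.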
Traditional data sets may include but are not limited to the following:
The International Road Assessment Programme (iRAP) is the internationally recognized protocol to evaluate safety and performance planning. The iRAP Star Ratings are one of the five iRAP protocols designed to collect road attributes on a particular road segment (iRAP 2022). The Star Ratings protocol is intended to assess infrastructure-related risk based on crash modification factors, considering the likelihood and severity of individual user crashes with different infrastructure features. Roads where the probability of a serious traffic collision with a fatal outcome is very high are rated with 1 star, while roads where the probability of a fatal crash is near zero are rated with 5 stars (Brkić et al. 2022). The assessments are made without considering crash history data.
The iRAP system includes roadway data in the following categories:
The iRAP system limitations include:
Traditional safety analysis is dependent on the expertise, tools, and resources of transportation organizations. Many agencies, especially at the local and regional levels, lack the resources to conduct a rigorous analysis of existing conditions and project evaluation.
Traditional safety analysis, focused primarily on crash history (with some additional data sets such as roadway inventory and traffic volume), falls under several primary categories. These include, but are not limited to, hot spot analysis, systemic analysis, project/corridor analysis, and the use of surrogate measures.
Traditional network screening techniques often rely on site analysis to identify locations for potential safety improvement investments. These techniques focus primarily on specific locations—often called hot spots—based primarily on a history of fatal and serious injury crashes. However, evidence indicates that fatal and serious injury crashes are widely distributed across roadway systems, and very few individual locations (especially in rural areas and on non-state highways) exhibit a history of multiple fatal and serious injury crash events. If transportation agencies limit safety investments to high-crash locations (hot spots), it is difficult for them to meet jurisdiction-wide safety performance goals (Preston et al. 2013).
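A minimal sketch of hot spot screening by fatal and serious injury (FSI) crash counts is shown below. The site identifiers, severity labels, and the two-crash threshold are assumptions chosen for illustration; agencies use their own criteria.

```python
from collections import Counter

# Hypothetical crash records as (site_id, severity) pairs.
crash_records = [
    ("A", "fatal"), ("A", "serious"), ("A", "minor"),
    ("B", "serious"), ("C", "fatal"), ("D", "serious"),
]

def rank_hot_spots(records, min_fsi=2):
    """Return sites with at least min_fsi FSI crashes, highest count first."""
    fsi_counts = Counter(
        site for site, severity in records if severity in {"fatal", "serious"}
    )
    return [(site, n) for site, n in fsi_counts.most_common() if n >= min_fsi]

hot_spots = rank_hot_spots(crash_records)
```

In this toy data set only one of the four sites with an FSI crash meets the threshold, which mirrors the point made above: when severe crashes are widely dispersed, site-based screening leaves most of them unaddressed.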
Beyond hot spot analysis, it is important to proactively identify the potential for future crashes by incorporating a systemic approach in the safety management process. This involves widely implementing safety improvements at locations with high-risk roadway features that are correlated with specific severe crash types or their contributing circumstances. This approach allows for more comprehensive safety planning and implementation by considering future risk as well as crash history. It is particularly useful when coupled with low-cost safety improvements that can be installed at many locations. The systemic approach to safety remains a data-driven process that involves analytical techniques and is still based on crash history data. This helps transportation organizations understand the relationship between high-severity crash events and contributing factors and helps identify sites for potential safety improvement. Additionally, the recommended projects stemming from systemic analysis tend to be smaller in scope and complexity than those at hot spots. The systemic approach supplements traditional site analysis and provides a more comprehensive and “proactive” approach to preventing the most severe crashes (Preston et al. 2013).
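The systemic approach can be illustrated with a short sketch that flags segments by the presence of risk features rather than by crash counts. The attribute names, thresholds, and the rule of requiring two risk factors are hypothetical; in practice the risk factors would be derived from analysis of severe crash types in the jurisdiction's own crash data.

```python
# Hypothetical segment inventory; attributes and values are illustrative only.
segments = [
    {"id": "S1", "curve_radius_ft": 400, "shoulder_width_ft": 1, "fsi_crashes": 0},
    {"id": "S2", "curve_radius_ft": 2500, "shoulder_width_ft": 6, "fsi_crashes": 1},
    {"id": "S3", "curve_radius_ft": 600, "shoulder_width_ft": 2, "fsi_crashes": 0},
]

# Assumed risk factors and thresholds (not taken from any published guidance).
RISK_CHECKS = {
    "sharp_curve": lambda s: s["curve_radius_ft"] < 1000,
    "narrow_shoulder": lambda s: s["shoulder_width_ft"] < 4,
}

def systemic_candidates(segments, min_factors=2):
    """Flag segments that exhibit at least min_factors risk features."""
    flagged = []
    for seg in segments:
        present = [name for name, check in RISK_CHECKS.items() if check(seg)]
        if len(present) >= min_factors:
            flagged.append((seg["id"], present))
    return flagged

candidates = systemic_candidates(segments)
```

Here the two flagged segments have no recorded severe crashes, while the one segment with a crash history is not flagged, which is why the systemic approach is described as a supplement to site analysis rather than a replacement.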
Project or corridor analysis extends the scope of traditional hot spot analysis to larger segments of roadways or corridors, which can include miles of highways and several intersections. This method identifies and addresses safety issues over extended areas, providing a holistic view of the safety performance across a broader network. It utilizes crash prediction models similar to those used in hot spot analysis, as detailed in Part C of the Highway Safety Manual (HSM). By integrating various datasets, such as crash history, roadway inventory, and traffic volume, corridor analysis helps to understand the underlying causes of crashes and identify appropriate countermeasures. The resulting safety improvement plans target the entire corridor, focusing on systemic enhancements that improve safety across the corridor rather than at isolated points.
In addition to crash data, surrogate measures can be used to evaluate the safety of projects. Examples of surrogate measures include conflict analysis and the use of star ratings in the iRAP models.
Surrogate measures are particularly useful in areas with sparse crash data and allow for proactive safety assessments and interventions, helping to prevent crashes before they occur. They complement traditional crash data analysis, offering a more comprehensive understanding of road safety issues.
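As one illustration of conflict analysis, the sketch below computes time-to-collision (TTC) for a following vehicle approaching a leading vehicle in the same lane. TTC is a widely used conflict indicator but is not named in the text above, so its use here is an assumption, as is the 1.5-second threshold, which analysts set according to their own study design.

```python
import math

def time_to_collision(gap_m, follower_speed_mps, leader_speed_mps):
    """Seconds until the follower reaches the leader if both hold their speeds.

    Returns math.inf when the follower is not closing on the leader.
    """
    closing_speed = follower_speed_mps - leader_speed_mps
    if closing_speed <= 0:
        return math.inf
    return gap_m / closing_speed

def is_conflict(ttc_s, threshold_s=1.5):
    """Flag an interaction as a conflict when TTC falls below the threshold."""
    return ttc_s < threshold_s

# Example: 10 m gap, follower at 20 m/s, leader at 10 m/s.
example_ttc = time_to_collision(10.0, 20.0, 10.0)
example_flag = is_conflict(example_ttc)
```

Counting such flagged interactions over time gives a safety signal at locations where too few crashes have been recorded to support a crash-based analysis.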
Although traditional safety analyses have provided safety insights, they are not without challenges and limitations. Statistical models commonly used in safety studies are often based on a theory that is not perfectly reflected by traditional safety datasets (including crash data). Missing and inconsistent data, as well as variations in unobserved effects arising from economic and sociodemographic factors, societal norms, and vehicle characteristics, can all amplify the potential bias in traditional statistical models (Mannering et al. 2020). Although endogeneity models (Bhat et al. 2014) and heterogeneity models (Mannering et al. 2016) have been developed to extend traditional safety models by using advanced statistical and econometric methods, the capability of these models to deal with large-scale and consistently collected data across the entire road network still needs to be improved. Traditional statistical safety models such as those used in the first edition of the HSM (2010) utilize limited data for maximum applicability. However, these methods have several inherent data biases that arise from limited and missing data, inconsistent measurement, and variations in unobserved effects.
Limitations of traditional safety analysis techniques include:
When introducing new tools and datasets, their applicability must be addressed so practitioners can select the appropriate tools and analysis methods. These new tools, methods, and datasets also create opportunities to explore new research questions. Depending on the safety analysis question, traditional methods and tools may be sufficient. For example, applying the HSM safety performance functions (SPFs) with traditional datasets such as crash history, traffic volume, and roadway inventory elements can address many analysis needs. However, the HSM relies on a complete set of these data, which in some cases are not available to an agency practitioner or researcher. Filling such data gaps is one potential application of artificial intelligence/machine learning (AI/ML) tools.
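To make the data dependence concrete, the sketch below applies an SPF of the form commonly cited for rural two-lane, two-way segments in the first edition of the HSM, followed by an Empirical Bayes adjustment with observed crashes. The coefficient (-0.312) and the overdispersion relation (0.236 divided by segment length in miles) are given as recalled from that edition and should be verified against the manual. Calibration factors and crash modification factors are omitted, and the example inputs are hypothetical.

```python
import math

def spf_rural_two_lane(aadt, length_mi):
    """Predicted crashes per year under base conditions (no calibration, no CMFs)."""
    return aadt * length_mi * 365 * 1e-6 * math.exp(-0.312)

def empirical_bayes(predicted_per_year, observed_total, years, overdispersion):
    """Weight the SPF prediction against observed crashes over the study period."""
    predicted_total = predicted_per_year * years
    weight = 1.0 / (1.0 + overdispersion * predicted_total)
    return weight * predicted_total + (1.0 - weight) * observed_total

# Hypothetical segment: 5,000 vehicles/day, 2 miles, 12 observed crashes in 3 years.
aadt, length_mi, observed, years = 5000, 2.0, 12, 3
predicted = spf_rural_two_lane(aadt, length_mi)
k = 0.236 / length_mi  # assumed overdispersion relation for this facility type
expected = empirical_bayes(predicted, observed, years, k)
```

Every input here (AADT, segment length, and a multi-year crash history) must be available for the segment. When any of them is missing, the calculation cannot be completed, which is the gap described above.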
Table 1 introduces a few examples of safety-related research topics and the applicability of emerging analysis methods, tools, and datasets. It includes a relative level of effort for each and a short description of the potential benefits of using AI/ML tools and/or analyzing big data.
| Research Topic | Data Type | Dataset | AI/ML Applicability | Big Data Applicability | LOE with AI/ML, big data | LOE without AI/ML, big data | Benefits |
|---|---|---|---|---|---|---|---|
| Expected predicted future crash risk (HSM method) | Crash history | State crash repository | N/A | N/A | N/A | N/A | Base inventory for all analyses |
| | Traffic volume | Turning movement counts | N/A | Yes: connected vehicle outputs | N/A | N/A | ID safety risks by movement at intersections |
| | Roadway inventory (streetlights, signs) | ODOT digital video log | Yes: object ID | | Low | High | Expanded inventory |
| | Roadway inventory, attributes (lane marking, lane width) | LiDAR | Yes | Yes: point data | Low | High | Expanded inventory |
| Proactive safety needs ID* | Conflicts (unreported crashes, near-misses) | Video/LiDAR analytics | Yes: conflict ID | Yes: trajectories | Low | N/A | ID safety risks before crashes occur |
| Intersection vehicle-pedestrian safety risk | Turning vehicle speeds, trajectories | Video/LiDAR analytics | N/A | Yes: object trajectory/speed | Medium | N/A | ID safety risks by movement at intersections |
| Road user behaviors | Acceleration/deceleration, speed, heading, seat belt use | Connected vehicle outputs | N/A | Yes: connected vehicle outputs | Medium | N/A | Behavioral safety; ID proactive hot spots |
| Road surface maintenance needs | Road surface conditions | Edge devices | Yes: pattern recognition | Yes | Low | High | Improved condition inventory |

ID: identification; LOE: level of effort