Traffic safety is a major public health problem in the United States, where crashes kill approximately 40,000 people annually (1). To address this challenge, research is needed to improve understanding of crash mechanisms and to develop proactive, data-driven safety strategies. The National Cooperative Highway Research Program’s (NCHRP’s) Project 17-100 is one such effort. The project and resulting report, NCHRP Research Report 1152: Leveraging Artificial Intelligence and Big Data to Enhance Safety Analysis: A Guide, advance the use of AI and machine learning with big and unconventional data to support safe system and modal priority decision making as well as performance tracking.
Research Design and Phases
The project was organized in two phases. Phase I focused on the literature review and data readiness (i.e., identifying sources and obtaining datasets). The team conducted a comprehensive review of AI and machine learning frameworks for traffic safety analysis and the big and unconventional data applicable to that analysis. The review examined the following four areas:
- Limitations of traditional crash-based evaluations,
- Emerging data sources with preparation procedures and attributes,
- AI and big-data analytics that are suited to multisource safety problems and their evaluation, and
- Trends mined from the literature to identify promising applications.
Building on these findings, the team specified source-data needs and the attributes critical to adopting each source, including spatial and temporal coverage, collection frequency, granularity, accessibility, and cost. Guided by this foundation, the team surveyed the TRB, ASCE, and ITE communities and consulted data vendors on availability and use conditions. It then assembled working datasets for method development and validation, including connected-vehicle data, mobile and infrastructure lidar, street-level and roadside video for trajectory extraction, loop-detector data (i.e., data from in-road traffic sensors that measure counts and speed), and street-view imagery (Figure 1).



FIGURE 1 Examples of data sources used in the study include (a) connected vehicle trajectories, (b) lidar point clouds showing roadway geometry and surrounding features, and (c) dash cam video.
During Phase II, the team developed a practical framework and task-appropriate AI and machine-learning models, established quality assurance procedures aligned with relevant standards, and executed pilot projects to test transferability and value for practice.
Pilot Demonstrations
Before finalizing the integrated safety analytics system (including the data-processing framework, AI-based models, and quality assurance procedures), two pilots validated the approach and clarified implementation needs. In one, conducted with the Oregon Department of Transportation, an imagery-based, computer-vision data-processing pipeline detected and located streetlight luminaires (i.e., lighting fixtures), producing a geospatial inventory layer of streetlight assets that supports nighttime safety assessment and maintenance planning while reducing manual inventory effort. In the other pilot, located in Bellevue, Washington, intersection video was processed to extract vehicle turning speeds and trajectories, generating behavior-based indicators that complement crash records for operational and design decisions related to pedestrian and bicyclist safety. The pilots refined end-to-end data management and processing workflows, confirmed model performance against agency needs, and identified operational requirements for deployment.
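To make the idea of a behavior-based indicator concrete, the following minimal Python sketch derives turning speeds from a single vehicle trajectory. It is an illustration only, not the pipeline used in the Bellevue pilot: it assumes that a tracker has already produced time-stamped positions in a projected coordinate system in meters, and the function names `segment_speeds` and `turning_speed_summary` are hypothetical.

```python
import math


def segment_speeds(track):
    """Return (mid_time_s, speed_mps) for each pair of consecutive points.

    `track` is a list of (t_seconds, x_m, y_m) tuples sorted by time.
    Assumption: positions are already in a metric, projected coordinate system.
    """
    speeds = []
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        dt = t1 - t0
        if dt <= 0:
            # Skip duplicate or out-of-order timestamps.
            continue
        distance = math.hypot(x1 - x0, y1 - y0)
        speeds.append(((t0 + t1) / 2, distance / dt))
    return speeds


def turning_speed_summary(track):
    """Summarize the speeds along one turning trajectory, or None if too short."""
    speeds = [speed for _, speed in segment_speeds(track)]
    if not speeds:
        return None
    return {
        "min_mps": min(speeds),
        "mean_mps": sum(speeds) / len(speeds),
        "max_mps": max(speeds),
    }


# Hypothetical three-point trajectory sampled once per second.
example_track = [(0.0, 0.0, 0.0), (1.0, 3.0, 4.0), (2.0, 9.0, 12.0)]
summary = turning_speed_summary(example_track)
```

In practice, positions extracted from video are noisy, so a real workflow would likely smooth the trajectory before differencing it; that step is omitted here for brevity.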
Deliverables and Agency Application
Informed by the pilots, the project produced a portfolio of applied tools that operationalize emerging datasets for safety management. These applications include the following:
- Automated streetlight inventory from street-view imagery;
- Turning speeds and trajectories from intersection video analytics;
- Inference of vehicle volumes and types, classified by length, from single loop detectors;
- Turning-movement estimation from connected-vehicle telemetry;
- Lane-marking and width extraction from lidar;
- Traffic-sign detection and recognition from roadway video logs;
- Multiscale pedestrian detection from mounted surveillance cameras; and
- Road surface condition analytics from on-site edge computing devices.
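As background for the single-loop-detector item in the list above, the sketch below shows a classical, non-AI way to approximate vehicle length from one loop. It is not the model documented in the report. It assumes the common approximation that interval mean speed equals count times mean effective vehicle length divided by occupied time. The default mean effective length (6.0 m), loop length (1.8 m), and length-bin thresholds are illustrative assumptions, and all function names are hypothetical.

```python
def estimate_speed_mps(count, occupancy, interval_s, mean_effective_length_m=6.0):
    """Approximate interval mean speed from single-loop count and occupancy.

    `occupancy` is the fraction of the interval the loop was occupied (0 to 1).
    The mean effective length (vehicle plus detection zone) is an assumed value.
    """
    if count <= 0 or occupancy <= 0:
        return None
    return count * mean_effective_length_m / (occupancy * interval_s)


def vehicle_length_m(on_time_s, speed_mps, loop_length_m=1.8):
    """Estimate one vehicle's length from its on-time over the loop."""
    effective_length = speed_mps * on_time_s
    return max(effective_length - loop_length_m, 0.0)


def classify_by_length(length_m, bins=((7.0, "short"), (14.0, "medium"))):
    """Assign a length class; the thresholds are illustrative, not standard."""
    for upper_bound_m, label in bins:
        if length_m < upper_bound_m:
            return label
    return "long"


# Hypothetical 20-second interval: 10 vehicles, loop occupied 25% of the time.
speed = estimate_speed_mps(count=10, occupancy=0.25, interval_s=20)
car_class = classify_by_length(vehicle_length_m(0.5, speed))
truck_class = classify_by_length(vehicle_length_m(1.5, speed))
```

The weakness of this approach is the fixed mean effective length: when the traffic mix shifts toward long vehicles, the speed estimate and every length derived from it are biased, which is one reason learned models are attractive for this task.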
The primary deliverable was a practitioner-focused user’s guide, NCHRP Research Report 1152, that documents replicable workflows from data acquisition and labeling through model development, training, evaluation, and deployment. The guide also provides detailed descriptions of the pilots and tools. Agencies can use the guide to translate safety questions into appropriate datasets and model designs, adopt a planning-first pathway for machine learning, and implement tested workflows across the full data life cycle. The tools and methods extend beyond asset inventories and behavior indicators to support systemic risk screening, near-real-time monitoring, predictive safety modeling, and program evaluation across networks and modes. The guide’s procedures promote reproducibility, governance, and performance monitoring, enabling scale-up from pilots to nationwide practice and strengthening investment decisions. Collectively, this research demonstrates the practical use of AI and machine learning to address traffic safety problems and provides a repeatable path from proof of concept to sustained implementation.
Acknowledgment
The authors thank the NCHRP project team—including Fred Mannering (University of South Florida), Venky Shankar (Texas Tech University), Brian Chandler (DKS Associates), and Shuyi Yin (University of Washington)—for their valuable contributions to this research.