Video-based systems (Figure 6) are non-intrusive technologies that can provide remote viewing, offer surveillance capabilities, and produce several types of data outputs. These systems are increasingly used at signalized intersections for traffic detection and signal control and, in recent years, for motorized and non-motorized volume data collection. A typical video processing system consists of one or more fixed or pan-tilt-zoom (PTZ) cameras and a built-in or external processor that analyzes video images and translates them into traffic flow information (Klein et al., 2006). The most common types of cameras are (a) standard optical video cameras, either monochrome or color; and (b) 360-degree cameras, which can view all intersection approaches. The 360-degree cameras may also be used in combination with advance cameras or thermal sensors. Thermal cameras are described in Chapter 6: Infrared Sensors.
In some systems, the processors are integrated into the cameras, allowing them to analyze video data on-site. In other systems, processing may occur in signal cabinets or other dedicated processing units (e.g., external servers). Video-based systems use a variety of methods to detect and count vehicles, such as artificial intelligence (AI) and machine learning, video image processing, and product-specific algorithms. In general, these methods analyze changes across groups of pixels between successive frames, disregarding gray level or color changes in the stationary background. When changes are detected, the processor places a call to the signal controller. The processor can be configured to output signals that simulate a loop detection system, including pulse, presence, delay, and extension signals (Balke et al., 2023).
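The pixel-change idea described above can be illustrated with a minimal sketch. It assumes simple differencing between two grayscale frames inside one rectangular detection zone; the function name, the thresholds, and the frame representation are hypothetical choices for illustration and do not reflect any vendor's algorithm, which is typically more robust to lighting and shadows.

```python
def zone_activated(prev_frame, curr_frame, zone,
                   pixel_threshold=25, fraction_threshold=0.2):
    """Return True if enough pixels in the zone changed between two frames.

    Frames are lists of rows of integer gray levels (0-255).
    zone is (top, left, bottom, right) with bottom and right exclusive.
    Both thresholds are illustrative assumptions, not calibrated values.
    """
    top, left, bottom, right = zone
    changed = 0
    total = 0
    for r in range(top, bottom):
        for c in range(left, right):
            total += 1
            # A pixel counts as changed when its gray level shifts noticeably.
            if abs(curr_frame[r][c] - prev_frame[r][c]) > pixel_threshold:
                changed += 1
    return total > 0 and changed / total >= fraction_threshold


# Toy example: a uniform background and a bright 2x2 "vehicle" in the middle.
background = [[100] * 4 for _ in range(4)]
with_vehicle = [row[:] for row in background]
for r in (1, 2):
    for c in (1, 2):
        with_vehicle[r][c] = 200

print(zone_activated(background, with_vehicle, (1, 1, 3, 3)))  # True
print(zone_activated(background, with_vehicle, (0, 0, 1, 4)))  # False
```

Requiring a minimum fraction of changed pixels, rather than any single changed pixel, is one simple way to keep isolated noise from triggering a call.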
The two main methods for counting from video are region of interest (ROI) and line of interest (LOI) counting. ROI counting estimates the number of vehicles in a selected region at a specific time, while the LOI method counts vehicles crossing a designated detection line (Xiong et al., 2017). The ROI method involves setting up virtual detectors, also known as detection zones, at selected areas within a video frame where vehicle presence is monitored continuously. By tracking vehicles as they enter, move through, or exit this area, ROI counting provides an estimate of the number of vehicles present at a specific moment. For example, Figure 7 shows three sets of virtual detectors placed at different locations along an intersection approach to count and detect vehicles (Wu et al., 2021). The red detectors at the stop bar are primarily used to count vehicles, while both the blue detectors placed before the stop bar and the green advance detectors are configured to collect occupancy data. Some manufacturers recommend placing the red (volume) detectors after the stop bar to detect and count moving vehicles as they enter the intersection.
Figure 8 and Figure 9 illustrate examples of virtual detectors configured to detect and count vehicles. When a vehicle passes over a detector, the latter changes color (e.g., from black to green in Figure 8, and from yellow to green in Figure 9), indicating activation.
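As a rough illustration of ROI counting, the sketch below counts how many tracked vehicle centroids fall inside a rectangular detection zone in a single frame. It assumes that an upstream detector or tracker has already produced centroid coordinates in pixels; the function name and the rectangular zone shape are assumptions made for this example.

```python
def count_in_roi(centroids, roi):
    """Count centroids inside a rectangular region of interest.

    centroids: iterable of (x, y) pixel coordinates for one frame.
    roi: (x_min, y_min, x_max, y_max), boundaries inclusive.
    """
    x_min, y_min, x_max, y_max = roi
    return sum(
        1 for x, y in centroids
        if x_min <= x <= x_max and y_min <= y <= y_max
    )


# Hypothetical frame: two vehicles inside the stop-bar zone, one elsewhere.
frame_centroids = [(120, 310), (180, 330), (400, 90)]
stop_bar_zone = (100, 300, 250, 360)
print(count_in_roi(frame_centroids, stop_bar_zone))  # 2
```

Real detection zones are often drawn as arbitrary polygons aligned with lane markings, in which case a point-in-polygon test would replace the rectangle check.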
LOI counting, on the other hand, focuses on counting vehicles as they cross a specified line within the video frame. This virtual line acts as a threshold, counting each vehicle that crosses it as it moves along the roadway. Figure 10 demonstrates an example of vehicle tracking by using a counting line covering multiple lanes (Li et al., 2020).
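The LOI idea can be sketched as a geometric test on tracked positions: a vehicle is counted when the segment between two consecutive positions of its track intersects the counting line. The sketch below assumes tracks are already available as lists of (x, y) centroids per vehicle; the function names and the track format are hypothetical, and each track is counted at most once.

```python
def side(p, a, b):
    """Cross product sign: which side of the line a->b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def crosses(p_prev, p_curr, a, b):
    """True if the move p_prev -> p_curr strictly crosses the segment a-b."""
    d1 = side(p_prev, a, b)
    d2 = side(p_curr, a, b)
    d3 = side(a, p_prev, p_curr)
    d4 = side(b, p_prev, p_curr)
    # Endpoints of each segment must lie on opposite sides of the other.
    return d1 * d2 < 0 and d3 * d4 < 0


def count_line_crossings(tracks, a, b):
    """Count tracks that cross the counting line a-b at least once."""
    count = 0
    for positions in tracks.values():
        steps = zip(positions, positions[1:])
        if any(crosses(p, q, a, b) for p, q in steps):
            count += 1
    return count


# Hypothetical counting line and three vehicle tracks.
line_start, line_end = (0, 100), (300, 100)
tracks = {
    "v1": [(50, 80), (50, 95), (50, 110)],  # crosses the line
    "v2": [(150, 60), (150, 90)],           # stops short of the line
    "v3": [(400, 80), (400, 120)],          # passes beyond the line's end
}
print(count_line_crossings(tracks, line_start, line_end))  # 1
```

Counting each track once avoids double counting a vehicle that jitters back and forth over the line, although it would also miss a vehicle that legitimately crosses twice.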
In addition to modern cameras with built-in microprocessors or cameras connected to processors installed in signal cabinets, volumes can be extracted from videos recorded by closed-circuit television (CCTV) systems. CCTV, also known as video surveillance (Kumar and Svensson, 2015), has been widely used for security or monitoring purposes at signalized intersections. CCTV can provide continuous video around the clock throughout the year. Even though CCTV systems cannot provide volume data directly, image processing methods can extract real-time or offline traffic volumes from existing videos. For example, Figure 11 shows an image from a CCTV-based system that identifies, tracks, and classifies vehicles (Fedorov et al., 2019) using region-based convolutional neural networks, a deep learning architecture for object detection.
Some sensors use a traffic detection module to detect vehicle presence but may require a separate traffic data collection module or a special subscription to gather data such as volumes, speeds, and density (Wu et al., 2021). The primary purpose of a traffic detection module is to help control traffic signals by monitoring when vehicles are waiting at an intersection or passing specific points on the roadway. This module works by detecting an object’s presence, which triggers the traffic signal controller to initiate appropriate signal phase changes (e.g., changing from red to green) or actuations (e.g., extending a green phase if vehicles are still detected in the queue). Traffic data collection modules extend the capabilities of a system by enabling more detailed measurements but often require specialized hardware, software, or a subscription to enable data collection features.
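To make the actuation behavior more concrete, the sketch below conditions a raw presence signal with delay and extension timers, two of the loop-emulating outputs mentioned earlier. It assumes a common interpretation: with delay, a call is output only after presence has persisted for a set time; with extension, the call is held for a set time after presence ends. Time is modeled in discrete steps, and the function and parameter names are hypothetical.

```python
def condition_calls(presence, delay=0, extension=0):
    """Apply delay and extension timing to a raw presence signal.

    presence: sequence of truthy/falsy values, one per time step.
    delay: steps of continuous presence required before a call is output.
    extension: steps the call is held after presence ends.
    Returns a list of booleans: the conditioned call per time step.
    """
    output = []
    on_streak = 0  # consecutive steps with raw presence
    hold = 0       # remaining extension steps
    for present in presence:
        on_streak = on_streak + 1 if present else 0
        if present and on_streak > delay:
            output.append(True)
            hold = extension  # re-arm the extension timer
        elif hold > 0:
            output.append(True)
            hold -= 1
        else:
            output.append(False)
    return output


raw = [1, 1, 1, 0, 0, 0]
print(condition_calls(raw, delay=1, extension=2))
# [False, True, True, True, True, False]
```

With both timers set to zero, the output simply mirrors the raw presence signal.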
The working principle of automated video-based systems for counting non-motorized traffic is similar to that for motorized traffic (Figure 12). Though some cameras can count only non-motorized traffic, others can differentiate between non-motorized and motorized traffic (Shah et al., 2020). In many cities, existing traffic monitoring cameras can be upgraded to count not only motor vehicles but also non-motorized traffic (e.g., Iteris’s SmartCycle). Some cameras can collect screen line and turning movement volumes at intersections, and others can also collect data such as speed, travel direction, and traveler-specific characteristics (Ryus et al., 2014; Shah et al., 2020). Automated video-based systems can be used for both short- and long-term counting, and they require minimal human effort for counting non-motorized volumes. Figure 12 shows an example of a video-based system identifying pedestrians, while the system shown in Figure 13 detects both motorized and non-motorized traffic.
Table 3 summarizes the main strengths and weaknesses of video-based systems for counting motorized and non-motorized traffic.
Table 3. Strengths and Weaknesses of Video-Based Systems.
| Strengths | Weaknesses |
|---|---|
| Motorized and Non-Motorized Traffic | |
| | |
| Motorized Traffic Only | |
| No additional strengths and weaknesses beyond those applicable to both modes | |
| Non-Motorized Traffic Only | |
| | |
The validation results from NCHRP Project 03-144 revealed that the accuracy of motorized traffic volumes obtained from video-based systems varied significantly by vendor, equipment model, and intersection (WMAPE ranged from 1.4% to 33.7%). The mounting height, location, placement, and proper calibration of a camera are crucial to the optimal performance and accuracy of the outputs (Ishak et al., 2016). Video-based systems are affected by external factors that result in poor visibility and obstructed camera views. Many cameras tend to undercount vehicles, but overcounting is also observed in some cases.
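For readers unfamiliar with the metric, the sketch below computes a weighted mean absolute percentage error (WMAPE) from paired counts. It assumes the commonly used definition, the sum of absolute errors divided by the sum of ground-truth counts; the exact formulation used in NCHRP Project 03-144 is not stated in this section, and the sample volumes are invented for illustration.

```python
def wmape(ground_truth, device_counts):
    """Weighted mean absolute percentage error, in percent.

    Assumed definition: 100 * sum(|device - truth|) / sum(truth).
    """
    if len(ground_truth) != len(device_counts):
        raise ValueError("Both series must have the same length.")
    total_truth = sum(ground_truth)
    if total_truth == 0:
        raise ValueError("Ground-truth counts sum to zero; WMAPE is undefined.")
    total_error = sum(abs(d - t) for t, d in zip(ground_truth, device_counts))
    return 100.0 * total_error / total_truth


# Invented hourly volumes: manual ground truth vs. video-based counts.
manual = [100, 200, 50]
video = [90, 210, 40]
print(round(wmape(manual, video), 2))  # 8.57
```

Because errors are weighted by volume, low-volume hours with large percentage errors influence this metric less than they would influence a simple average of hourly percentage errors.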
The most common causes of undercounting are:
The most common causes of overcounting are:
The NCHRP 03-144 validation results showed that the accuracy of signal equipment for counting non-motorized traffic was lower than that for motorized traffic (WMAPE ranged from 3.6% to 93.7%). Most video-based systems undercounted non-motorized users. This undercounting trend is visible in the scatterplot in Figure 24, which displays all non-motorized hourly volumes from the study intersections equipped with video-based systems.
The low accuracy and the undercounting issues can be attributed to several factors:
Despite these issues, some video-based technologies, primarily new products that use advanced AI technologies, have shown great potential in both differentiating and counting pedestrians and cyclists at intersections (Ozan et al., 2021). Video-based systems are capable of accurately counting the flow of bicycles for various movements having distinct origins and destinations, even in challenging environments (e.g., intersections) where mixed traffic is present (Zangenehpour et al., 2015).
Proper installation, thorough calibration for counting motorized and non-motorized traffic, and frequent maintenance of the equipment are key to improving data accuracy. General recommended practices and ideal characteristics of video-based systems and data for traffic monitoring use are described below.