TRANSPORTATION RESEARCH BOARD OF THE NATIONAL ACADEMIES OF SCIENCES, ENGINEERING AND MEDICINE PRIVILEGED DOCUMENT
This document, not released for publication, is furnished only for review to members of or participants in the work of NCHRP. This document is to be regarded as fully privileged and the dissemination of the information included herein must be approved by NCHRP.
Abhijit Sarkar
Calvin Winkowski
Han Xu
Balachandar Guduri
Aditi Manke
Matthew Camden
Virginia Tech Transportation Institute
Blacksburg, VA
Permission to use any unoriginal material has been obtained from all copyright holders as needed
AI is a research field in computer science. As one of the pioneers in AI, McCarthy (2004) stated, “[AI] is the science and engineering of making intelligent machines, especially intelligent computer programs. It is related to the similar task of using computers to understand human intelligence, but AI does not have to confine itself to methods that are biologically observable” (p. 2).
Over the years, several ML algorithms have been developed to help machines learn, often achieving perception and performance similar to a human's. These methods are widely used in industry and in daily life to perform tasks that would take a human longer, or where a human may introduce subjective biases. Many ML methods have thus proven effective at increasing efficiency and accuracy. While ML is primarily a research field in computer science, its fundamentals and development overlap with multiple other fields of study, including statistics, control systems engineering, information theory, computer programming, psychology, and mathematics. Its application domains also extend to many interrelated fields such as computer vision, image processing, natural language processing (NLP), communication science, manufacturing, robotics, high-performance computing, data engineering, and transportation research, to mention a few. Therefore, although AI is often referred to as a field of computer science, it cannot be studied or implemented without understanding its overlap and dependencies with other fields of research (see the AI Literature Review Report deliverable for this project for more details). In this report, our primary focus is the modern developments and tool sets related to AI that can benefit state and local DOTs.
Modern AI and ML development has several components. Most modern ML models are based on data-driven approaches. In most current algorithms, a machine learns from examples and their associated annotations, much as a human does. This is often referred to as supervised learning. Machines can also learn in an unsupervised manner from data for which no annotations are provided. Over the last 30 years, several ML models and philosophies have been developed to address both supervised and unsupervised learning. These include basic models like logistic regression, decision trees, k-nearest neighbors, and naïve Bayes. More advanced methods like random forests, support vector machines (SVMs), probabilistic graphical models, and neural networks were developed to address issues with generalization, performance, and speed. However, most of these methods relied on manually crafted intermediate features and could not scale to learning from large-scale data, so they remained limited to a small domain of applications. An ideal ML algorithm should generalize across all application domains and perform with the highest accuracy and speed across all variations. DNN-based methods largely address these issues and support real-time inference by learning from large-scale data. DNNs are also efficient at generating large numbers of features that are not manually crafted, eliminating much of the associated human bias. The modern revolution in AI was mainly fueled by the advent of DNNs and their variants (see Appendix A for more details). In recent years, however, we have observed that real-world problems vary in data modality, data volume, and accuracy requirements, so a variety of ML methods may be employed depending on these factors. While DNN-based methods can solve many problems and problem components, a simple logistic regression may often provide adequate accuracy for the targeted application. Therefore, in this report we have not limited ourselves to DNN-based methods but have explored all possible solutions spanning data engineering, data processing, statistical analysis, and advanced analytics. We specifically concentrate on AI tool sets that facilitate data-driven (often big data) techniques and methods to reflect recent trends in AI research.
The overall AI ecosystem comprises multiple components, as shown in Figure 26. While data reside at the center of the system, they are often augmented by input from humans in either a supervisory role or a supportive role as a facilitator. A human is often required for data annotation and validation. The other key components are data management and the execution of advanced analytics methods such as ML tools. Any AI practice improves continuously through advancements in data collection, development processes, domain expertise, and AI research. Data are collected through a data collection effort or through a public or private data set, then organized onto a data platform where humans can annotate or leverage existing AI tools to enhance AI annotations. Models are trained and tuned using domain expertise. Even minor domain-specific modifications can significantly improve the performance of algorithms for a specific domain. Trained models are validated against collected test data and other data collections on the data platform. Inferencing on unannotated data generates new results that can then be used to analyze collected data in new ways. Human annotation may be used to enrich the results at much lower cost than a completely human process. Finally, the new annotations are added to the data platform and combined with other AI-generated data and collected data to produce analyses for managing infrastructure, developing policy, making business decisions, or automating processes.
Insights from the analysis of the data combined with AI results can improve future data collection. Results from algorithm validation are especially useful to identify underrepresented scenarios in the data that can be improved through new collection protocols or specific collection efforts. As the domain knowledge advances, with help from AI, new insights can be brought into the algorithms to improve performance. Lastly, pure and applied AI research pushes the envelope of related tasks and creates new algorithm classes and techniques that can be adapted into the developed processes.
A data-driven AI project can have multiple connected and interdependent components. We can broadly categorize them into six major steps, as shown in Figure 27. A project starts with business understanding
and goes through different analytics methods before it reaches final deployment. Each of these steps involves a number of key personnel from the institutional hierarchy, including managers, engineers, software developers, domain experts, and end users. Similarly, each step may involve different sets of AI tools. We describe each of the steps briefly in this section.
The AI development lifecycle requires adequate tool sets and software to execute efficiently. In the last two decades, many tools have been developed under different platforms. The challenges of AI development are not limited to developing the algorithm; they are intimately tied to managing data and computing at scale. It is impossible to discuss one without the other, and ultimately this drives the discussion around
understanding value and feasibility. This section of the report summarizes the process of AI and the tools required to accomplish AI-related tasks. The four main directions of our review include:
Software is an important part of the AI development pipeline. In the overall development of any AI-based project, Step 2 through Step 5 (see Figure 27) depend on software development and implementation. In this section, we discuss some of the key aspects of AI and how different software platforms provide tools and applications.
Current AI development still depends on how we develop ML models using traditional methods like SVMs and random forests.
A Python library is a collection of code modules that implement specific functions or operations. Libraries follow the syntax and semantics of the Python language and provide access to system functionality such as I/O as well as other core modules. Programmers can implement many operations simply by importing functions from these libraries, which makes programming easier. Here we introduce some commonly used libraries:
NumPy: The name "NumPy" stands for "Numerical Python." It is an open-source Python library used for data analysis and scientific computing. It provides a large library of high-level mathematical functions for operating efficiently on large matrices and multi-dimensional data. NumPy can be regarded as the foundation of data computing in Python, because robust data analysis and ML frameworks build on it.
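As a minimal illustration of the vectorized operations NumPy provides (the array values below are arbitrary examples, not data from this project):

```python
import numpy as np

# Illustrative vector of speed readings (mph); values are made up.
speeds = np.array([55.2, 61.7, 48.9, 70.1, 66.3])
print(speeds.mean(), speeds.std())   # vectorized summary statistics

A = np.arange(9).reshape(3, 3)       # 3x3 matrix built from a range
v = np.ones(3)
print(A @ v)                         # matrix-vector product
print(np.linalg.norm(A, axis=1))     # per-row Euclidean norms
```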
Pandas: This is a data structure and analysis tool built on top of Python. The core of Pandas is an efficient and easy-to-use data type, the DataFrame, which is well suited to manipulating data in the Python environment. With this data structure, we can clean, organize, summarize, merge, transform, and calculate data.
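A short sketch of typical DataFrame cleaning and summarization; the table contents and column names here are hypothetical:

```python
import pandas as pd

# Hypothetical crash-record table; columns are illustrative only.
df = pd.DataFrame({
    "county": ["A", "A", "B", "B", "B"],
    "severity": [1, 3, 2, None, 3],
})
# Clean: fill the missing severity with the column median.
df["severity"] = df["severity"].fillna(df["severity"].median())
# Summarize: mean severity per county.
print(df.groupby("county")["severity"].mean())
```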
SciPy: SciPy is a software package that builds on NumPy for mathematics, science, and engineering. It handles optimization, linear algebra, integration, interpolation, fitting, special functions, fast Fourier transforms, signal processing, image processing, ordinary differential equation solvers, and more.
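For example, a least-squares curve fit with scipy.optimize on synthetic data (the exponential model and noise level are purely illustrative):

```python
import numpy as np
from scipy import optimize

# Fit an exponential decay y = a * exp(-b * x) to noisy synthetic data.
rng = np.random.default_rng(0)
x = np.linspace(0, 4, 50)
y = 2.5 * np.exp(-1.3 * x) + 0.05 * rng.standard_normal(x.size)

popt, pcov = optimize.curve_fit(lambda x, a, b: a * np.exp(-b * x), x, y)
print(popt)  # recovered parameters, approximately (2.5, 1.3)
```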
Scikit-learn (or Sklearn): This library is designed to be used in conjunction with the Python numerical science libraries NumPy and SciPy. It is an ML toolbox based on the Python language. Sklearn encapsulates commonly used ML methods, such as classification, regression, clustering, dimensionality reduction, model evaluation, and data preprocessing. Programmers only need to call the corresponding interface to implement such operations.
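A minimal end-to-end sketch using scikit-learn's bundled iris sample data set, assuming a default installation:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Split a bundled sample data set, train a classifier, and evaluate it.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))
```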
MATLAB is a proprietary programming language and computing environment developed by MathWorks. It provides an enterprise-managed platform suitable for beginner to intermediate users and comes with targeted customer support. Unlike open-source platforms, MATLAB maintains version control and consistent function dependencies across toolboxes. Simulink® and other GUI-based tools provide simulations, quick demonstrations, and testing, even for novice programmers. MATLAB provides various toolboxes and functions targeted at specific applications, and these functions and applications can work independently as well as with each other. MATLAB comprises several toolboxes targeted at specific applications like image processing, statistics, ML, and signal processing, each with multiple functions. The Parallel Computing Toolbox helps run MATLAB on computing clusters.
The signal processing toolbox provides functions and applications to process time series or sequential data streams for trend analysis, spectral analysis, and feature engineering. Vehicular data like speed, acceleration, etc., are time series data. The signal processing toolbox helps to process these data, use filters to eliminate noise, and study critical characteristics of this data stream. Many AI-based methods use multimodal data, including time series data.
The Statistics and Machine Learning Toolbox provides functions and applications that can be used for data science applications, where a user can summarize, analyze, and model the multimodal data (Step 2 in Figure 27). The statistics part of the toolbox helps to create descriptive statistics, sample the data using probability distribution functions, conduct analyses of variance and covariance, and perform parametric and nonparametric hypothesis tests. Hypothesis tests such as distribution (e.g., Anderson-Darling and one-sample Kolmogorov-Smirnov), location (e.g., z-test and one-sample t-test), and dispersion (e.g., Chi-square variance) tests determine whether data come from a population with a specific distribution, mean, and variance, respectively. In the ML part, the toolbox provides supervised, semi-supervised, and unsupervised ML algorithms, including SVMs, neural networks, boosted decision trees, k-means, and other clustering methods. It provides various regression models to fit the data using supervised learning. Classification algorithms allow us to classify data and build models using supervised and semi-supervised learning algorithms to handle binary and multiclass problems. The clustering methods based on unsupervised learning techniques help to identify natural groupings and patterns in the data. These models can be built interactively using the Classification and Regression Learner apps or programmatically using AutoML. Furthermore, the toolbox provides intermediate steps such as principal component analysis (PCA), regularization, dimensionality reduction, and feature selection methods that allow the identification of significant variables with the best predictive power. Additionally, it contains a variety of functions that handle out-of-memory issues.
The Mapping Toolbox helps transform data from various sources into geographic coordinates and create map displays from more than 60 map projections. This toolbox allows a user to process and customize the data using trimming, interpolation, resampling, coordinate transformations, spatial resolution adjustment, and other techniques. It supports various file formats such as shapefile, GeoTIFF, KML, and VMAP0 for importing and exporting data.
The Control Systems Toolbox provides algorithms and apps for systematically analyzing, designing, and tuning linear time-invariant (LTI) control systems. It allows defining multi-input, multi-output (MIMO) and single-input, single-output (SISO) systems as transfer function, state-space, zero-pole-gain, or frequency response models. The Linear System Analyzer app allows investigation of a system's behavior, which can be analyzed and visualized using step responses and Bode plots in the time and frequency domains. Bode loop shaping and the root locus method help tune the system's compensator parameters interactively to achieve a specific open-loop response. The toolbox allows automatic tuning of SISO and MIMO compensators in a closed feedback loop. Furthermore, it allows tuning by specifying multiple tuning objectives, such as reference tracking, disturbance rejection, and stability margins. Additionally, it allows validating system designs by verifying LTI model characteristics such as rise time, overshoot, settling time, and gain and phase margins.
The Optimization Toolbox provides functions to solve various optimization problems, finding parameters that minimize or maximize objectives while satisfying constraints. It includes solvers for linear programming (LP), mixed-integer linear programming (MILP), quadratic programming (QP), second-order cone programming (SOCP), nonlinear programming, constrained linear least squares, nonlinear least squares, and nonlinear equations. It allows users to formulate continuous or discrete optimization problems using functions and matrices or variables and expressions, and to solve them in serial or in parallel. The toolbox contains functions that perform design optimization tasks, including parameter estimation, component selection, and parameter tuning. It also provides automatic differentiation of objective and constraint functions, which helps find optimal solutions faster and more accurately.
R is a programming language developed mainly for statistical data analysis. R has several associated packages that help with exploratory data analysis and ML-based tasks. Some of the most popular and widely used packages are caret, nnet, e1071, tidyr, mlr3, and xgboost. The caret package is used for classification and regression training and evaluation. e1071 provides ML tool sets including clustering methods, SVMs, naïve Bayes, and the Fourier transform, among others. The nnet package is used for simulation and testing with neural network-based models. The tidyr package is used for data processing, wrangling, and basic analysis. The mlr3 and xgboost packages provide tools and functions for several ML and gradient boosting-based algorithms.
Deep learning-based models demand specific types of software platforms, as their overall architecture differs from previously developed algorithms. The extensive use of convolution, the requirement of transferability, and the need for flexible designs prompted many researchers and enterprises to create new and novel development environments. Starting with the advent of DNNs, a number of new programming frameworks became popular, such as Caffe, Torch, TensorFlow, and Keras. Each has been successful on its own terms and has been a reliable platform for training and testing DNN models. With the popularity of Python as a programming language, many of these systems were adapted to Python-based frameworks. Currently, two of the most popular frameworks, TensorFlow and PyTorch, use Python as their programming environment. In this section we discuss some of these frameworks.
PyTorch is a deep learning library widely used in academia and industry. It was developed by the Facebook AI Research group and is a Python implementation of the Torch library. As a port of the classic ML library, PyTorch gives Python users a natural choice for writing deep learning code. PyTorch functionality includes the following:
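As a brief illustration of the PyTorch workflow, the sketch below defines a small fully connected network and performs one training step on random tensors; the layer sizes and data are arbitrary placeholders:

```python
import torch
from torch import nn

# Tiny fully connected network trained for one step on random data.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(64, 10)            # batch of 64 feature vectors
y = torch.randint(0, 2, (64,))     # random binary labels

optimizer.zero_grad()
loss = loss_fn(model(x), y)        # forward pass
loss.backward()                    # autograd computes gradients
optimizer.step()                   # one gradient-descent update
```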
TensorFlow is a software framework for implementing ML algorithms and developing models, designed by the Google team. It combines computational algebra with optimization techniques to facilitate the computation of many mathematical expressions. It is built around tensors, which are used to create multidimensional arrays and to optimize and evaluate mathematical expressions. The library includes programming support for DNNs and ML techniques and highly scalable computing capabilities across various data sets. TensorFlow uses GPU computing with automated management and includes features that optimize memory and data usage. TensorFlow's functionality includes the following:
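For comparison, a minimal sketch of the same kind of workflow in TensorFlow using its bundled Keras API (shapes and data are arbitrary):

```python
import numpy as np
import tensorflow as tf

# Tiny network expressed with the Keras API bundled with TensorFlow 2.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

x = np.random.randn(64, 10).astype("float32")
y = np.random.randint(0, 2, size=(64,))
model.fit(x, y, epochs=1, verbose=0)   # one training pass over the batch
```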
MXNet is an open-source deep learning framework used to train and deploy DNNs. It is extensible, allows fast model training, and supports multiple languages. The framework uses dataflow graphs like Theano and TensorFlow and is well suited to multi-GPU configurations. It has high-level model building blocks like Lasagne and Blocks and runs on nearly any hardware (including mobile phones). MXNet's functionality includes the following (Chen et al., 2015):
Caffe, short for Convolutional Architecture for Fast Feature Embedding, is a deep learning framework built for expressiveness, speed, and modularity (Jia et al., 2014). Although its kernel is written in C++, Caffe has Python and MATLAB interfaces. Caffe supports many types of deep learning architectures for image classification and image segmentation, including CNNs, region-based CNNs (R-CNNs), LSTM networks, and fully convolutional networks (FCNs). Caffe supports accelerated computing kernel libraries for GPUs and CPUs, such as NVIDIA cuDNN and Intel MKL. Caffe's functionality includes the following:
The Deep Learning Toolbox by MATLAB contains tutorials and examples, from data to deployment, that enable users ranging from beginners to advanced engineers to create and modify DNNs with ease. The toolbox provides materials that teach users how to train models from scratch, and various pre-trained deep learning models are available to explore and download. With this toolbox, it is easy to practice coding for deep learning applications such as image recognition, visual inspection, LiDAR, automated optical inspection, classifying images from a webcam, visualizing network features, and classifying data using LSTM networks, using step-by-step video tutorials. The toolbox incorporates specialized functionality for various applications such as computer vision, reinforcement learning, signal processing, radar, LiDAR, and wireless. Deep learning for computer vision includes several examples and tutorials for semantic segmentation, object detection, and image recognition, covering image and video labeling, image datastores, image- and computer vision-specific processing techniques, and the ability to import deep learning models from TensorFlow-Keras and PyTorch for image recognition. Using MATLAB, Simulink, and the Reinforcement Learning Toolbox™, users can explore and practice complete workflows with examples for simple control systems, autonomous systems, robotics, and scheduling problems. The toolbox supports developing predictive models for various signal-processing applications and enables applying AI techniques to radar and LiDAR applications. To leverage model architectures developed by the community, the toolbox contains simple models (e.g., AlexNet, VGG-16, VGG-19, GoogLeNet), higher accuracy models (e.g., ResNet-50, Inception-v3, DenseNet-201, Xception), models for edge deployment (e.g., SqueezeNet, MobileNet-v2, ShuffleNet, EfficientNet-b0), and models from other frameworks. Developers can seek help, the latest insights, and information from the deep learning community and experts. MATLAB provides a 30-day free trial for exploring this toolbox.
Recent AI and ML development is largely owed to large-scale open-source development platforms and source codes. Many researchers have open sourced their code base for non-commercial purposes, which has helped the research community immensely in terms of reproducibility of research and developing new algorithms on top of existing methods. This is one of the major reasons for the large growth of the AI community. In this section, we provide examples of such open-source code bases that can be used for different tasks.
OpenMMLab is an open-source algorithm platform of MMLab, a joint laboratory of the Chinese University of Hong Kong and SenseTime. OpenMMLab is a computer vision algorithm system and framework covering more than 20 research directions, more than 300 algorithms, and more than 2,300 pre-trained models. After years of development, OpenMMLab has gradually formed a complete system and organizational structure that provides open basic technical support, interface standards, and algorithm frameworks. These open resources have been actively used and contributed to by a growing number of AI researchers and have had an important impact on the development of the AI community. Most of OpenMMLab's libraries are based on the PyTorch deep learning framework. The algorithms are at the forefront of current technology and are well documented. OpenMMLab mainly targets computer vision work and covers object detection, semantic segmentation, action recognition, LiDAR data processing, optical character recognition, body pose estimation, and more. All these algorithms live in the same ecosystem, which helps researchers with development and testing.
Detectron2 is Facebook AI Research’s next-generation open-source object detection system, a complete rewrite of the previous version of Detectron. With this framework, users can train various state-of-the-art models for detection tasks such as bounding box detection, instance and semantic segmentation, and panoptic segmentation. There are many pre-trained models in the framework that can “plug and play.” The design based on PyTorch can provide a more intuitive imperative programming model, so developers can iterate model design and experiment more quickly.
Starting with Detectron2, Facebook introduced a custom design that allows users to insert customizations into almost any part of the object detection system. This extensibility also makes Detectron2 more flexible. Detectron2 was rewritten from the ground up to resolve several implementation issues in the original Detectron, making it faster than its predecessor.
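A minimal inference sketch with Detectron2's model zoo; the image path is a placeholder, and the config name refers to a standard COCO Mask R-CNN bundled with the library:

```python
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

# Load a pre-trained Mask R-CNN from the model zoo and run inference
# on a local image ("street.jpg" is a placeholder path).
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5   # confidence cutoff

predictor = DefaultPredictor(cfg)
outputs = predictor(cv2.imread("street.jpg"))
print(outputs["instances"].pred_classes)      # detected object classes
```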
OpenCV (Open-Source Computer Vision Library) is an open-source computer vision library whose main algorithms involve image processing, computer vision, and ML-related methods. Many commonly used computer vision algorithms are implemented in its source code files. OpenCV can be used to develop real-time image processing, computer vision, and pattern recognition programs.
OpenCV consists of a series of C functions and C++ classes, and it has C, C++, Python, and Java interfaces. The current software development kit (SDK) already supports application development in languages such as C++, Java, and Python. Currently, the newly developed algorithms and module interfaces of OpenCV are based on C++. It covers computer vision applications such as industrial product inspection, medical imaging, drone flight, unmanned driving, security, satellite map and electronic map stitching, information security, user interface, camera calibration, stereo vision, and robotics.
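For instance, a short OpenCV sketch for edge detection, a common preprocessing step in the applications listed above (file paths are placeholders):

```python
import cv2

# Read an image, convert to grayscale, suppress noise, and detect edges.
img = cv2.imread("road.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)      # smooth before edges
edges = cv2.Canny(blurred, threshold1=50, threshold2=150)
cv2.imwrite("road_edges.jpg", edges)
```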
OpenCV was originally initiated and developed by Intel Corporation, licensed under the BSD license, and can be used for free in commercial and research fields. Currently, the American robotics company Willow Garage provides the main support for OpenCV.
SAS is a command-driven statistical software used for data analytics and visualization. It can perform advanced analysis, business intelligence, predictive analysis, and data management. The statistical software provides more than 90 prewritten procedures for data analysis. The tools provided to meet some of the statistical needs include analysis of variance, regressions, Bayesian inference, and model selection for large data sets.
SAS also supports AI technologies such as ML, computer vision, NLP, forecasting, and optimization. For example, the visual data mining and ML in SAS help identify common variables across multiple models and important variables that are selected across all models, and they help in the assessment of results for all models.
R is an open-source programming platform that is freely available. One of the strong points of using R is its data visualization tool. Packages that are available for visualization include ggplot2, Lattice, ggvis, googleVis, rcharts, and many others. R allows users to build two-dimensional graphs as well as three-dimensional models. It also supports various data structures, operators, and parameters (e.g., arrays, matrices, and loops) and can be easily integrated with other programming languages like C, C++, and Fortran.
R allows users to perform multiple ML operations like classification, regressions, data cleaning, or data wrangling. Packages that can be used to implement ML in R include: DataExplorer, Dalex (Descriptive Machine Learning Explanations), Kernlab, rpart, etc.
R is a comprehensive tool for pure statistical analysis. However, Python-based methods are used to build complex analysis pipelines that require multiple components such as statistics, image processing, and control. Several Python packages support statistical analysis; NumPy, SciPy, Pandas, and statsmodels are among the most popular. Pandas mostly helps with data preparation and framework creation. SciPy supports common statistical procedures, including hypothesis tests such as the t-test. NumPy and statsmodels support developing linear models and analysis-of-variance types of operations. Overall, Python provides a comprehensive environment for statistical analysis within a larger pipeline.
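A small sketch combining these packages for basic statistical analysis; the data are synthetic, and the scenario (speeds before and after a treatment) is purely illustrative:

```python
import numpy as np
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(1)
before = rng.normal(60, 5, 100)   # e.g., speeds before a treatment
after = rng.normal(58, 5, 100)    # speeds after

# Two-sample t-test with SciPy.
t, p = stats.ttest_ind(before, after)
print(t, p)

# Simple linear model with statsmodels.
X = sm.add_constant(before)
print(sm.OLS(after, X).fit().summary())
```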
Initial development of AI models is an iterative process that requires reevaluation after changes are made to optimize the model for the constraints of the specific problem, but most often the focus is on accuracy and speed. While objectives like reproducibility, running on edge computing hardware, and code sharing are considerations during the development phase, they are often solved in the deployment phase.
Visualization is a very important part of the AI process. This is particularly relevant for data understanding, model evaluation, and performance measures. Visualization can play a role in all sections of an AI pipeline. Data visualization is widely used to study and understand the quality of the data even before any ML processes begin. Data visualization of statistical measures, clustering of data, and pattern analysis of data and features are helpful. The most prominent use of data visualization is in the analysis and summarization of results. Several programming environments and solution systems provide tools for
pattern analysis and data quality assessment. Solutions like Power BI and Tableau provide interactive data visualization tools that can be standalone, web-based, and collaborative. Figure 28 shows an example of a dashboard that summarizes multimodal data in different attribute-based analyses, including map data. This integration and visualization provide an understanding of the solution across all levels of user (developer, user, management). In this section, we provide brief descriptions of some of these tools.
Data annotation is a process of labeling data such as text, image, video, or time series data so that ML systems can understand the data, learn to recognize them, and predict them. The annotation process allows ML to evolve quickly and produce useful research data points. Several data annotation tools are available in the market; some are commercially available for lease and purchase, and others are open-source and freeware. The critical elements related to annotation tools are listed in Figure 29.
The critical features to consider for choosing the right annotation tool are as follows:
Figure 30 provides a list of commercially available data annotation tools that are a good choice if the organization is at the growth or enterprise level.
The open-source data annotation tools provide the freedom to modify the source code and tailor it to specific needs. These tools can provide more control over features and integration at the expense of development and operating costs. On the other hand, they may offer weaker workflow, work management, and security features. For AV applications, open-source tools allow tuning annotation models to improve accuracy by customizing accuracy thresholds and security features. Figure 31 provides a snapshot of available open-source data annotation tools.
A current trend in processing data for AI learning is the use of crowdsourced data annotation services. Popular services include Amazon Mechanical Turk, Google Cloud AI Platform Data Labeling, and Scale Data Labeling services. These platforms offer researchers the opportunity to engage an on-demand workforce to apply labels and annotations to data sets.
Benefits of crowdsourced data annotation include workforce availability, efficiency in handling micro-tasks, and, in some cases, personnel cost reduction. Negatives of using crowdsourcing for these tasks center around quality control: using a workforce outside of the researcher’s supervision may limit the ability to achieve results that are consistent and repeatable.
An alternative to crowdsourced data annotation is the use of in-house annotation tools. Computer Vision Annotation Tool (CVAT) is free, open-source software developed by Intel. CVAT is distributed under the MIT license and is available through GitHub. The software excels at object detection, image classification, and image segmentation. While there is a web version of CVAT, the locally installed version is needed to process large data sets efficiently.
CVAT is free, in continuous community development, and available for third-party integrations; however, open-source software does not have dedicated customer support and could have performance issues as the source code is revised. There are no official task workflows; rather, users depend on community-created tutorials. The abilities of CVAT continue to grow, but the current state of the software does not have some of the advanced filtering and sorting options found in commercial software packages.
In all data annotation scenarios, training for data quality is paramount. ML predictions rely on accurate labels being supplied. The greater the quality of data annotation, the greater the ability of AI to evolve and perform at maximum precision.
SuperAnnotate is another image annotation tool that streamlines and automates computer vision workflows for end-to-end image and video annotation. The platform creates large volumes of high-quality training data using tool sets like vector annotations (boxes, polygons, lines, ellipses, key points, and cuboids) and pixel-accurate annotations using a brush. Tools in the Vector editor annotate images and videos with high accuracy, while pixel-accurate annotation quickly divides an image into multiple segments in semantic or instance mode, depending on project requirements. The platform offers various solutions beyond image annotation, such as video annotation, text annotation, project management, data curation, quality management, SDK integration, and an annotation services marketplace. The annotation marketplace supports projects of all sizes across various industries, from AVs and medical imaging to security and surveillance. One notable feature is an AI solutions team that oversees the annotation pipeline to smooth project delivery. The platform also supports various file formats through image conversion. It aims to provide high security and complies with various regulations: the General Data Protection Regulation (GDPR), SOC 2, HIPAA, PCI DSS, International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) 27001, ISO/IEC 27017, ISO/IEC 27018, and ISO 9001. It offers a free 14-day trial to explore its products and services; pro and enterprise plans offer product demos through the website by request.
Supervisely is a web-based platform running through interactive apps powered by Python to label images, videos, LiDAR three-dimensional sensor fusion data, and Digital Imaging and Communications in Medicine (DICOM) volumes for medical multi-slice scans. The image labeling suite contains various conventional and "smart" labeling tools based on a collection of class-agnostic neural networks for interactive AI-assisted labeling. It also offers a data transformation language tool and enables three-dimensional point cloud visualization. Companies and researchers use the platform for annotating and managing data sets, user collaboration, training neural networks, customization through apps and extensions written in Python and VueJS, and enterprise-grade solutions with high security and privacy through HIPAA compliance. Limited image, video, and three-dimensional sensor fusion labeling per day is free under the community plan, and a 30-day free trial is available for the enterprise plan.
LabelMe is a free and open online annotation tool for digital image labeling created by the MIT Computer Science and Artificial Intelligence Laboratory. It supports six annotation types: polygon, rectangle, circle, line, point, and line strip. This platform only supports JSON format for importing and exporting the data sets.
Scale AI is a data platform that annotates large volumes of three-dimensional sensor, image, and video data using ML-powered pre-labeling and an automated quality assurance system for the most safety-critical applications. The platform offers data set management, document processing, and AI-assisted data annotation targeted toward data processing for autonomous driving. It provides solutions for the retail and e-commerce, defense, and logistics industries, and for applications such as AVs, robotics, AR/VR, document processing, and content and language processing. It can be used for various computer vision tasks, including object detection and tracking, image classification, text recognition, prediction and planning, audio transcription and categorization, multi-source (combined satellite and other sensor data) mapping, and map updates.
V7 is an automated annotation platform combining data set management, model management, image annotation, video annotation, various labeling services, document processing, and AutoML model training. The platform supports images, video, DICOM medical data, microscopy images, PDFs, and three-dimensional volumetric data. Its key feature is auto-annotation, which greatly speeds up labeling using AI models without prior training. It is free for educational purposes and offers pricing quotes upon request for the startup, business, and pro versions.
Many of the AI-based tools and services described in the previous section have been developed independently and with a specific focus. Because of this, some applications face compatibility problems and need additional coordination between the platforms. For example, CVAT is an image annotations platform, while PyTorch is a Python-based deep learning development framework. Output from CVAT can be used to develop computer vision-based models in PyTorch, but it needs additional coordination for these tools to communicate. Some third-party solution providers are specifically working in this area to create this coordination and offer an end-to-end solution for development, testing, and evaluation.
___________________
15 http://labelme.csail.mit.edu/Release3.0/
https://scale.com/
19 https://azure.microsoft.com/en-us/solutions/ai/
22 https://h2o.ai/platform/ai-cloud/
24 https://www.intel.com/content/www/us/en/artificial-intelligence/overview.html
Deep learning has introduced many new types of structured and unstructured data into AI algorithm development. Deep learning's success at learning from data with complex internal relationships, combined with the models' relative opaqueness, means many data modalities store far more information than a human would need to perform the same task. A human can easily recognize a face in a grainy, low-light, highly compressed image, but deep learning models struggle with noise in the image. In Figure 32, the same model was run on two images. To a human, both images clearly present a face, but the right image is more compressed, and the algorithm fails to identify the face in it. When managing data for AI training and inferencing, it is important to consider tradeoffs in the storage format, the storage system, and the conditions of collection.
Complete data pipelines will rarely rely on only a single type of data. Most will touch several, but the AI component may only deal with one. An important tradeoff is deciding how much data to provide the model and when to use a first principles approach to data fusion. It is possible to feed map and image data into a graph CNN to make predictions about driving conditions; however, most map-based analysis has clear algorithms based on the data, and performance with a non-AI method will be better.
There are many sources of data that state and local transportation agencies may use to create AI data sets. Some of these sources may reside within local, state, or federal data collection systems built to improve the efficiency and safety of the U.S. roadway network. However, other data providers are owned by private companies. Below is a sample of data providers transportation agencies might leverage in AI programs.
The most recognizable vision data are images and videos, but this category also includes other forms of vision data like LiDAR and multimodal cameras. The primary concerns with these data are compression, preprocessing, and visualization. These are all potentially very large data sets, with prominent data sets exceeding hundreds of terabytes in size. It is impossible to store large quantities uncompressed and, in the case of LiDAR data, even process them uncompressed.
Compression algorithms are not optimized for computer vision tasks, but for human perception. They remove high-frequency elements of the images and preserve color differences that humans perceive well. Because of the opaqueness of AI models, it is difficult to know what compression optimized for AI models would look like. For LiDAR, where machine tasks are more common than human tasks, state-of-the-art compression performs computer vision tasks to identify things like the ground plane and objects in order to use different compression algorithms for each. In general, the best strategy is to preserve as much of the image as possible, well beyond what a human would need to perform the task.
Compression requirements make edge computing reproducibility difficult. On-vehicle algorithms have access to raw video and LiDAR data, but it is impossible to store that data uncompressed to reproduce results from the vehicle post hoc. In such a safety-critical environment, reproducibility is key, and this area of research is still rapidly developing. In a similar vein, visualizing the results of computer vision tasks means storing annotated images or videos. For video or image data, that doubles the storage requirement. For LiDAR data, video visualization is still large, though smaller than the original data. Interactive visualizations are popular for LiDAR because of its three-dimensional nature, but rendering them requires significant computing time.
Preprocessing vision data is best done on the fly, as it is computationally much cheaper than the model processing but can be difficult to tune correctly. Many models require different normalization conditions or perform better in a different color space than where the data are stored. Noise reduction and luminosity
changes can also help improve the performance of a model. On the training side, processes like random cropping and flipping can increase the diversity of the training set and improve generalizability.
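A hedged sketch of such an on-the-fly pipeline using torchvision transforms (one common option among many; the normalization constants are the usual ImageNet defaults):

```python
from torchvision import transforms

# On-the-fly preprocessing applied as images are loaded, not stored to disk.
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),       # random crop for diversity
    transforms.RandomHorizontalFlip(),       # random flip
    transforms.ColorJitter(brightness=0.2),  # luminosity variation
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```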
Text data sources are highly variable and include optical character recognition forms, social media data, content submitted via online forms, and published media. Meaning is often highly context dependent; a newspaper article describing a bad experience will use very different language than an angry tweet. Because of that, metadata are among the most important parts of working with text data. Text data sets are relatively small, but it is common for them to be multimodal, or at least easily connected to other data modes like audio data. Scanned forms have images, online posts have an important time dimension, and reviews or complaints carry important geographic information. In transportation, a variety of systems use text data, including document management, document summarization (e.g., summarizing police accident reports), recommendation systems, chatbot systems for customer support, and automatic transcription of key records, to mention a few.
Geographical information system (GIS) and map data have a huge ecosystem of data sources and tools, and the data management problems largely revolve around sourcing and accessing high-quality data. APIs like HERE and the Google Maps Platform provide rich, accurate, and up-to-date data on demand, extending beyond traditional geospatial metadata to include images and reviews of points of interest based on location. A variety of tools, both open source and commercial, help transform, manage, and store geospatial data; interactive and visualization work is often done with tools like ArcGIS and QGIS, while large-scale programmatic processing during AI development and deployment may leverage functions in spatially enabled databases like PostGIS. As the transportation system is inherently geospatial, many transportation problems benefit from geospatial information about system state and dynamics, including traffic patterns, traffic congestion, and asset localization.
Time series data management depends significantly on how the data are being used. Many applications build AI models with the intention of deploying them on live data, receiving a stream of time series data and making high-speed decisions based on the results. These models use stream processing solutions like Apache Flink. Processing done post hoc is typically iterative, and there is an expansive software ecosystem around time series processing and data storage. Ultimately, most time series analyses start with a CSV file, but as data sets grow, storage formats like Parquet or Arrow become necessary. Massive data sets, or those accessed by many people outside the hosting organization, may be served through databases or APIs. In principle, any data sampled over time in transportation research can be treated as time series data; therefore, much of the data collected, accessed, and sampled by transportation agencies falls into this category.
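For example, converting a CSV time series to Parquet with Pandas (file and column names are placeholders; Parquet support requires the pyarrow or fastparquet package):

```python
import pandas as pd

# Convert a growing CSV time series to a columnar Parquet file.
df = pd.read_csv("sensor_log.csv", parse_dates=["timestamp"])
df.to_parquet("sensor_log.parquet", index=False)

# Columnar reads can then pull only the columns an analysis needs.
speeds = pd.read_parquet("sensor_log.parquet",
                         columns=["timestamp", "speed"])
```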
Data are often thought of as collections of data points or representations of measurements, but in the AI space, where models are complex, difficult to create, and can be enhanced independently of any code base that leverages them, the models themselves are another data point. This is especially pertinent when it comes to data sharing. Practices vary widely among AI developers regarding sharing models and code. Many researchers freely share code and models, while others, especially in sensitive fields like deep fakes, may share only the code. Industry is much less likely to share models or even details about a model's performance compared to the state of the art. AI-as-a-service providers rarely share details about how their models were created or their performance characteristics.
This limited sharing leads to problems with reproducibility. Validating models or algorithms without access to trained model parameters is very difficult. Some models train using controlled unclassified information, personally identifiable information, or proprietary data and can share the model, but not the training data.
AI competitions are popular among the research community and publish large training data sets. These are standard starting points for building algorithms and can be augmented with additional sensitive data to become application specific.
Recent methods in AI excel at making predictions on data that defy first principles analysis. These are large data with complex internal relationships. AI algorithms are opaque, and when developing and storing data sets it is best to provide as much clean data as possible, relying on the learning in the algorithm to distinguish between useful and irrelevant information. Furthermore, the AI training itself requires large data sets to learn relationships in the first place. The propensity toward large data sets and complete data points means storage requirements are significant.
Large-scale data collection takes significant planning, coordination, and continuous validation. Sufficient storage is required prior to data collection, so being able to predict data collection sizes and the resources required to ingest the data is essential to a successful effort. Accurate prediction requires expertise, experience, and familiarity with the objectives of the effort to make trade-off decisions. For large efforts, recollection is impossible, so it is best to start with a smaller pilot collection or build in regular validation points to the collection plan.
Large, structured data sources require an extract, transform, and load (ETL) process. This processes the collected data from the original data sources and extracts them for data cleaning and restructuring. This transformation step is critical to data cleaning, as this is where most of the critical metadata to make data quality determinations exist. AI algorithm development is so dependent on the data that this step can often determine success or failure of an effort. The data are then loaded into a data system like a database or standardized file structure for annotation and processing. The size of the data set has little effect on the complexity of the ETL process as it is almost entirely dependent on the data quality and the complexity of the ingested data.
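A minimal ETL sketch in Python using Pandas and SQLite; the file, table, and column names are hypothetical:

```python
import sqlite3
import pandas as pd

raw = pd.read_csv("raw_counts.csv")                       # extract

raw = raw.dropna(subset=["station_id", "count"])          # transform:
raw["date"] = pd.to_datetime(raw["date"], errors="coerce")  # standardize dates
raw = raw[raw["count"] >= 0]                              # drop bad readings

with sqlite3.connect("traffic.db") as conn:               # load
    raw.to_sql("daily_counts", conn, if_exists="append", index=False)
```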
The best storage solution will cater to the data's size, the project's performance requirements, and the algorithm's access patterns. File and object storage are the standard mechanisms for storing structured or unstructured data. Directories full of CSV files or images organized by class or date work well for small-scale processes with fewer than 50,000 files. For larger numbers of files, object storage like Amazon Simple Storage Service (S3) will outperform file storage. The cost of data storage is driven by the total size and the desired performance. Using flash storage or high-performance cloud storage classes may improve processing speed, but they will have no impact if computational resources are the bottleneck. Piloting both the training and inferencing components of any AI pipeline is critical for choosing the correct storage system. This can be done with naïve approaches first to understand the workload pattern and cost-benefit tradeoffs.
Some data, especially telemetry and GIS data, benefit from being indexed in a database. Tools for this include TimescaleDB, PostGIS, tile servers, and traditional relational database management systems (RDBMS). The advantage of a structured data store like a database is greatly increased data access speed without requiring faster storage. Databases excel when the data store can perform some calculation to return less data than the entire data set. PostGIS, for example, can be queried to return all road segments within a certain curvature range in a set of cities much faster than checking the data with an algorithm. When working with metadata-rich data sets, databases provide the greatest value.
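A hedged sketch of such a query through psycopg2, assuming a hypothetical road_segments table with a precomputed curvature column; connection parameters are placeholders:

```python
import psycopg2

# Hypothetical schema: road_segments(segment_id, geom, curvature, city).
conn = psycopg2.connect("dbname=gis user=analyst")
with conn.cursor() as cur:
    cur.execute("""
        SELECT segment_id, ST_AsGeoJSON(geom)
        FROM road_segments
        WHERE curvature BETWEEN %s AND %s
          AND city = ANY(%s)
    """, (0.05, 0.20, ["Blacksburg", "Roanoke"]))
    rows = cur.fetchall()   # only the matching segments come back
conn.close()
```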
Metadata are critical to creating AI data sets and validating models. They include information from data collection details to summary information about the contents. Sometimes the metadata are part of the data organization. For example, ImageNet stores image files in numbered directories, each of which corresponds to the class of object in the image. Time series data may store all the metadata along with the data, such as a row in a table containing columns for a temperature sensor, sensor errors, location, and time. Sophisticated or highly complex data sets may rely on one or more databases to track all the data, information, and results. Object storage systems like S3 can tag stored objects with metadata and later query objects by those metadata tags. This integrates the external metadata store with the data itself.
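For illustration, attaching user-defined metadata to an S3 object with boto3 (the bucket, key, and metadata fields are placeholders; searching across many objects typically also uses an external index):

```python
import boto3

s3 = boto3.client("s3")

# Upload an object with user-defined metadata attached to it.
with open("frame_0001.jpg", "rb") as f:
    s3.put_object(
        Bucket="my-ai-datasets",
        Key="images/frame_0001.jpg",
        Body=f,
        Metadata={"collection": "route-460", "weather": "rain"},
    )

# The metadata travels with the object and can feed an external index.
head = s3.head_object(Bucket="my-ai-datasets",
                      Key="images/frame_0001.jpg")
print(head["Metadata"])
```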
Performance requirements of large storage are entirely dependent on the application. The processing of large video data sets will often be bottlenecked by the available GPU resources, unless leveraging cloud resources to scale very widely. Data that have relatively low compute costs, like time series data, will strain the limits of all but the fastest storage. Data access will become a constraining factor of any system that exceeds a few terabytes as specialized systems are needed to store data of that size. Cloud services offer several significant advantages in ease of use and manageability but are more costly over time than dedicated storage systems.
Managing large data sets at scale means certain operations are difficult or impractical and require a specific, intentional data management plan. Transferring hundreds of terabytes over the internet to a new site is all but impossible without costly dedicated high-speed links into a major internet exchange or one of the research networks like Internet2 or ESNet. Changing file permissions on more than a million files can take days or weeks. These need to be planned and funded for the lifetime of the data set. This requires a very forward-thinking view to support the data use.
Computing resources are important to consider when deciding what tasks are required, the speed and efficiency needed for those tasks, and the total cost of operation. CPUs are the core of every laptop and desktop computer. Designed to perform a wide variety of tasks, these chips have limited usefulness in the field of ML and AI. CPUs are suitable for some annotation and data inference tasks, and their utility can be extended through parallel processing and clustering.
GPUs are specialized components that excel in advanced mathematical computations. GPUs, originally used for gaming systems, have evolved to become a cornerstone of ML. Unlike CPUs, which have steadily but slowly improved over time, GPU performance rapidly improves with each generation. These performance gains make new AI algorithms with more parameters viable and shorten the lifetime of GPUs. Depending on the GPU and when it was purchased in the product release cycle, they may not last more than 5 years before they are functionally obsolete. To extend beyond the limits of what a single machine can process, computing clusters orchestrate processing across a large set of systems. These require a scheduler to manage sharing of the cluster and automate starting and stopping processing. To have rapid, easily scalable computing services, several companies now offer cloud-based resources. Some companies include NVIDIA, AWS, Intel.ai, and Oracle.
Edge computing, sometimes associated with the Internet of Things (IoT), moves computation away from central resources onto the devices doing data collection. This allows massive data reduction by transferring and storing only the results of AI algorithms. Sometimes this is done as a multi-stage pipeline where an algorithm identifies data of interest, or the data are only partially reduced, with a central resource completing the processing. Edge computing can solve many of the problems of large-scale data collection, but errors in the edge computing algorithm mean lost data or bad results, with no option of correction.
Cloud computing converts capital expenditures into operational expenditures. It amortizes the cost of maintaining large-scale systems, relying on economies of scale to maximize the efficiency of their operation. To potential customers, this means large compute and storage resources are readily accessible with virtually no startup cost. In practice, fully utilized on-premises systems with existing support infrastructure become more cost effective in a matter of months. Still, for small use cases, during the startup phase of a major project, and for "bursty" systems whose maximum performance requirements far exceed their average demand, cloud computing is a smart tactical option.
Several software tools are currently available to researchers. Each of them provides different sets of benefits to its users. The academic AI community has been very open about its work, including final software tools available through open-source platforms like GitHub for anyone with proper domain knowledge to use. In this section, we discuss the benefit of each such system and the sets of tools we described in the previous sections.
In this section, we discuss tool sets that provide platforms either for end-to-end solutions or to help users customize their own solutions. Many of these systems are cloud-based. We have compared them in terms of cost, usability, and overall security. As discussed earlier in this report, data security is an important component of the overall system. There are many protocols for data security; depending on the sensitivity of the data, stakeholders should choose the tool sets that are most appropriate. A comparative analysis is shown in Table 17.
Table 17. A cost-benefit analysis of different AI-based tool sets available for use.
| Tool | Cost | Free trial | Security | Resources |
|---|---|---|---|---|
| Superb-AI | High | Available | | Data sets, documents, version history, Datacast, white papers, blog; academy and community support |
| AWS | High | Available | | Vast database of guides, API references, tutorials, projects, SDKs, and toolkits; training and certification; video tutorials for developers; Amazon Builders' Library |
| Azure AI | Moderate | Available | | Various digital media, training and certifications, analyst reports, white papers, e-books, and videos; developer resources, documentation, and quick-start templates; community support for developers and students |
| Oracle AI | Moderate | Available | | Documentation, cloud services, training, and GitHub for developers and students; resources in Python, JS, Java, .NET, PHP, Terraform, Apex, SDKs, GraalVM; various solutions for enterprise apps, ML/AI, web apps, containers, analytics, serverless computing, etc. |
| Vertex AI | Low | Available | | Support available from Google Cloud and community; various code samples; detailed console, libraries, guides, and references |
| H2O AI | Open source | Available | | Data sets, documents, version history, Datacast, white papers, blogs; various open-source algorithms; enterprise support |
| IBM Watson | Low | Available | | Cloud environment for public, private, or hybrid cloud; developer tools and documentation, technical papers, and digital media; SDKs on GitHub; Watson Assistant; AI consulting services |
| Intel AI | Not listed | Not listed | | Various ready-to-deploy AI software, hardware, and solutions; digital media, technology sandbox, and a training portal for registered members |
| NVIDIA DRIVE | Not listed | Not listed | | Deep Learning Institute (DLI) courses, DRIVE videos, webinars, GPU Technology Conferences, and DRIVE documentation; NVIDIA DRIVE downloads available to Developer Program members |
| Autoware | Open source | - | No information is given | |
Note. Some of these are commercial systems.
In this section, we discuss the cost-benefit analysis of big data management and explore two options: DOTs collecting and storing data on their own premises, or using cloud-based storage facilities. Storage costs have decreased significantly in recent years. In the meantime, demand for storage has increased, mainly due to the introduction of high-quality sensors such as high-definition cameras and LiDAR. In many cases, the data can be stored in a local facility through a large-scale storage system. The main advantage of such a system is that the collected data are readily available to researchers over the local area network. Also, with a physical storage facility, the security of the data is higher, provided that proper security protocols are followed. However, these systems need experts in IT and database management who can continually monitor the health of, and access to, the system. On the other hand, a cloud-based service does not need in-house experts for maintenance, and the workload of IT professionals is comparatively lower. However, the team needs to upload all the data to the cloud, which may require very high-speed internet service; if the data volume is too large, the available internet speed may hinder the process. These systems may also be costly for keeping data over a long period of time. Conversely, if the team decides to use cloud-based computing resources, then cloud-based storage may be easy to integrate. The comparative results are shown in Table 18.
Table 18. Cost-benefit analysis for big data management.
| Tool | Infrastructure/maintenance ↓ | Recurring cost ↓ | Security ↑ | Knowledge for data management ↓ | Speed ↑ |
|---|---|---|---|---|---|
| Physical storage | High | Low | High | High | High |
| Cloud-based storage | Low | Very high | Low | Low | Low |
Note. The ↓ signifies a lower value is better. The ↑ means a higher value is better.
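As a rough, back-of-the-envelope illustration of the upload-bandwidth concern noted above, the following sketch estimates upload time for a given data volume and uplink speed (the figures are illustrative assumptions, not recommendations):

```python
def upload_days(data_tb: float, uplink_mbps: float, efficiency: float = 0.8) -> float:
    """Rough upload time in days for data_tb terabytes over an uplink of
    uplink_mbps megabits per second, assuming the link sustains only
    `efficiency` of its nominal rate."""
    bits = data_tb * 8e12                            # terabytes -> bits
    seconds = bits / (uplink_mbps * 1e6 * efficiency)
    return seconds / 86400                           # seconds -> days

# Example: 50 TB of camera/LiDAR data over a 500 Mbps uplink.
print(f"{upload_days(50, 500):.1f} days")  # about 11.6 days
```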
In this section, we discuss options related to high-performance computing. Most recent AI-based solutions depend on GPU-based architectures. Although some algorithms can still run on CPU-based systems, their inference speed may be limited by the CPU environment. As in the last section, we mainly compare on-premises computation with cloud-based computation. GPUs are costly to buy, and processing large-scale data (in the range of terabytes) requires a very large GPU computing resource, which may significantly increase the initial infrastructure cost. Such a system also needs dedicated IT personnel for managing, scheduling, and maintaining the GPUs. Finally, we have seen historically that, with accelerated research in high-performance computing, the useful life of a GPU is becoming shorter as new high-performing GPUs reach the market. Despite these limitations, having on-premises computational resources significantly reduces the recurring computational cost, as well as the costs of cloud-based storage and of uploading large-scale data. Cloud-based systems can provide very large-scale computing resources, but their recurring compute and data storage costs are high. Therefore, the decision should depend on the specific application and the GPU/CPU hours needed for the task. A comparative analysis is shown in Table 19.
Table 19. Cost-benefit analysis of computing resources for AI applications.
| Tool | Infrastructure/maintenance ↓ | Recurring cost ↓ | Security ↑ | Personnel knowledge base ↓ | Speed ↑ |
|---|---|---|---|---|---|
| On-premises CPU | Medium | Low | High | High | Low |
| On-premises GPU | High | Low | High | High | High |
| Cloud-based CPU | Low | High | Medium | Low | Medium |
| Cloud-based GPU | Low | High | Medium | Low | High |
Note. The ↓ signifies a lower value is better. The ↑ means a higher value is better.
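To make the on-premises-versus-cloud tradeoff concrete, the following sketch computes the approximate number of GPU-hours at which an on-premises purchase breaks even with pay-per-use cloud pricing; all prices are placeholder assumptions and vary widely in practice:

```python
def breakeven_hours(onprem_capital: float, onprem_hourly: float,
                    cloud_hourly: float) -> float:
    """GPU-hours at which the on-premises option becomes cheaper than cloud.
    onprem_capital: up-front hardware cost; onprem_hourly: power/admin cost
    per GPU-hour; cloud_hourly: cloud price per GPU-hour (assumed higher
    than onprem_hourly for the comparison to be meaningful)."""
    return onprem_capital / (cloud_hourly - onprem_hourly)

# Placeholder figures for illustration only; actual prices vary widely.
hours = breakeven_hours(onprem_capital=15000, onprem_hourly=0.40, cloud_hourly=3.00)
print(f"Break-even at roughly {hours:,.0f} GPU-hours")  # ~5,769 GPU-hours
```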
AI solutions can be applied in multiple ways. Some tasks can be outsourced to third-party specialized vendors (e.g., camera-based solutions); others can be accomplished through crowdsourcing (e.g., traffic data collection); and part of the work can be performed in-house by the agency itself. Each option has pros and cons, as discussed in the preceding sections. In most cases, performing a task in-house requires proper infrastructure and personnel who can operate and manage the system. In AI-based systems, these are often people with backgrounds in computer science, software engineering, or information technology, and they must also be aware of any security threats to the system. However, once these requirements are met, the recurring costs of the system become lower. On the other hand, many cloud-based services offer ready-to-go platforms that require very low maintenance and may only need personnel with domain knowledge in AI; the main drawback of such systems is the recurring cost of the services, as they are often pay-per-use. While both of these options require some AI expertise on the team, a third possibility is outsourcing tasks to a third-party vendor. Many enterprises are now developing capabilities to provide end-to-end solutions that include data collection, annotation, and analysis. In such scenarios, DOTs need to acquire enough domain knowledge to understand the various components of the task and the main evaluation criteria in order to manage the project. We believe that a DOT should take a systematic approach to deciding which option is best for the given problem, cost budget, and time budget.
AI stands to benefit a variety of transportation-related applications for DOTs. The scope of AI is enormous and includes road transport, safety, infrastructure management, and energy efficiency. AI has the potential to make traffic more efficient, ease traffic congestion, free up time spent driving, make parking easier, and encourage car- and ridesharing. As AI helps keep road traffic flowing, it can also reduce the fuel consumed by idling vehicles and improve air quality and urban planning (Niestadt et al., 2019). AI can also help apply advanced sensors effectively to targeted applications. Automation in general can bring consistency and uniformity and remove human bias. While AI may require an initial investment, it has the potential to ultimately reduce cost. Lastly, the application of AI may significantly reduce processing time for tasks that are repetitive and require significant manual labor. In summary, the implementation of AI by DOTs can not only improve performance and speed of operation but also reduce the cost of operation and minimize bias.
Software and other tools are major components of AI development and application. One of the key factors in the AI revolution is the widespread availability of software platforms for seamless deployment and experimentation. Numerous application platforms have been developed by AI industry leaders such as Google, Facebook, DeepMind, and MathWorks, along with practitioners and developers from industry and academia. These platforms include Scikit-learn, OpenCV, NLTK, the MATLAB Image Processing Toolbox, PyTorch, Keras, TensorFlow, and MXNet. These tools, developed in diverse language platforms such as Python, R, Java, and MATLAB, are widely used to implement traditional ML techniques, AI-based applications, and advanced DNN models, and they can efficiently communicate with GPU clusters and high-speed storage facilities. In recent years, cloud-based services have emerged that provide these application tools along with storage and computational resources; such services are now offered by Amazon (AWS), Google (Google Colab), Microsoft (Azure), and NVIDIA, among others.
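As an example of how accessible these open-source platforms make traditional ML, the following minimal sketch (using the scikit-learn library and its bundled iris data set) trains and evaluates a logistic regression classifier in a few lines:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a small built-in data set and hold out a test split for evaluation.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Fit a traditional ML model and report held-out accuracy.
model = LogisticRegression(max_iter=200).fit(X_train, y_train)
print(f"Test accuracy: {accuracy_score(y_test, model.predict(X_test)):.2f}")
```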
In this report, we have provided a comprehensive review of different AI tools. We first provided a comprehensive understanding of the overall data processing pipeline needed for any AI-based development and deployment. We then discussed each key component of the process, including software tools, data management, and data processing. For the software-based tools, we discussed the tool sets that may be required for advanced data analytics, including statistical analysis, traditional ML, and advanced ML such as DNNs. We believe that these sets of tools and their comparative study will give DOTs a comprehensive understanding of the promises and challenges they may expect while implementing AI-based systems. These tools will also help DOTs understand the role of AI in upcoming projects and the components such projects require. We believe this document will also help them understand the infrastructure and personnel needed to execute such data-driven AI tasks.
Antdata. 2021. Power BI Reporting Implementation: Power BI Reports with Antdata Developers and Consultants. https://antdata.eu/power_bi_implementation.html
Chen, T., M. Li, Y. Li, M. Lin, N. Wang, M. Wang, T. Xiao, B. Xu, C. Zhang, and Z. Zhang. 2015. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. In Neural Information Processing Systems, Workshop on Machine Learning Systems. https://github.com/dmlc/web-data/raw/master/mxnet/paper/mxnet-learningsys.pdf
CloudFactory. n.d. Data Annotation Tools for Machine Learning (Evolving Guide): Choosing the Best Data Annotation Tool for Your Project. https://www.cloudfactory.com/data-annotation-toolguide#:~:text=A%20data%20annotation%20tool%20is,training%20data%20for%20machine%20learning
Hinton, G. E., N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov. 2012. “Improving Neural Networks by Preventing Co-adaptation of Feature Detectors.” arXiv preprint arXiv:1207.0580.
Jia, Y., E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. 2014. “Caffe: Convolutional Architecture for Fast Feature Embedding.” Proceedings of the 22nd ACM International Conference on Multimedia, pp. 675-678.
McCarthy, J. 2004. What is Artificial Intelligence? http://www-formal.stanford.edu/jmc/whatisai.pdf
Niestadt, M., A. Debyser, D. Scordamaglia, and M. Pape. 2019. Artificial Intelligence in Transport: Current and Future Developments, Opportunities and Challenges [Briefing]. European Parliamentary Research Service. https://www.europarl.europa.eu/RegData/etudes/BRIE/2019/635609/EPRS_BRI(2019)635609_EN.pdf
SAE International. 2021. “SAE Levels of Driving Automation™ Refined for Clarity and International Audience.” https://www.sae.org/blog/sae-j3016-update
Volet, Y. 2018. CRISP-DM-01. https://emba.epfl.ch/2018/04/10/steps-successful-machine-learningproject/crisp-dm-01/
CNNs are a powerful class of neural networks designed for processing image data. Models based on CNN architectures have dominated the field of computer vision, and most academic competitions and commercial applications involving image recognition, object detection, or semantic segmentation today are based on this approach. CNNs require fewer parameters than fully connected architectures, and convolutions are easily parallelized on GPUs. Therefore, CNNs can sample and compute efficiently to obtain an accurate model. Over time, practitioners have increasingly applied CNNs even to one-dimensional, sequence-structured tasks such as audio, text, and time series analysis. With some clever tweaks, CNNs also work on graph-structured data and in recommender systems.
Typically, CNNs are constructed by stacking several types of layers: convolutional layers, pooling layers, fully connected layers, and residual blocks, each described below (see the illustrative sketch following these descriptions).
Convolutional layer: The convolutional layers in a CNN extract features by convolving the input images with kernels. A kernel is a small matrix, also called a convolution matrix or convolution mask, that slides over an input image and computes a dot product with it at each spatial location. The outputs of this operation are called convolution features. The number of channels of the kernel must match that of the input, so a three-channel RGB input image is convolved with a three-channel kernel. The convolution features are then passed through an activation function to add non-linearity.
Pooling layer: A pooling layer is typically used after a convolutional layer to reduce the size of the data. It compresses the data and reduces the number of features to speed up computation and prevent overfitting.
Fully connected layer: A fully connected layer typically forms the final stage of a CNN. It combines the features extracted earlier and passes them to a classifier to make the final prediction.
Residual block: A residual block adds a skip connection between layers to facilitate better learning and reduce degradation during training.
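The following is a minimal, illustrative PyTorch sketch (not drawn from any specific system discussed in this report) that stacks the four layer types described above into a toy classifier for 32x32 RGB images:

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Toy CNN stacking the layer types described above, for 3-channel
    32x32 images (e.g., CIFAR-sized inputs)."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Convolutional layers: 3-channel kernels slide over the RGB input.
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 16, kernel_size=3, padding=1)
        # Pooling layer: halves the spatial size to compress the features.
        self.pool = nn.MaxPool2d(2)
        # Fully connected layer: combines extracted features for prediction.
        self.fc = nn.Linear(16 * 16 * 16, num_classes)
        self.relu = nn.ReLU()  # activation adds non-linearity

    def forward(self, x):
        x = self.relu(self.conv1(x))
        # Residual (skip) connection: add the block input back to its output.
        x = self.relu(x + self.conv2(x))
        x = self.pool(x)               # 32x32 -> 16x16
        return self.fc(x.flatten(1))   # classifier on the flattened features

logits = SmallCNN()(torch.randn(1, 3, 32, 32))
print(logits.shape)  # torch.Size([1, 10])
```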
A recurrent neural network (RNN) is a class of neural networks that specializes in processing sequences. Unlike traditional feed-forward networks, the hidden-layer outputs of an RNN at each time step are passed to the next time step and used in training, so the network retains historical information from previous moments. RNNs have natural advantages in sequence modeling and perform well in many fields, including NLP and computer vision.
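A minimal PyTorch sketch of a recurrent layer, for illustration only, shows the hidden state being carried across time steps:

```python
import torch
import torch.nn as nn

# Minimal recurrent layer: the hidden state carries information forward
# across time steps. Input shape: (batch, sequence length, features).
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
sequence = torch.randn(1, 20, 8)          # one sequence of 20 time steps
outputs, final_hidden = rnn(sequence)
print(outputs.shape, final_hidden.shape)  # (1, 20, 16) and (1, 1, 16)
```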
An auto-encoder is a neural network trained in an unsupervised manner. Its basic idea is to use one or more neural network layers to map the input data to a compact encoded representation, which serves as an extracted feature vector. The network learns this encoded representation of the input and then reconstructs the input as closely as possible from it. The main uses of auto-encoders are dimensionality reduction, denoising, and image generation.
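The following illustrative PyTorch sketch, with assumed input dimensions for flattened 28x28 images, shows an auto-encoder compressing inputs to a low-dimensional code and reconstructing them:

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Toy auto-encoder: compress 784-dim inputs (e.g., flattened 28x28
    images) to an 8-dim code, then reconstruct the input from the code."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(784, 64), nn.ReLU(),
                                     nn.Linear(64, 8))       # low-dim code
        self.decoder = nn.Sequential(nn.Linear(8, 64), nn.ReLU(),
                                     nn.Linear(64, 784))     # reconstruction

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
x = torch.rand(32, 784)                       # a batch of inputs
loss = nn.functional.mse_loss(model(x), x)    # reconstruction objective
print(loss.item())
```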