This chapter provides guidance on common data governance initiatives:
One DOT contracted for the collection of video and LiDAR data for the state highway system, along with extraction of information on asset types and locations.
Following contract execution and initiation of data collection, additional requirements emerged to obtain bridge underclearances and culvert locations. However, capturing these assets would have required changes to equipment configurations, which was no longer possible once collection was underway. Earlier and broader collaboration on requirements gathering would have allowed these additional items to be included.
Establishing guidelines and processes to be followed prior to collecting data in the field or purchasing or licensing data from external vendors.
The intent of data collection/acquisition oversight is to:
1. Maximize data collection/acquisition efficiency and the orderly adoption of new technologies;
2. Prevent acquisition of data that duplicates what already exists in the agency;
3. Encourage use of established data standards that enable the new data to be linked as needed to existing data (e.g., through standard location referencing);
4. Identify opportunities for multiple business units to collaborate on data acquisition to meet common needs; and
5. Help business units anticipate and plan for activities needed to manage new data throughout its life cycle.
There are several options, with varying degrees of formality:
The Planning group within one DOT licensed a commercial data source of connected vehicle information and used it to calculate Vehicle Miles of Travel (VMT). The group felt that this provided a more accurate estimate of VMT than the one the agency's traffic monitoring group produced from traditional traffic counts.
However, the result was that two different groups within the DOT were providing two different estimates of VMT to MPOs and other partners.
A data governance process requiring the Planning group to share their data purchase plans with others in the agency would have provided an opportunity for discussion and agreement on how to integrate the new VMT data within the agency – with ground rules for sharing the different types of data and assurance that differences between the two sources would be clearly documented in metadata.
See the Communications Guide, Chapter 5 for an example of how to identify target audiences for communication about a planned data acquisition review process.
See Chapter 7 for sample data proposal initiation intake questions.
There are several interrelated parts to data documentation:
- Table Name
- Field Name
- Field Alias (for report titles)
- Field Description
- Field Type
- Field Length
- Field Precision
- Required?
- Unique?
- Field Domain Type
- Field Units of Measure
- Confidential?
- Sensitive?
- Default Value
- Acceptable Values (Min, Max, List)
- Primary Key?
- Source System
- Usage Notes
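To make these fields actionable, some agencies capture data dictionary entries in a structured, machine-readable form. The following Python sketch shows one way to represent an entry using the fields listed above; the class layout and sample values are illustrative, not a prescribed template.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DataDictionaryEntry:
    """One row of a data dictionary, mirroring the fields listed above.

    Field names are illustrative; adapt them to your agency's template.
    """
    table_name: str
    field_name: str
    field_alias: str                   # used for report titles
    field_description: str
    field_type: str                    # e.g., "string", "integer", "decimal"
    field_length: Optional[int] = None
    field_precision: Optional[int] = None
    required: bool = False
    unique: bool = False
    domain_type: Optional[str] = None  # e.g., "coded list", "range"
    units_of_measure: Optional[str] = None
    confidential: bool = False
    sensitive: bool = False
    default_value: Optional[str] = None
    acceptable_values: Optional[str] = None  # min/max or list of codes
    primary_key: bool = False
    source_system: Optional[str] = None
    usage_notes: Optional[str] = None

# Example entry for a pavement roughness field (values are hypothetical)
iri_entry = DataDictionaryEntry(
    table_name="PAVEMENT_CONDITION",
    field_name="IRI",
    field_alias="International Roughness Index",
    field_description="Measured roughness of the pavement surface.",
    field_type="decimal",
    field_precision=2,
    required=True,
    units_of_measure="inches/mile",
    acceptable_values="0-600",
    source_system="Pavement Management System",
)
```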
There are different varieties of metadata to consider, including:
Data documentation is foundational for data governance because it provides an understanding of the scale and nature of the data to be governed. Data documentation can be used to:
Data documentation also benefits data engineers, analysts, and other data users by enabling discovery and understanding of the agency’s data resources.
See Chapter 7 for an example data flow diagram.
At a minimum, you will want to create a basic data catalog and adopt metadata standards for your stewards to follow.
Steps in building a basic data catalog are:
Steps in standardizing metadata are:
Once you have the basics in place, you will want to explore bringing on a technology solution for data cataloguing and metadata management. There are several available products, some of which are associated with specific cloud solutions and BI platforms. Because technology implementation can be a lengthy process, it is best to get started with an interim solution while you are pursuing a longer-term implementation project. Be sure to carefully define your business requirements and evaluate alternative solutions prior to moving forward with a particular product.
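As an interim solution before a commercial tool is in place, a basic data catalog can be as simple as a set of structured records with keyword search. The sketch below illustrates the idea in Python; the dataset entries and fields shown are hypothetical.

```python
# A minimal interim data catalog: one record per dataset, searchable by
# keyword. All dataset names and attributes here are placeholders.
catalog = [
    {
        "name": "Bridge Inventory",
        "description": "Statewide inventory of bridge structures and attributes.",
        "steward": "Structures Division",
        "update_frequency": "annual",
        "location": "enterprise geodatabase",
        "keywords": ["bridge", "asset", "NBI"],
    },
    {
        "name": "Traffic Counts",
        "description": "Short-duration and continuous traffic count data.",
        "steward": "Traffic Monitoring Unit",
        "update_frequency": "monthly",
        "location": "traffic data warehouse",
        "keywords": ["traffic", "AADT", "counts"],
    },
]

def search_catalog(term: str) -> list[dict]:
    """Return catalog entries whose name, description, or keywords match."""
    term = term.lower()
    return [
        entry for entry in catalog
        if term in entry["name"].lower()
        or term in entry["description"].lower()
        or any(term in keyword.lower() for keyword in entry["keywords"])
    ]

print([entry["name"] for entry in search_catalog("traffic")])  # ['Traffic Counts']
```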
Business Glossary Entry:
Associated Data Dictionary Entry:
A business data glossary is a managed and governed compendium of business terms and definitions for important concepts related to governed agency data.
Business data glossaries establish authoritative, shared definitions for an organization’s specialized terminology. In the context of data governance, business data glossaries help an organization’s employees to understand the precise meaning of data entities and elements associated with the concepts in the glossary. Business data glossary definitions can provide a foundation for the data element descriptions within data dictionaries by eliminating the need to repetitively (and possibly inconsistently) include concept definitions for each data element related to a concept.
- Write term names in the singular form
- State descriptions as a phrase or a few sentences
- Express descriptions without embedded concepts and without definitions of other terms
- State what the concept of a term is rather than just what it is not
- Use only commonly understood abbreviations in descriptions
- State the essential meaning of the concept
- Be precise, unambiguous, and concise
- Be able to stand alone
- Be expressed without embedded rationale, functional usage, or procedural information
- Avoid definitions that use other term definitions
- Use the same terminology and consistent logical structure for related definitions
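To connect glossary terms to data dictionary entries, terms can also be stored in structured form. The following sketch shows one possible structure; the sample term is real, but the related data element names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class GlossaryTerm:
    """A governed business glossary entry. The structure is illustrative."""
    name: str                      # singular form, per the guidelines above
    definition: str                # concise, stands alone, no embedded rationale
    owner: str                     # business unit accountable for the term
    related_data_elements: list[str] = field(default_factory=list)

aadt = GlossaryTerm(
    name="Annual Average Daily Traffic",
    definition=("The total volume of vehicle traffic on a highway segment "
                "for a year, divided by 365 days."),
    owner="Traffic Monitoring Unit",
    related_data_elements=["TRAFFIC_SEGMENT.AADT", "HPMS_SUBMITTAL.AADT_VN"],
)
```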
Steps for building a business data glossary are:
Glossary Manager:
Term Owners:
Governance Group:
Data quality management consists of a coordinated set of activities to:

Data quality management helps to make sure that investments in data pay off – i.e., that data serve their intended purposes and provide business value. Data quality management helps agencies to avoid the negative consequences of poor data quality, which include:
- Timeliness – amount of time between when an event occurs and when information about the event is recorded
- Accuracy – match to ground truth
- Completeness – presence of expected records and attributes
- Uniformity, Validity, or Conformity – consistency across files or database records; adherence to business rules and metadata or to an established standard
- Repeatability or Precision – match between repeated measurements
- Uniqueness – absence of duplicate records
- Accessibility – ability of authorized users to obtain the data
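Several of these dimensions can be measured directly with simple scripts. The sketch below, using Python and pandas on a hypothetical crash records extract, computes completeness, uniqueness, validity, and timeliness; the column names and the severity business rule are assumptions for illustration.

```python
import pandas as pd

# Hypothetical crash records extract; column names are illustrative.
crashes = pd.DataFrame({
    "crash_id": [101, 102, 102, 104],
    "county": ["Polk", "Story", "Story", None],
    "severity": [1, 3, 3, 9],          # assumed valid domain: 1-5
    "report_date": pd.to_datetime(
        ["2024-01-05", "2024-01-06", "2024-01-06", "2024-01-08"]),
    "crash_date": pd.to_datetime(
        ["2024-01-02", "2024-01-04", "2024-01-04", "2024-01-01"]),
})

# Completeness: share of non-null values in each expected attribute
completeness = crashes[["county", "severity"]].notna().mean()

# Uniqueness: share of records that are not duplicates of another record
uniqueness = 1 - crashes.duplicated(subset="crash_id").mean()

# Validity: share of values that conform to the business rule (domain 1-5)
validity = crashes["severity"].between(1, 5).mean()

# Timeliness: average days between the crash event and when it was recorded
timeliness_days = (crashes["report_date"] - crashes["crash_date"]).dt.days.mean()

print(completeness, uniqueness, validity, timeliness_days, sep="\n")
```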
Agencies can pursue one or more of the following approaches to improving data quality management practices within their organizations:
Caltrans adopted a standard data quality management plan template and has trained data stewards in the use of a spatial data quality tool.
USDOT’s 2019 Information Dissemination Guidelines present practices followed by operating administrations to ensure data utility, objectivity, integrity, accessibility, public access, and re-use.
NHTSA has defined a “six-pack” of data quality measures to be applied to six core traffic data systems (Crash, Vehicle, Driver, Roadway, Citation/Adjudication, and EMS/Injury Surveillance): timeliness, accuracy, completeness, uniformity, integration, and accessibility.
- DOT Organizational Unit Name
- Route ID
- Milepoint
- Latitude and Longitude
- Project ID
- Asset ID
- Asset Type
- Vehicle Identification Number (VIN)
- State Fiscal Year
- Employee ID
- MPO
- Agency ID
- Funding Source
- City Name
- County Name
- Event DateTime
Data element standards define the meaning, naming conventions, type, format, value domains, and (where applicable) units of measure and levels of resolution for data elements that appear in multiple datasets.
Data element standards provide the basis for achieving consistency across different datasets, which enables data to be integrated or combined for analysis and reporting.
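Expressing a data element standard in machine-readable form allows conformance to be checked automatically. The sketch below illustrates this for a hypothetical Event DateTime standard; the format rule shown is an assumption for illustration, not an adopted DOT standard.

```python
import re
from datetime import datetime

# A data element standard expressed as a machine-readable specification.
# The element and its rules are illustrative.
EVENT_DATETIME_STANDARD = {
    "name": "Event DateTime",
    "definition": "Date and time at which an event occurred.",
    "type": "datetime",
    "format": "ISO 8601 (YYYY-MM-DDTHH:MM:SS)",
    "pattern": r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}$",
}

def conforms(value: str, standard: dict) -> bool:
    """Check whether a value conforms to the standard's format rule."""
    if not re.match(standard["pattern"], value):
        return False
    try:
        datetime.fromisoformat(value)  # also rejects e.g. month 13
        return True
    except ValueError:
        return False

print(conforms("2024-07-04T14:30:00", EVENT_DATETIME_STANDARD))  # True
print(conforms("07/04/2024 2:30 PM", EVENT_DATETIME_STANDARD))   # False
```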
Steps for establishing data element standards are:
See Chapter 7 for a sample Data Element Standard.
Iowa DOT created a set of foundational data element standards focusing on dates, times, locations, measures, currency and identifiers for projects, vehicles, equipment, employees, and positions.
Caltrans established a data element standard template and created initial standards for data related to locations and asset management.
Master data management establishes authoritative data sources that provide a consistent and uniform set of identifiers and extended attributes describing the core entities of an organization. Once created, master data sets are catalogued, documented, and made available for use.
Reference Data Entities:
Master Data Entities:
Reference data management entails maintenance of stable, re-usable datasets, such as code tables, for use by multiple systems across the agency. It focuses on harmonizing and sharing data that is used to classify or provide context for other data. Many reference data elements are external – for example, ZIP codes or two-letter state abbreviations. For a DOT, reference data would include things like highway functional classifications, asset types, vehicle makes and models, and incident statuses.
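A reference data set can be as simple as a governed code table that multiple systems validate against. The sketch below uses the FHWA highway functional classification codes as an example; the table layout and validation helper are illustrative.

```python
# A governed reference code table for highway functional classification,
# following the FHWA functional classification scheme.
FUNCTIONAL_CLASS = {
    1: "Interstate",
    2: "Other Freeway or Expressway",
    3: "Other Principal Arterial",
    4: "Minor Arterial",
    5: "Major Collector",
    6: "Minor Collector",
    7: "Local",
}

def validate_functional_class(code: int) -> str:
    """Return the standard description, or raise if the code is unknown."""
    try:
        return FUNCTIONAL_CLASS[code]
    except KeyError:
        raise ValueError(f"Unknown functional classification code: {code}")

print(validate_functional_class(4))  # Minor Arterial
```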
There are different architectural options for master and reference data management – including a Registry approach that uses an index to point to master and reference data in source systems, a Hub approach that consolidates master and reference data in a central repository, and a Hybrid approach which leaves the master and reference data in their original source systems but synchronizes the data with a Hub repository for general access.
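To illustrate the Registry approach, the following sketch shows a master index that points to the authoritative record for an entity in each source system rather than storing a consolidated copy; the system names and keys are hypothetical.

```python
# Registry-style master data index: each master ID maps to the keys of the
# authoritative records held in the source systems. No consolidated copy of
# the entity's attributes is stored here.
vendor_registry = {
    "VENDOR-00042": {
        "finance_system": "FIN-9913",
        "contracts_system": "CTR-2207",
        "grants_system": "GRT-0088",
    },
}

def resolve(master_id: str, system: str) -> str:
    """Look up the source-system key for a mastered entity."""
    return vendor_registry[master_id][system]

print(resolve("VENDOR-00042", "contracts_system"))  # CTR-2207
```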
Master and reference data management is an essential practice that helps agencies to reduce complexity and improve connectivity and consistency of their information systems. It provides the tools and processes needed to establish a single source of truth, both to guide internal DOT decision-making and to provide information that meets external requirements and inquiries.
The first step is to identify and prioritize specific master and reference data entities and elements to be managed. While specific activities will vary depending on the scope and nature of what is selected, the following activities will generally be needed to establish master and reference data sources:
Virginia DOT created a Master Data Management roadmap that identifies data to be mastered over a three-year period. They have allocated resources for data modelling, design, and delivery of the master data.
Caltrans has adopted a standard approach to master and reference data management and has piloted this approach for creating a reference data set with information about California agencies that receive transportation funding.
Data sharing is the process of making data available for use beyond the system or location where it is managed. This maximizes data re-use and avoids duplicative effort. Data can be shared in multiple ways (see sidebar). Data sharing can take place within an agency, across agencies, between an agency and a citizen who requests data, between an agency and a set of registered users, or between an agency and the public at large.
From a data governance perspective, data sharing is an activity that benefits from oversight in the form of policies, guidelines, and agreements. Policies and guidelines for data sharing cover what data can (and should) be shared, with whom, and how. Data sharing agreements are used to define the terms under which data sets are shared (internally or externally).
- Via a specialized application (e.g., traffic monitoring software)
- Open Data Portal
- Agency GIS Portal
- Data Warehouse/Mart
- Data Lake
- File server
- FTP site
- Cloud data sharing site
- Application Programming Interface (API)
- Web Service
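As a simple illustration of API-based sharing, the sketch below exposes a small dataset as read-only JSON using Python's Flask framework; the endpoint name and data are hypothetical.

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical published dataset; in practice this would be read from a
# governed source such as a data warehouse view.
TRAFFIC_COUNTS = [
    {"station_id": "A-101", "year": 2023, "aadt": 12450},
    {"station_id": "A-102", "year": 2023, "aadt": 8830},
]

@app.route("/api/traffic-counts")
def traffic_counts():
    """Expose the shared dataset as read-only JSON."""
    return jsonify(TRAFFIC_COUNTS)

if __name__ == "__main__":
    app.run(port=5000)
```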
Data sharing policies and guidelines have several benefits:
Data sharing agreements can be used to establish expectations for both the data provider and recipient, including:
See Chapter 7 for a sample Data Sharing Agreement outline.
These agreements can be inter-agency or intra-agency. In some cases, they may alleviate concerns about sharing data by stipulating appropriate use.
DOTs can pursue several avenues for managing how data are shared – both to encourage sharing of available data and to make sure that data are shared in a responsible manner:
Washington State DOT (WSDOT) executed a data sharing agreement with the City of Seattle to exchange construction location and schedule information for projects within the city limits.
Florida DOT developed a Data Management Planning (DMP) resource document that lists key considerations for each phase of the data lifecycle, including data sharing.
- Crash records – person and vehicle details
- Driver license and vehicle registration data
- License plate reader data
- Vehicle transponder data
- Connected vehicle data
- Employee cell phone GPS data
- Employee personnel records
- Professional Engineer (PE) license numbers (can be used to look up home addresses)
- Disadvantaged Business Enterprise (DBE) certification applications
- Locations of archeological sites
- Engineer’s Estimates for construction projects (prior to bid opening)
- Facility security camera locations
- Vendor-supplied proprietary data
- Computer network security information
- IP addresses of Operational Technology (e.g., ITS devices or traffic signals)
- Detailed inspection or design data associated with critical infrastructure
Private and sensitive data include personal data (data about specific people that compromises their privacy) as well as other data that, if disclosed, could jeopardize the privacy or security of agency employees, clients, or partners, or cause major damage to the agency.
Most states have information security policies in place that define data or information classification levels and establish guidelines for classifying data. Data classification levels are used to indicate the level of risk associated with disclosure – and to tailor access restrictions based on these risks. State statutes may specify responsibilities for classifying data and for protecting data through access controls. For example, data classification may be a business-informed legal determination and cybersecurity may be a state or agency-level IT responsibility.
Keep in mind that policies and terminology vary from state to state – for example, some states have a legal definition for “sensitive” data whereas others do not.
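Classification levels can also drive automated protections when data are shared. The following sketch shows one possible approach: fields tagged as confidential are replaced with a one-way hash before release, so records can still be linked for analysis without disclosing raw values. The classification assignments and field names are illustrative, not a statutory scheme; a production system would typically use salted hashes or a tokenization service.

```python
import hashlib

# Illustrative classification assignments for fields in a crash record.
CLASSIFICATION = {
    "crash_id": "public",
    "vin": "confidential",        # personally identifiable when linked
    "driver_name": "confidential",
    "severity": "public",
}

def mask(value: str) -> str:
    """Replace a confidential value with a truncated one-way hash so records
    can still be linked without disclosing the raw value."""
    return hashlib.sha256(value.encode()).hexdigest()[:12]

def redact_record(record: dict) -> dict:
    """Apply classification-based masking before sharing a record."""
    return {
        key: mask(str(value)) if CLASSIFICATION.get(key) == "confidential" else value
        for key, value in record.items()
    }

record = {"crash_id": 101, "vin": "1HGCM82633A004352",
          "driver_name": "J. Doe", "severity": 2}
print(redact_record(record))
```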
Protecting private and sensitive data involves identifying and classifying the data and ensuring that appropriate practices are used to protect and control access to the data. Managing private data also includes activities to ensure that: (1) the data are needed for a legitimate purpose and are deleted after that purpose has been achieved, and (2) the agency is being transparent about what data are collected, how the data are used, and how they are protected. Agency practices for data assessment should consider not only data that are produced or collected by the agency but also third-party data sets.
Putting strong protections in place ensures compliance with applicable laws and standards – which vary by state. It helps agencies to avoid or minimize the negative consequences of disclosing private and sensitive data which can include:
- Collection Limitation: Data collection should be lawful and gathered with consent.
- Data Quality: Personal data should be relevant and accurate.
- Purpose Specification: Specify the purposes for which you use personal data.
- Use Limitation: Do not disclose personal data.
- Security Safeguards: Always implement security safeguards.
- Openness: Businesses and entities should keep their practices as open as possible.
- Individual Participation: Individuals should have the right to find out what personal data has been used and to regain control of it.
- Accountability: The person in control of the data is responsible.
Protecting private and sensitive data requires a partnership between the data governance team and the IT team. The following actions are within the scope of data governance:
“A privacy impact assessment is an analysis of how personally identifiable information is handled to ensure that handling conforms to applicable privacy requirements, determine the privacy risks associated with an information system or activity, and evaluate ways to mitigate privacy risks. A privacy impact assessment is both an analysis and a formal document that details the process and the outcome of the analysis.”
Data governance monitoring and reporting involves tracking your accomplishments and indicators of progress towards achieving your anticipated goals. It also includes gathering and analyzing additional information that helps you to target your data governance activities for maximum impact.
Monitoring and reporting provide accountability for the data governance function. They enable the data governance team to communicate to the agency leadership, sponsors, and stakeholders what has been accomplished and how it has helped to achieve the intended outcomes. This information is valuable to have during annual budgeting cycles, when every program and function may be subject to scrutiny. It is especially helpful in situations where there has been a change in agency leadership and the new leaders need to be briefed about data governance.
Monitoring also provides important feedback to the data governance team that helps them to fine-tune tactics and improve based on experience. For example, an employee survey can yield information about the level of awareness of data security classifications that can be used to determine whether further information dissemination on this topic is needed.
In initial stages of data governance implementation, monitoring and reporting can be kept simple and focus on tracking activities and participation levels. Once data governance has been established and there are specific initiatives underway, a handful of key indicators reflecting outcomes can be introduced. Specific steps that can be considered are as follows:
Caltrans conducts a data management and governance maturity assessment every two years and tracks progress towards target maturity levels. The data governance team maintains a data governance work plan and tracks the status of each item. They brief members of the data governance bodies regularly on the status of the work plan items.
Table 5. Data Governance Monitoring and Reporting: Sample Indicators
| Category | Indicator |
|---|---|
| Data Governance Maturity | Change in data governance maturity level (measured using periodic assessments) |
| Obligations Met | On-time provision of required data or reports to external parties |
| Accomplishments: General | Achievement or delivery of planned tasks, deliverables, and milestones |
| Accomplishments: People | Number of stewards identified and onboarded |
| | Number of outreach meetings or training sessions held |
| | Number or percent of senior managers briefed |
| | Number or percent of target employees trained |
| Accomplishments: Governed Data | Number of data catalog entries added/validated |
| | Number of business glossary entries added/validated |
| | Number of enterprise/agency/corporate datasets identified and under governance (which could be defined as catalogued, with complete metadata, business rules, and data quality management) |
| | Number or percent of priority datasets with standard metadata complete |
| | Number of data quality management plans produced/updated |
| | Number of business rules documented or coded within a data quality tool |
| Accomplishments: Standardized Data | Percent of target data entities mastered |
| | Number of data element standards adopted |
| Accomplishments: Risk Reduction | Reduction in number of unmanaged desktop datasets classified as having enterprise or agency-wide data |
| | Number of data security or privacy audits completed (or percent of target systems/datasets with a completed audit within the last 3 years) |
| Activity/Use of Resources | Number of hits on data governance resources (website, catalog, glossary, metadata repository) – trendline |
| | Use of available repositories or portals for data sharing – number of reports, hits, queries, and registered users |
| Employee Perceptions and Awareness | Change in employee ratings of: ease of finding, accessing, and using data; ease of sharing data; time to produce useful reports; data quality and consistency (measured through baseline and follow-up employee surveys) |
| | Employee awareness of: data governance bodies, data stewardship roles and responsibilities, policies, standards, private and sensitive data elements, data sharing practices (measured through employee surveys) |
| | Satisfaction rating for the data governance team’s service delivery (measured through a survey following service delivery) |
This chapter has provided basic guidance on several common data governance initiatives that DOTs can pursue. See the resources and references in Chapters 7 and 8 for additional information.