Suggested Citation: "APPENDIX B: THE DATA SCIENCE PROCESS." National Academies of Sciences, Engineering, and Medicine. 2023. Nuclear Proliferation and Arms Control Monitoring, Detection, and Verification: A National Security Priority: Summary of the Final Report. Washington, DC: The National Academies Press. doi: 10.17226/26558.

B

THE DATA SCIENCE PROCESS

As recognition of the importance of data science has grown, the phrase “data science” has become ubiquitous and is now used to refer to many different things. To allow for greater specificity regarding data science needs in the MDV enterprise, it is worth deconstructing “data science,” and the corresponding roles and responsibilities, into key components.

Fundamentally, data science is the process of bringing data together to address a problem. This process, outlined in Figure B, is not limited to applying techniques like artificial intelligence (AI) and machine learning (ML) to curated datasets; AI and ML are simply examples of data science tools. Rather, it starts with identifying a problem and discovering relevant data that can be brought to bear on it. Once relevant data is identified, it must be ingested into data management platforms to make it readily accessible to those who need it. From there, data must be processed, cleaned, and assessed for quality and utility before modeling and analysis can occur, including more advanced analytical techniques like AI/ML where appropriate. Advanced data analysis has little utility without a robust foundation supporting it.

FIGURE B Illustration of the data science workflow.
SOURCE: Adapted from Keller et al. (2020).
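
The middle stages of the workflow described above (ingestion, cleaning, quality assessment, and analysis) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the `min_quality` threshold, and the toy records are hypothetical and do not come from the report or from Keller et al. (2020). The point is only that analysis runs last and is gated on the quality of what precedes it.

```python
from statistics import mean

# Hypothetical toy records standing in for discovered data; values are invented.
RAW_RECORDS = [
    {"sensor": "A", "reading": "1.0"},
    {"sensor": "A", "reading": "3.0"},
    {"sensor": "B", "reading": ""},      # missing value
    {"sensor": "B", "reading": "5.0"},
    {"sensor": "B", "reading": "n/a"},   # unparseable value
]


def ingest(records):
    """Ingestion: copy records into a common store (here, just a list)."""
    return [dict(r) for r in records]


def clean(records):
    """Processing and cleaning: parse readings, drop rows that cannot be parsed."""
    cleaned = []
    for r in records:
        try:
            cleaned.append({"sensor": r["sensor"], "reading": float(r["reading"])})
        except ValueError:
            continue  # skip rows whose reading is not a number
    return cleaned


def assess(raw, cleaned):
    """Quality assessment: fraction of ingested rows that survived cleaning."""
    return len(cleaned) / len(raw) if raw else 0.0


def analyze(records):
    """Analysis: mean reading per sensor."""
    by_sensor = {}
    for r in records:
        by_sensor.setdefault(r["sensor"], []).append(r["reading"])
    return {sensor: mean(values) for sensor, values in by_sensor.items()}


def run_pipeline(records, min_quality=0.5):
    """Run the stages in order; refuse to analyze data of too-low quality."""
    stored = ingest(records)
    cleaned = clean(stored)
    quality = assess(stored, cleaned)
    if quality < min_quality:
        raise ValueError(f"data quality {quality:.2f} is below {min_quality}")
    return quality, analyze(cleaned)


if __name__ == "__main__":
    quality, result = run_pipeline(RAW_RECORDS)
    print(quality, result)
```

In this sketch, raising `min_quality` above the fraction of usable rows stops the pipeline before any analysis is attempted, which mirrors the observation that advanced analysis has little utility without a sound foundation.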

It is also important to note that different types of experts (namely, subject matter experts, data engineers, and data scientists) are responsible for different components of the workflow, as described in Box B. The initial steps in the data science workflow, problem identification and data discovery, rely on subject matter experts (SMEs) who understand what questions stakeholders need answered and can determine what data should be collected and how to collect it. Data engineers are responsible for data ingestion and curation and share responsibility for data wrangling and assessment with data scientists. Data scientists are also responsible for conducting the modeling and analyses and, in collaboration with SMEs, communicating findings to relevant stakeholders.
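
The division of responsibilities in the preceding paragraph can be written down as a simple lookup table. The sketch below is one possible encoding, not a structure taken from the report: the stage labels paraphrase the text, and the names `Role`, `RESPONSIBILITIES`, and `stages_for` are hypothetical.

```python
from enum import Enum


class Role(Enum):
    SME = "subject matter expert"
    DATA_ENGINEER = "data engineer"
    DATA_SCIENTIST = "data scientist"


# Stage labels paraphrase the paragraph above; they are illustrative only.
RESPONSIBILITIES = {
    "problem identification": {Role.SME},
    "data discovery": {Role.SME},
    "data ingestion and curation": {Role.DATA_ENGINEER},
    "data wrangling and assessment": {Role.DATA_ENGINEER, Role.DATA_SCIENTIST},
    "modeling and analysis": {Role.DATA_SCIENTIST},
    "communicating findings": {Role.DATA_SCIENTIST, Role.SME},
}


def stages_for(role):
    """Return the stages a role is involved in, in workflow order."""
    return [stage for stage, roles in RESPONSIBILITIES.items() if role in roles]


def shared_stages():
    """Return the stages where more than one role shares responsibility."""
    return [stage for stage, roles in RESPONSIBILITIES.items() if len(roles) > 1]


if __name__ == "__main__":
    for role in Role:
        print(role.value, "->", stages_for(role))
    print("shared:", shared_stages())
```

Listing the shared stages makes the handoff points explicit: data wrangling and assessment, and the communication of findings, are where coordination between roles is needed.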

The workflow shown in Figure B is not linear: it requires consistent communication among SMEs, data engineers, and data scientists, as well as reassessment throughout. For example, the findings of one analysis may reveal opportunities to link data from multiple sources or intelligence modalities, which returns the team to the earlier data discovery and ingestion steps. Capturing these synergies is key.

References

Keller, S. A., Shipp, S. S., Schroeder, A. D., and Korkmaz, G. (2020). Doing Data Science: A Framework and Case Study. Harvard Data Science Review, 2(1). https://doi.org/10.1162/99608f92.2d83f7f5.

UVA Data Science. September 23, 2021. Data Science vs Data Engineering. https://datascience.virginia.edu/news/data-science-vs-data-engineering.
