Given the wide-ranging applications, potential impacts, and important implications for society, the committee began its reflections on the future of data science by considering ethical conduct as part of a broader set of skills and capacities.
Emerging data science technologies and methodologies (1) blur differences between “public” and “private” data, (2) offer more widespread access to data and related tools, (3) influence and affect society at large, and (4) create greater opportunities for deeper insights through the use and integration of multiple data sources. As a result, data ethics take on an ever more prominent role in both data science curricula and data science practice.
The Hippocratic Oath, which details the ideal conduct of physicians in terms of their treatment of patients and interactions with colleagues, has historically been affirmed by physicians to acknowledge their understanding of key ethical principles for their profession (Box 5.1). Similarly, the Canadian “Calling of an Engineer” ceremony for engineering graduates helps establish shared moral and social responsibilities (NSPE, 2009). The pervasive impact of data science suggests that a similar oath would be beneficial for data scientists, whose work has a direct impact on individuals throughout society and on the advancement of the body of scientific knowledge. Data science students learn to solve complex real-world problems and to use data to make decisions while understanding the limitations of data sets and methods.
An oath of this sort may be helpful in formalizing the role of data ethics and in inspiring future data scientists to practice with honor, “do[ing] no harm” to the subjects involved in or affected by their work. Such an oath would also formalize the professional role of the data scientist, offering guidance on appropriate conduct to those entering the field and encouraging collaboration across diverse communities.
What might a Hippocratic Oath for data science include? To explore this question, the committee developed the text in Box 5.2 as a preliminary form of a possible pledge for future data scientists. The proposed Data Science Oath highlights aspects of data ethics and the value of incorporating societal impact as part of data science education.
At the midpoint of its study, the committee finds that it is important that data science education incorporate real data, broad impact applications, commonly deployed methods, and ethical considerations, as well as provide support for work in teams. Other critical content areas include data description and curation, mathematical foundations, computational thinking, statistical thinking, data modeling, computing, reproducibility, and data ethics. Students would also benefit from developing deep analytic and communication skills so as to better work with large, complex data sets and engage with diverse audiences about real-world problems that data science can help solve. All of these promote the development of data acumen. Highly trained and flexible faculty, innovative cross-disciplinary pedagogical approaches, and diverse participation would enhance learning experiences. The successes of such programs can then be evaluated and assessed using the very tools of experimental design and analysis common in the field of data science.
BOX 5.1
BOX 5.2
The findings from the preceding chapters are restated below along with key questions on which the committee would like to gather public input.
Finding 2.1: A critical component of data science education is to guide students to develop data acumen. This requires exposure to key concepts in data science, real-world data and problems that can reinforce the limitations of tools, and ethical considerations that permeate many applications. Key concepts related to developing data acumen include the following:
The necessary levels of exposure to each area will vary based on the overall objectives and duration of the data science program as well as the goals for the students.
Questions
Finding 2.2: It is important for data science education to incorporate real data, broad impact applications, and commonly deployed methods.
Questions
Finding 2.3: Incorporating ethics into an undergraduate data science program provides students with valuable skills that can be applied to complex, human-centered questions across disciplines.
Questions
Finding 2.4: Strong oral and written communication skills and the ability to work well in multidisciplinary teams are critical to students’ success in data science.
Questions
Finding 3.1: Data science curricula are enhanced by bringing together faculty from different disciplines, utilizing diverse pedagogical approaches, and building upon existing educational programs.
Questions
Finding 3.2: Structured faculty training, meaningful incentives, and available time and funding to support curriculum development are all crucial to preparing faculty for data science education.
Questions
Finding 3.3: Data science programs often adapt to the existing infrastructure and organizational structure of an academic institution, but infrastructure innovations by the institution (e.g., in data provision, data and code access, and data documentation) can help data science programs be more collaborative and multidisciplinary.
Questions
Finding 3.4: To keep up with the quickly evolving field of data science and recruit students with more diverse backgrounds, educational approaches in data science need to be flexible in terms of what concepts, skills, tools, and methods are taught; how students are recruited; and how departments and programs collaborate to provide a full data science experience to students.
Questions
Finding 4.1: Data science has the potential to draw in a diverse set of students and build in broad participation from the outset, rather than trying to broaden participation later. However, strategies are needed to recruit and retain these students.
Questions
Finding 4.2: Partnerships between 2- and 4-year institutions provide a valuable opportunity to develop innovative curricula, reach more diverse student populations, and expand the reach of data science education.
Questions
Finding 4.3: Data science programs would benefit from ongoing curricular evaluation, especially with respect to how well curricular objectives are being met and the degree of curricular integration. Taking a cue from its own domain, data science could use these evaluation data to inform instruction and curriculum.
Questions
The committee seeks input from the growing data science community and the public on the following topics:
Please visit the following webpage to provide input: http://www.nas.edu/EnvisioningDS.