The workshop wrapped up with an open discussion of issues raised during the presentations and ideas for moving forward.
Alexander Szalay (Johns Hopkins University) noted that one of the biggest differences between scientific computing and private sector systems, such as Google’s, is the ease with which data flows within the system. Elaborating on this point, David Konerding (Google, Inc.) gave the example of Dremel, Google’s internal SQL-like query system. Because the company standardizes on a single serialization format called protocol buffers, and because Dremel converts SQL statements into query plans over data stored in that format, Dremel is able to join large pools of unrelated data from different teams. Furthermore, because every Google system feeds its data into Google’s current file system, Colossus, it is possible for Google’s 20,000 computer scientists to inspect their own monitoring data and that of others to identify problems. In these ways, Google’s internal system is an example of a working data commons at a global scale, Konerding said, adding that he would like to find a way to apply the same model in Google’s cloud environment so that data scientists and domain researchers could benefit from it as well.
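The principle Konerding described can be sketched loosely in code. The example below is not Google’s implementation: it uses JSON rather than protocol buffers, and the dataset names, field names, and values are hypothetical. It shows only the underlying idea, that when independently produced datasets share one serialization format and a common key field, a generic engine can join them without any coordination between the teams that produced them.

```python
# Illustrative sketch only: a hash join over two independently produced
# datasets that happen to share a serialization format and a key field.
# JSON stands in for protocol buffers; all names and values are made up.
import json

# Two teams' data, serialized independently in the same record format.
search_logs = [json.dumps({"user_id": 1, "query": "protein folding"}),
               json.dumps({"user_id": 2, "query": "weather model"})]
ads_logs = [json.dumps({"user_id": 1, "ad_clicks": 3}),
            json.dumps({"user_id": 2, "ad_clicks": 0})]

def join_on(key, left_rows, right_rows):
    """Join two serialized datasets on a shared key field."""
    # Index the right-hand dataset by the key.
    right_index = {}
    for row in right_rows:
        rec = json.loads(row)
        right_index[rec[key]] = rec
    # Probe the index with each left-hand record and merge matches.
    joined = []
    for row in left_rows:
        rec = json.loads(row)
        match = right_index.get(rec[key])
        if match is not None:
            joined.append({**rec, **match})
    return joined

result = join_on("user_id", search_logs, ads_logs)
```

In a real system the shared format would be binary and schema-driven, but the payoff is the same: the join logic knows nothing about either dataset beyond the common format and key.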
Robert Grossman (University of Chicago) offered an example in the academic research space. Using the data commons his team built for the National Cancer Institute, researchers can apply BigQuery-like queries and analogous techniques across various clouds to perform gene-by-environment computations. This capability was not difficult to build into the system, he said, and multiple research teams have found it useful.
Michela Taufer (University of Tennessee, Knoxville) raised the related issue of data movement. The need for data movement comes up in many different situations, for example in the context of bringing sensor data into simulations. Taufer asserted that the community has not sufficiently defined how data movement should be handled from the cyberinfrastructure point of view. She noted that if data movement is not supported, there is a greater need to invest in tools and software, which must ultimately be connected with the data.
Szalay suggested that one answer could be to create larger data transfer nodes (DTNs) that are capable of supporting data and data movements of 100 gigabytes, yet inexpensive enough for NSF to provide to a wide array of institutions. He noted that he was working on a prototype for this sort of DTN with funding from the Schmidt Family Foundation.
Pete Beckman (Argonne National Laboratory) pointed out the importance of defining goals for improvement and articulating a metric that can be used to assess success. For example, if the goal is on-demand deployment of containers, there needs to be a way to measure that. William Gropp (University of Illinois at Urbana-Champaign) observed that if we want to reduce friction experienced by scientists using computing facilities, we must be able to quantify friction.
Grossman asserted that, while there is a need for convergence, it is also important for NSF to get experience building specialized, midscale data systems. Over-relying on companies like Google and Amazon to fill this space risks leaving the academic research community with a dearth of data science expertise relevant to developing and using midscale systems, he suggested.
Building on this point, Dan Reed (University of Utah) posited a few scenarios under which NSF might or might not invest in midscale infrastructure. One option, he said, would be for NSF to use its limited resources to focus on big data and big computation and encourage academic institutions to support midrange systems, much as they do now in provisioning campus networks that connect to national network backbones.
One implication of this, noted Robert Harrison (Stony Brook University), is that it would sacrifice the cost efficiencies that are currently created when NSF and the universities harmonize their investments such that each gets a better return. In the case of systems that are housed at universities, Thomas Furlani (University at Buffalo) argued that NSF gets a particularly high return on investment because, while NSF pays for the hardware, it does not, generally speaking, have to pay the ongoing costs for support personnel, education, outreach, and training needed to run and utilize the system. By contrast, Gropp observed, NSF covers support personnel as part of its funding for Tier 2 systems, thereby providing support for science users across the country. It may become increasingly difficult to depend on individual institutions for training and support at a time when many universities face their own resource constraints.
A second option Reed offered would be for NSF to seek substantial funding from Congress to support a 10-year effort to integrate midscale systems into a national infrastructure. This would require a Major Research Equipment and Facilities project supported by at least $100 million per year for a decade, he suggested.
A third option is to establish a new model for partnering with the private sector. Rather than a procurement model, where NSF and its grantees simply purchase products and services from companies, this new model would create partnerships in which the government and private sector work together to solve problems. One possibility for incentivizing companies to participate in such a partnership, Reed suggested, may be to offer companies the chance to monetize aspects of government data.
As an example of monetization of government data, Grossman pointed to the National Oceanic and Atmospheric Administration (NOAA), which has taken a radically different approach to data sharing from NSF’s. Unlike NSF, NOAA gives large cloud providers the rights to use NOAA data with no restrictions. The cloud providers benefit when they find ways to monetize that data, while NOAA benefits because the arrangement vastly reduces the cost of storing and handling its data.
Based on his experience in the nonprofit arena, Grossman said one incentive sometimes used to attract private sector partners is to offer companies early access to data that will eventually be made public. This approach works well for certain kinds of data where there is value in having early access, though it does not make sense in the case of simulations, an arena in which public-private partnership has been rarer.
Participants discussed changes that could be made to the way NSF grants are allocated and tracked, which could potentially improve return on investment. For example, Beckman suggested that NSF grants could be used to procure services instead of hardware, which can sometimes solve the same problem at a lower cost. Gropp noted that that approach had been implemented at some institutions, though not uniformly nationwide. Expanding on this, Reed said a current disincentive for institutions to take that approach is that they feel financially constrained and are reluctant to waive overhead charges for public cloud services. On the other hand, Furlani suggested that some might justify charging overhead because, even if a service is “free” to an institution’s researchers, it still requires support personnel to help them take advantage of the service. Gropp noted that the discussion points to the broader challenges involved in determining which costs should be direct versus overhead.
During the presentations and subsequent discussions, planning committee members who participated in the workshop noted a number of recurring themes.
* * *
Members of the planning committee and workshop participants alike were struck both by how rapidly the scientific computing landscape is changing and by how strong the shared sense of mission around convergence is. The changing landscape is marked by growing demand for both simulation and data-centric computing, the emergence of new ways of delivering computing and data services, new sources of data, new opportunities to partner with the private sector, and new strategic investments by a number of countries outside the United States. At the same time, there is a shared understanding of the challenges and opportunities in finding a way to support convergence in the broad sense of bringing data and computing together. The planning committee is pleased to see the many advances made since its 2016 report and believes that further discussions around convergence would foster additional progress.