The second session of the workshop provided an international context by presenting overviews of work under way in Europe—specifically in the United Kingdom, Switzerland, and Germany. As moderator Chiara Franzoni (Polytechnic University of Milan) observed, the European landscape is more fragmented than that of the United States, marked by different funding systems, government agencies, and traditions. This fragmentation creates problems but also presents opportunities for experimentation, because the existence of many relatively small institutions provides an agility that can enable new approaches and evaluations.
Drawing on the Cambridge English Dictionary, Albert Bravo-Biosca (Innovation Growth Lab [IGL]) defined experimentation as “a test done in order to learn something or to discover if something works or is true.” From this perspective, learning is achieved by intentionally testing hypotheses in a structured way and within set time frames. Policymakers may object that experimentation is risky, but in fact its intent is to derisk innovation, said Bravo-Biosca. He continued, “What’s risky is spending $10 billion in a program that nobody is sure is going to work. What’s actually a good use of public money is spending a few tens of millions of dollars first in a pilot to find out whether this $10 billion actually will have an impact—and, crucially, how to make this more impactful.” A policy of starting experiments small and trying different designs systematically to learn what works can greatly increase overall impact. In addition, experimentation can provide novel solutions to policy challenges and is a route to continuous improvement, thereby enabling better decisions while saving money.
Many experimental designs are possible for testing the outcomes of different approaches. Evaluation experiments can assess the impacts of a new
program or changes in the design of an existing one. Optimization experiments, such as rapid-fire A/B tests, can test small tweaks in an implementation process. Randomized encouragement designs can evaluate the impact of a program without blocking access for anyone. Shadow experiments can test variations in the shadow of the current approach—for example, running parallel assessment processes with only one used to inform actual decisions.
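To make the flavor of such designs concrete, the following minimal Python sketch, which is not drawn from any of the workshop presentations and uses invented names and response rates, randomizes which recipients of a hypothetical funding-call announcement see variant A or B of the message and then compares application rates between the two groups.

```python
import random

# Hypothetical A/B test of two versions of a funding-call announcement.
# All names and response rates below are invented for illustration.

def ab_assign(recipients, seed=0):
    """Randomly assign each recipient to announcement variant 'A' or 'B'."""
    rng = random.Random(seed)
    return {r: rng.choice("AB") for r in recipients}

def application_rate(assignment, applicants, variant):
    """Share of recipients of `variant` who went on to submit an application."""
    group = [r for r, v in assignment.items() if v == variant]
    return sum(r in applicants for r in group) / len(group)

recipients = [f"firm_{i}" for i in range(1000)]
assignment = ab_assign(recipients)

# Simulated behavior: variant B nudges a few more firms to apply.
applicants = {r for r in recipients
              if random.random() < (0.12 if assignment[r] == "B" else 0.08)}

for variant in "AB":
    print(variant, round(application_rate(assignment, applicants, variant), 3))
```

A randomized encouragement design would randomize the invitation or nudge in the same way while leaving access to the program itself open to everyone.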
In general, competitive funding calls go through a process of announcing and promoting the call, supporting applicants and receiving applications, assessing proposals and selecting which to fund, agreeing on terms and finalizing agreements, and funding the programs. In such circumstances, lotteries could be used to allocate funding to marginal applicants, which opens the possibility of using this approach as a basis for impact evaluations. Funding calls can also be used to enhance collaboration—for example, by requiring applicants to attend an information session where they are put in groups to talk with each other. These and many other issues are discussed in a handbook developed by IGL and the Research on Research Institute (RORI), which collates examples of potential experiments.1 Bravo-Biosca also recommended the trials database on the IGL website, which provides details of more than 150 randomized controlled trials (RCTs) in the field of science, innovation, and growth policy.2
Becoming an experimental organization is a journey, Bravo-Biosca explained. It first involves a mindset—continuously asking what will happen if things are done differently. It also involves creating the right culture, with organizational flexibility to try new things and openness to failure. Finally, it involves instituting the necessary methods, including a clear ex ante learning strategy and the capabilities to successfully implement it.
Capabilities for experimentation include the specific skills and resources that agencies need to run an experiment successfully, such as access to data infrastructures, research and evaluation capabilities, and expertise with RCTs. For example, IGL has partnered with the European Network of Innovation Agencies (Taftie) to work with 17 European innovation agencies on experimentation, providing participants with the following:
___________________
1 Bendiscioli, S., T. Firpo, A. Bravo-Biosca, E. Czibor, M. Garfinkel, T. Stafford, J. Wilsdon, and H. Buckley Woods. 2022. The experimental research funder’s handbook. Revised edition. Research on Research Institute. https://doi.org/10.6084/m9.figshare.19459328.v4
IGL has learned the hard way, said Bravo-Biosca, that starting small tends to be more effective. For example, IGL has worked with the U.K. Department for Business, Energy and Industrial Strategy on some small experiments that grew and were replicated, and ultimately convinced policymakers and others to do things differently. IGL also has learned that incentives and signals can be very effective. For example, he explained, strong signals can explicitly encourage and welcome RCTs in existing policies and funding calls, with offers to cover associated costs and address misconceptions about eligibility.
Experimentation funds can identify and test new ways to support innovation across an innovation ecosystem, Bravo-Biosca explained. Such funds can promote innovative policy ideas, fund programs in exchange for rigorous evaluation, learn what works best, and create a culture of innovation and evidence. For example, the Business Basics Fund in the United Kingdom, in partnership with IGL, has invested in projects testing a wide range of different approaches to accelerate tech adoption by small and midsize enterprises.
Bravo-Biosca also stated that government agencies that have embraced experimentation have found that approaches that were not expected to work were in fact the most successful, that it is possible to undertake experiments involving small changes without requiring legislative reforms, and that even small experiments can have major impacts.
Successful execution of experiments is a team sport involving academic researchers, policymakers, and experienced implementers, Bravo-Biosca concluded. Many of the constraints in conducting experiments, such as how to communicate and engage with participants and the wider public, have been faced by others, which is a reason for bringing people and organizations together to talk, learn, and collaborate.
Matthias Egger (Swiss National Science Foundation [SNSF]) said that the foundation has made a number of changes in its evaluation procedures in recent years in an effort to fulfill its mission to fund excellent research and promising young researchers. (SNSF funded about 2,700 grants in 2022.) It has developed a new curriculum vitae (CV) format for asking applicants about their contributions to science, rather than relying on metrics such as impact factors; has adopted a unified evaluation procedure; and has created a training program for the chairs and members of evaluation panels. It also has an interest in research on research, focusing on peer review, predatory publishing, open science, and career trajectories.
“I am always amazed by how much money we spend and how little we know how best to do it,” Egger said. For example, when two independent panels evaluate the same proposals, the agreement between the panels is about 70 percent, revealing a marked element of chance. To accommodate this element of
chance, SNSF has developed a procedure in which panel members vote on proposals on a scale from one to nine, and a statistical method is then used to rank the proposals while taking into account the uncertainty around that rank.3 When multiple proposals have overlapping uncertainties but cannot all be funded, a lottery is conducted to fund some of them, thereby reducing the length of review meetings and increasing fairness.
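In simplified form, that selection logic can be sketched as follows. This illustrative Python example is not SNSF's published procedure (see footnote 3 for the actual method); the uncertainty intervals are invented, and the rule for deciding which proposals enter the draw is deliberately crude.

```python
import random

def select_with_lottery(intervals, budget, seed=None):
    """Select `budget` proposals when the ranking is uncertain.

    `intervals` maps a proposal id to a (lower, upper) uncertainty interval
    around its panel score (higher is better).  Proposals whose intervals
    overlap the interval at the funding line are treated as statistically
    indistinguishable, and the remaining slots among them are filled by a
    random draw.
    """
    rng = random.Random(seed)
    ranked = sorted(intervals, key=lambda p: sum(intervals[p]) / 2, reverse=True)
    line_lo, line_hi = intervals[ranked[budget - 1]]  # interval at the funding line

    def overlaps_line(p):
        lo, hi = intervals[p]
        return lo <= line_hi and hi >= line_lo

    tied = [p for p in ranked if overlaps_line(p)]
    clearly_funded = [p for p in ranked[:budget] if p not in tied]
    drawn = rng.sample(tied, budget - len(clearly_funded))
    return clearly_funded + drawn

# Invented intervals for six proposals competing for four slots.
example = {"A": (7.5, 8.5), "B": (6.8, 8.0), "C": (5.9, 7.1),
           "D": (5.8, 7.0), "E": (5.7, 6.9), "F": (3.0, 4.2)}
print(select_with_lottery(example, budget=4, seed=1))
```

In this toy example, proposal A is funded outright, proposal F is clearly below the line, and three of the four overlapping proposals B through E are chosen by the draw.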
Egger said that SNSF is also developing a system of natural language processing for peer-review reports, which it is combining with other research to examine its peer-review system.4 For example, it has looked at potential bias in peer review, such as whether reviewers suggested by the authors of a paper give more favorable reviews than other reviewers. “We stopped asking applicants to provide suggestions very quickly after we saw the results, [because] they were so much more positive than the others,” he explained. They also looked at data on gender matching and found both that women are more critical than men in their reviews and that men give other men substantially more positive reviews than they give women.5 “These subtle differences are quite useful when we train our panel members and our chairs at the foundation,” stated Egger.
Finally, Egger touched on the difficulties of conducting experiments at a funding body because, by law, everyone needs to be treated the same. Proposals just above and just below the funding line can be put into a random draw, which treats all proposals the same but allows for randomization. With this partial lottery approach, SNSF collects data on the successful and unsuccessful proposals, which it has combined with simulations and career tracker cohorts to gauge the effectiveness of review panels. These investigations have revealed, for example, that the allocation of funding for fellowships is essentially the same whether applications are evaluated remotely or in face-to-face panel meetings.6 “The panel didn’t add very much to the process,” he said.
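As a minimal sketch of the kind of comparison such data allow, using invented counts rather than SNSF figures, the following Python snippet estimates the difference in a career outcome between randomly funded and randomly unfunded marginal applicants, together with a simple normal-approximation confidence interval.

```python
import math

def diff_in_proportions(success_funded, n_funded, success_unfunded, n_unfunded):
    """Difference in outcome rates with a 95% normal-approximation CI."""
    p1 = success_funded / n_funded
    p0 = success_unfunded / n_unfunded
    diff = p1 - p0
    se = math.sqrt(p1 * (1 - p1) / n_funded + p0 * (1 - p0) / n_unfunded)
    return diff, (diff - 1.96 * se, diff + 1.96 * se)

# Invented example: 60 of 80 randomly funded versus 48 of 80 randomly
# unfunded applicants still active in research after five years.
diff, ci = diff_in_proportions(60, 80, 48, 80)
print(f"difference: {diff:.2f}, 95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```

Because the draw is random, a difference of this kind can be read as an estimate of the causal effect of funding at the margin.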
In response to a question about finding program managers who have the kinds of skills needed for more open-ended agencies, such as the Defense Advanced Research Projects Agency (DARPA), Egger said that the term that comes to mind for such people is “mavens.” Such people are very good at bringing projects over the line to success. They are experienced in the relevant science and technology, are entrepreneurial, and have experience in commercialization. They have the quality “of not just being excellent but also being helpful.”
___________________
3 See Heyard, R., M. Ott, G. Salanti, and M. Egger. 2022. Rethinking the funding line at the Swiss National Science Foundation: Bayesian ranking and lottery. Statistics and Public Policy 9(1):110-121. https://doi.org/10.1080/2330443X.2022.2086190
4 See Severin, A., M. Strinzel, M. Egger, T. Barros, A. Sokolov, J. Vilstrup Mouatt, and S. Müller. In preparation. Journal impact factor and peer review thoroughness and helpfulness: A supervised machine learning study. arXiv. https://doi.org/10.48550/arXiv.2207.09821
5 See Severin, A., J. Martins, R. Heyard, F. Delavy, A. Jorstad, and M. Egger. 2020. Gender and other potential biases in peer review: Cross-sectional analysis of 38,250 external peer review reports. BMJ Open 10:e035058. https://doi.org/10.1136/bmjopen-2019-035058
6 See Bieri, M., K. Roser, R. Heyard, and M. Egger. 2021. Face-to-face panel meetings versus remote evaluation of fellowship applications: Simulation study at the Swiss National Science Foundation. BMJ Open 11:e047386. https://doi.org/10.1136/bmjopen-2020-047386
Egger was also asked whether the lottery approach might generate a flurry of low-quality applications from people who are hoping to succeed in the lottery, but he said that this has not been a problem and that the number of grants that have gone into the lottery has been small. He also pointed out that the lottery could increase the diversity of those funded, which is a goal of the agency. For example, one program involves giving relatively small grants to projects where the CV of the applicant is not known, which may increase the success rate of proposals made by women.
Finally, in responding to another question, Egger emphasized that the agency is still experimenting with natural language processing. For example, the technique may provide a way to allocate tasks to peer reviewers or to give panels an assessment of the quality of their peer-review reports. That observation prompted Bravo-Biosca to note that the use of such algorithms also raises the question of their impact on assessors: “It’s an area you definitely have to test.”
Dietmar Harhoff (Max Planck Institute for Innovation and Competition) reported on a new federal agency in Germany that is seeking to experiment with alternative ways of funding science and innovation. In response to discussions in Germany about declining competitiveness in major export industries, a lack of agility in science and innovation policies, and the difficulties of transferring ideas from research laboratories into commercial products, the German government established a completely new agency designed to support radical innovation. “A lot of serendipity was involved in the process,” Harhoff reported.
Harhoff explained that the notion of improving the transfer of scientific ideas to commercial products was mentioned by Chancellor Angela Merkel in 2016 in a speech on innovation. An internal paper called “The DARPA Effect” led to a concept paper for the establishment of a new agency. The German Federal Ministry of Education and Research and the German Federal Ministry for Economic Affairs bought into the concept shortly before a new government came into power at the end of 2017. A coalition agreement reached in 2018 included an announcement to set up the new agency, and planning began in 2019 with a search commission charged with choosing the first director and the location of the new agency. The commission, he said, made the unusual choice for director of Rafael Laguna de la Vera, who was not an academic or researcher but a highly successful entrepreneur with a focus on open source software, and he in turn made the unusual choice of locating the agency not in Berlin but in Leipzig.
The new agency was named the Federal Agency for Disruptive Innovation (SPRIN-D) and commenced work in 2020, Harhoff continued. By the time of the workshop, it had been involved in four activities: funding validation projects, setting up project companies, running technology challenges, and developing broader strategies for the governance of the German innovation system, such as chip manufacturing capabilities, intellectual property transfer standards, and the resilience of the open source ecosystem. Some of these activities contributed to the incubation of the Sovereign Tech Fund, a technology venture fund intended to make equity financing of technology projects possible.
Funding for the validation projects ranges up to only 200,000 euros, “which is not a big project,” Harhoff noted. However, the support can be made available within a few weeks, “which is much faster than any funding process previously known in the German system.” The project companies are dedicated companies that engage in larger research and transfer projects, which can last multiple years, and over their lifetime can consume 20–90 million euros—“so these are big projects.” SPRIN-D has run challenge programs in the areas of broad-scale antivirals, carbon to value, new computing concepts, long-duration energy storage, and the biocircular economy, with funding for a given challenge ranging as high as 40 million euros. Harhoff reported that a proposal for a “SPRIN-D Freedom Law” being discussed at the interministerial level (as of the time of the workshop) would further expand SPRIN-D’s activities into research grants and equity funding, with a substantially larger budget.
Harhoff went into more detail on the technology challenges SPRIN-D has sponsored. Most challenges run for 3 years and are divided into three stages. Up to a dozen teams are typically admitted to Stage 1, up to six teams to Stage 2, and up to four teams to Stage 3. Teams can participate if they are located in the European Union, the European Free Trade Association, the United Kingdom, or Israel, although individual team members or partners may be located outside this region. Judges of the competition are drawn from multiple countries and economic sectors to achieve broad perspectives. The intellectual property rights created by the teams during the challenge remain with the teams, although SPRIN-D receives a free and nonexclusive right to use the results. The teams may undertake to grant licenses to third parties at standard market conditions.
The new agency has been outspoken in its communications and discussion, Harhoff observed. It has weighed in on controversial issues, such as the high royalties charged by universities for releasing intellectual property to start-up companies. It has used social media intensively to communicate with stakeholders, including cartoons and explainer videos, to reach unorthodox innovators. It has become involved in debates on technology sovereignty and strategic manufacturing, and it has engaged in strong international outreach efforts. It has achieved strong acceptance in the science and engineering communities, said Harhoff.
Harhoff drew several insights from the new agency’s experiences. He warned against ex post rationalization when looking at policy processes and institutions. The impetus for the new agency came mostly out of the country-specific context in Germany, with DARPA being an idealized role model. An acceptance of risk requires a portfolio logic, Harhoff stated, in which the full portfolio is successful even though not every project succeeds. As with DARPA, the success of the model requires selecting people and then entrusting them with substantial leeway. Budgets need to roll over within the agency to ensure continuity. The need for independence means that the German parliament cannot inspect every small project or influence specific investment decisions. And new
people and new ideas need to be brought in despite a labor market that favors long-term rather than temporary employment.
Copycat implementation is a myth, Harhoff said. There are too many boundary conditions and path dependencies. “Sometimes you have to look for the right tools that you haven’t seen before,” he continued. As just one example, the challenge programs are being implemented through a European Union mechanism called precompetitive procurement, which provides considerable leeway for procurement projects in the context of E.U. state aid regulation. SPRIN-D was the first German organization to use precompetitive procurement systematically, but other agencies are following its lead. Finally, even before the recent discussions of decoupling and technological sovereignty, international collaboration was becoming more important, as reflected in the networking taking place between SPRIN-D and similar agencies elsewhere.
Asked about the greatest difficulties they encountered in convincing people to change, Egger noted that even the most innovative scientists can be conservative in changing institutions and procedures. For instance, the introduction of the Bayesian ranking system created issues initially, but once scientists experienced its effects they came to prefer it. As a counterexample to that success, the panels that judge fellowship applications are still in place even though they add little value to decisions, Egger said, partly because of the advantages of bringing people together to talk and deliberate.
Bravo-Biosca observed that innovation and entrepreneurship policy has a poorer evidence base than do education, development, and other social policies. There is often risk aversion among people who set policy, since a bad outcome in a particular program within the broader policy area can undermine the case for having a policy in the first place. The overall ecosystem is also very resistant to change, he said: “Changing that inertia requires a lot of effort, a lot of goodwill, a lot of coalition building.” Finally, civil servants are very busy, and experimentation takes time and effort that they may not have.
Harhoff, too, pointed to the difficulties of changing stabilized systems. “This is not science,” he said, “this is simply human nature.” Fostering change in such a situation requires leadership and a compelling case for change.
The presenters were asked about assessing benefits that are more difficult to measure than something like citations, such as the social value of an innovation or the training of students involved in doing research. Egger said that one of the easier things to measure is the result of career funding, particularly among similarly matched students who either do or do not receive funding. Do they stay in academia? Do they stay in science in some other context? Do they go on to found companies? What are the gender differences? He also pointed to the patterns of activities that lead to patents as a way of measuring outcomes.
Bravo-Biosca pointed to the importance of developing indicators that can be measured relatively quickly, “so that you don’t have to wait 10 years to give an answer to policymakers.” Another important advance, he said, would be standardized and validated measurement instruments that could be used for different types of experiments, which would also make data sets more comparable.
Harhoff added the challenge of “how you would define a glorious failure, a failure that at the same time leads to a lot of learning.” Such failures can lead to the definition of new challenges, which can be an unanticipated consequence of research.
Finally, a participant suggested that collaboration on experiments—such as experiments that randomize awards in the middle group of proposals—would speed the collection of data. In response, Egger pointed to some of the difficulties funders have in working together, not only for cultural reasons but also for legal reasons. Nevertheless, RORI is pursuing the idea of compiling a data set on people funded at random, “and we would be very happy to collaborate on such a project.”
Bravo-Biosca remarked on the opportunities for governments and researchers in Organization for Economic Cooperation and Development countries to collaborate more intensively. Harhoff reminded the workshop of the challenges faced in creating broader participation in the innovation space, where “you have some very unorthodox players that you have to scout actively and get into the process.”