Implications of Artificial Intelligence–Related Data Center Electricity Use and Emissions: Proceedings of a Workshop (2025)

Suggested Citation: "8 Efficiency Through Technology Advancement: Hardware–Software Interactions." National Academies of Sciences, Engineering, and Medicine. 2025. Implications of Artificial Intelligence–Related Data Center Electricity Use and Emissions: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29101.

8

Efficiency Through Technology Advancement: Hardware–Software Interactions

In a keynote presentation and panel discussion, participants examined opportunities to improve carbon efficiency and sustainability through the coordinated design of hardware and software. William Dally, NVIDIA, set the stage for the discussion with a keynote presentation describing the major contributors to the large energy efficiency gains seen in both hardware and software for large language models (LLMs) over the past decade and what the future may hold for further developments in hardware and model efficiency. Panelists then delivered opening remarks outlining the landscape of hardware platforms and the role of specialized architectures for artificial intelligence (AI), and Dally and other panelists explored the software strategies required to achieve high performance and efficiency on these architectures in an open discussion.

KEYNOTE PRESENTATION

Dally highlighted how AI developments have accelerated energy demands and described some of the factors that may influence this trajectory in the future.1 While huge efficiency gains have been achieved—today’s AI chips are upward of 1,000 times more efficient than their predecessors—some experts point to Jevons paradox, under which greater efficiency simply creates more demand.

___________________

1 K. Bourzac, 2024, “Fixing AI’s Energy Crisis,” Nature, October 17, https://doi.org/10.1038/d41586-024-03408-z.

Suggested Citation: "8 Efficiency Through Technology Advancement: Hardware–Software Interactions." National Academies of Sciences, Engineering, and Medicine. 2025. Implications of Artificial Intelligence–Related Data Center Electricity Use and Emissions: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29101.

To meet this demand, companies will seek to build AI data centers wherever power generation can be quickly ramped up. Because ramping up generation is generally slower in the United States than in some other countries, Dally posited that challenges with energy capacity not only represent an important issue for the U.S. grid but also pose a threat to U.S. dominance in AI.

Dally offered context on why AI data centers have such large energy demands. A primary driver is the energy required for model training. The number of floating-point operations required to train models has grown by a factor of 10⁷ and is still increasing by more than an order of magnitude each year. In addition, inference represents a growing share of AI activities, albeit one growing more slowly. Every AI model is also performing more operations, and more models are being deployed across more applications, such as medicine, science, and entertainment, creating more potential value and further driving demand.2 Finally, AI models are becoming more complicated, with more steps requiring more energy.

AI demand is not going away, but Dally posited that its energy needs can be minimized through continual hardware and software innovations and optimizations. He highlighted that the significant gains in AI chips’ energy efficiency and performance over the past decade (Figure 8-1) came not from smaller transistors and process technology improvements but from improved number representation. Using fewer bits to represent numbers (going from 32 down to just 4) has made instructions more efficient; as a result, extra processing overhead has dropped to less than 10 percent. In addition, structured sparsity schemes make algorithms much more efficient while preserving accuracy, although more research is needed to build effective sparse matrix graphics processing unit (GPU) hardware.3,4 NVIDIA’s current GPU provides 20 petaFLOPS for inference operations, delivering state-of-the-art efficiency and performance. Higher-density racks also improve efficiency, because communication at close range takes less energy. Next-generation technology will provide 100 teraOPS per watt (TOPS/W) for even more efficiency.5
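The following minimal Python sketch illustrates the number-representation idea: weights are stored as 4-bit integers that share one floating-point scale per group. It is a generic illustration, not NVIDIA's scheme; the symmetric quantizer and group size of 32 are assumptions chosen for simplicity.

```python
import numpy as np

def quantize_int4(w, group_size=32):
    """Symmetric 4-bit quantization: each group of weights shares one
    floating-point scale, and the values themselves are stored as 4-bit
    integers in [-8, 7] (group size and scheme are assumptions)."""
    groups = w.reshape(-1, group_size)
    scale = np.abs(groups).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(groups / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return (q.astype(np.float32) * scale).ravel()

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=4096).astype(np.float32)
q, scale = quantize_int4(w)
err = np.abs(w - dequantize(q, scale)).mean()
print(f"mean absolute quantization error: {err:.6f}")
# Storage falls from 32 bits per weight to 4 bits plus one shared
# scale per group, which is the source of the efficiency gain.
```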

___________________

2 E. Masanet, A. Shehabi, N. Lei, S. Smith, and J. Koomey, 2020, “Recalibrating Global Data Center Energy-Use Estimates,” Science 367(6481):984–986, https://doi.org/10.1126/science.aba3758.

3 NVIDIA, n.d., NVIDIA A100 Tensor Core GPU Architecture: Unprecedented Acceleration at Every Scale, https://images.nvidia.com/aem-dam/en-zz/Solutions/data-center/nvidia-ampere-architecture-whitepaper.pdf, accessed December 15, 2024.

4 A. Mishra, J.A. Latorre, J. Pool, et al., 2021, “Accelerating Sparse Deep Neural Networks,” arXiv:2104.08378, https://doi.org/10.48550/arXiv.2104.08378.

5 B. Keller, R. Venkatesan, S. Dai, S.G. Tell, B. Zimmer, and W.J. Dally, 2022, “A 17–95.6 TOPS/W Deep Learning Inference Accelerator with Per-Vector Scaled 4-Bit Quantization for Transformers in 5nm,” Pp. 16–17 in 2022 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), https://doi.org/10.1109/VLSITechnologyandCir46769.2022.9830277.

Suggested Citation: "8 Efficiency Through Technology Advancement: Hardware–Software Interactions." National Academies of Sciences, Engineering, and Medicine. 2025. Implications of Artificial Intelligence–Related Data Center Electricity Use and Emissions: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29101.
Trends in artificial intelligence chip performance, 2012–2023
FIGURE 8-1 Trends in artificial intelligence chip performance, 2012–2023.
SOURCE: William Dally, NVIDIA, presentation to the workshop, November 13, 2024.

Software efficiency has also improved; for example, structured sparsity improves neural networks through pruning and retraining.6 As a result of all of these developments, Dally said, it is estimated that between 2012 and 2024 single-chip inference performance increased 5,000-fold, AI hardware efficiency 1,250-fold, and AI software efficiency more than 1,000-fold.
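The pruning step behind the 2:4 structured sparsity scheme described in NVIDIA's Ampere whitepaper (keep the two largest-magnitude weights in every group of four) can be sketched as follows. This is an illustration rather than the production implementation, and the retraining that restores accuracy is omitted.

```python
import numpy as np

def prune_2_of_4(w):
    """2:4 structured sparsity pruning: in every group of 4 consecutive
    weights, zero the 2 with the smallest magnitude. Retraining or
    fine-tuning, which recovers accuracy, is omitted here."""
    groups = w.reshape(-1, 4)
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]  # 2 smallest per group
    mask = np.ones_like(groups, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=1)
    return (groups * mask).reshape(w.shape)

rng = np.random.default_rng(1)
w = rng.normal(size=(8, 16))
w_sparse = prune_2_of_4(w)
print("fraction of weights zeroed:", np.mean(w_sparse == 0))  # -> 0.5
```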

Looking forward, Dally outlined several drivers that can continue this trajectory toward even greater efficiency. First, he suggested focusing on efficiency alongside accuracy as a standard evaluation metric for AI innovations. Second, researchers are exploring whether efficiency might be improved by implementing state-space models, or hybrids of the two approaches, in place of comparatively expensive transformer architectures.

___________________

6 R. Venkatesan, Y.S. Shao, M. Wang, J. Clemons, S. Dai, and M. Fojtik, 2019, “MAGNet: A Modular Accelerator Generator for Neural Networks,” Pp. 1–8 in 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), https://doi.org/10.1109/ICCAD45719.2019.8942127.

Suggested Citation: "8 Efficiency Through Technology Advancement: Hardware–Software Interactions." National Academies of Sciences, Engineering, and Medicine. 2025. Implications of Artificial Intelligence–Related Data Center Electricity Use and Emissions: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29101.

Finally, he suggested that efficiency gains could be made by implementing smaller, more specialized models through distillation, retraining, and fine-tuning.7
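A back-of-envelope comparison shows why state-space models are attractive at long sequence lengths; the model width and state size below are illustrative assumptions, not figures from the presentation.

```python
# Back-of-envelope scaling of per-layer operation counts with sequence
# length n. Self-attention's score matrix costs on the order of n^2 * d
# operations, while a state-space (linear recurrence) layer costs on the
# order of n * d * s for state size s. Constants are assumptions.
d = 4096   # model width (assumed)
s = 16     # state dimension of the state-space layer (assumed)

for n in (1_000, 10_000, 100_000):
    attention_ops = 2 * n * n * d   # QK^T scores plus the weighted sum
    ssm_ops = 2 * n * d * s         # one small recurrence update per token
    print(f"n={n:>7,}: attention ~{attention_ops:.1e} ops, "
          f"state space ~{ssm_ops:.1e} ops ({attention_ops // ssm_ops:,}x)")
```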

Discussion

In a question-and-answer session, participants delved deeper into AI efficiency and then discussed several considerations at the intersection of data centers and the electric power grid. Andrew Chien, University of Chicago, asked how NVIDIA was making GPUs efficient during the AI training phase, especially as size and capacity increase. Dally replied that training efficiency is harder to measure than inference efficiency, which makes it harder to reduce precision during training; for inference, however, NVIDIA’s chips lose no accuracy. Moving from single GPUs to clusters also increases efficiency if the units are located close together, making high-bandwidth communication cheaper and more efficient. Communication efficiency innovations also mean that most of the energy demand now comes from computation. As racks get larger, they use more power, but they remain relatively inexpensive and efficient at scale and compared to alternatives.

A participant asked Dally what other hardware innovations can improve energy efficiency. He replied that there is potential in increasing sparsity on weights and activations, and also in number representation. He added that very aggressive packaging could also reduce the communication and data movement energy needed. Improvements are unlikely to come from smaller transistors or more advanced process nodes, however. Another participant asked if there are trade-offs between increased efficiency and security. Dally replied that there are not, stating that NVIDIA’s chips are fully secure, with efficiencies safely located within GPUs’ secure enclaves.

Carole-Jean Wu, Meta, asked about energy efficiencies related to cooling requirements. Dally said he did not have data on this and did not know whether the energy required to chill water is accounted for, but he believed that the energy required to pump chilled water through the racks is not a substantial portion of the total energy used to run AI systems.

Participants then turned to the interactions between AI data centers and the grid. Tamar Eilam, IBM Research, asked how AI workloads can support flexibility within the power grid. Dally replied that the level of fine-grained control over GPUs should mean there is no technical challenge with this, but there may be an economic one.

___________________

7 S.T. Sreenivas, S. Muralidharan, R. Joshi, et al., 2024, “LLM Pruning and Distillation in Practice: The Minitron Approach,” arXiv:2408.11796, https://doi.org/10.48550/arXiv.2408.11796.

Suggested Citation: "8 Efficiency Through Technology Advancement: Hardware–Software Interactions." National Academies of Sciences, Engineering, and Medicine. 2025. Implications of Artificial Intelligence–Related Data Center Electricity Use and Emissions: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29101.

Companies that buy expensive equipment may be reluctant to reduce performance when, for example, there is not enough renewable energy available to run it.

Another participant asked where Dally would expect new, large-scale data centers to be built, if not in the United States. He replied that parts of Texas and the western United States could be areas for growth, but in general he speculated that countries with sufficient wealth, fewer regulatory hurdles, or both, such as those in the Middle East or South America, are the most likely to attract data centers.

Asked about AI data centers’ storage and networking requirements, Dally noted that networking needs increase considerably, especially with large training clusters, and also rise during inference as external clients connect. Storage needs are less clear: data centers must store the weights of every model they serve, but demand may plateau as the number of models and their sizes stabilize. Another participant asked whether interconnection speeds will also increase. Dally replied that they had been rapidly increasing but have now plateaued until fundamental blockers can be removed. One potential solution is optical GPU interconnection, which offers increased efficiency and performance and is already used in long-distance interconnects.

Wu asked if fault tolerance or failures pose concerns with large-scale computing arrays. Dally replied that they are concerning because they can require disruptive restarts. Switching from checkpoint restarts to continual computations helps, as does replacing the faulting or failing piece. That switch will disrupt the bulk synchronous nature of AI training, however, and so NVIDIA is searching for a solution that balances the load more equitably without slowing computation.

PANELIST REMARKS

Wu moderated a panel discussion focused on opportunities to make efficiency gains through technological advancements in hardware–software interactions to balance AI’s exponential growth. The panelists were Miloš Popovic, Ayar Labs; Eilam; Valerie Taylor, Argonne National Laboratory; Vivienne Sze, Massachusetts Institute of Technology; and Dally.

Improving Artificial Intelligence Infrastructure Efficiency with Optical Input-Output

AI processors have made great gains in density, but these architectures are currently limited by data bandwidth and data movement. Popovic described how optical interconnection (also known as optical input-output, or optical I/O8) will soon be able to solve both of these problems and substantially improve the efficiency of AI hardware.

Light traveling through optical fiber eases bandwidth limits because an optical signal takes less space than an electrical signal through a wire, and many wavelengths fit on the same fiber. As a result, orders of magnitude more data can be passed through a single strand.9 In addition, optical interconnection provides distance-independent energy consumption, as opposed to wires, whose energy use is proportional to distance. Finally, optics need very little energy to move data when they are close to and deeply integrated with computing elements.
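A toy first-order energy model makes the distance argument concrete; both coefficients are rough assumptions for illustration, not Ayar Labs measurements.

```python
# Toy first-order model of the distance argument. An electrical link's
# energy grows roughly linearly with wire length, while an optical link
# pays a roughly fixed electro-optic conversion cost per bit regardless
# of distance. Both coefficients are rough assumptions for illustration.
ELEC_PJ_PER_BIT_PER_MM = 0.1   # assumed on-package wire cost
OPT_PJ_PER_BIT = 5.0           # assumed fixed laser + conversion cost

for distance_mm in (10, 100, 1_000, 10_000):  # die edge up to across racks
    electrical = ELEC_PJ_PER_BIT_PER_MM * distance_mm
    winner = "optical" if OPT_PJ_PER_BIT < electrical else "electrical"
    print(f"{distance_mm:>6} mm: electrical ~{electrical:7.1f} pJ/bit vs "
          f"optical ~{OPT_PJ_PER_BIT:.1f} pJ/bit -> {winner}")
```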

Current processors that use optical interconnection have both efficiencies and limitations.10 To make progress, Ayar Labs scientists have experimented with dividing workloads across optically interconnected GPUs and found a 20-fold improvement in energy efficiency, especially when implementing parallel programming across multiple processors. This demonstrates how optical I/O could enable future growth in the number of interconnected GPUs with minimal limitations.

Innovating for Efficiency Across the Artificial Intelligence Stack

Despite the many efficiencies implemented thus far, Eilam said that additional innovations across every layer of the AI stack are needed to minimize energy demands. For example, she said that fit-for-purpose, domain-specific accelerators; new tools or architectures to break the von Neumann memory bottleneck; and packaging efficiencies are needed. She added that new model architectures, perhaps based on the mechanics of the human brain, which is irregular and domain-specific, could also enhance efficiency.

Model innovation is also needed to create smaller AI models that are carefully trained and fine-tuned, use sequence- or state-space approaches, and can be reused or used in collaboration. The AI platform itself is the glue holding models and systems together, and she suggested that AI platforms could be made more efficient by better integrating heterogeneous accelerators to raise the level of abstraction; better matching models to accelerators; building in observability, transparency, trust, and community; and optimizing and scaling workloads.

___________________

8 Optical I/O uses light instead of electrical signals to transfer data.

9 D.A.B. Miller, 1990, “A New Principle of Wave Propagation: Huygens’ Principle Corrected After 300 Years,” in Optical Society of America Annual Meeting, Technical Digest Series, PDP16, https://doi.org/10.1364/OAM.1990.PDP16.

10 M. Wade, E. Anderson, S. Ardalan, et al., 2021, “An Error-Free 1 Tbps WDM Optical I/O Chiplet and Multi-Wavelength Multi-Port Laser,” in Optical Fiber Communication Conference (OFC), OSA Technical Digest F3C.6, Optica Publishing Group, https://doi.org/10.1364/OFC.2021.F3C.6.

Suggested Citation: "8 Efficiency Through Technology Advancement: Hardware–Software Interactions." National Academies of Sciences, Engineering, and Medicine. 2025. Implications of Artificial Intelligence–Related Data Center Electricity Use and Emissions: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29101.

IBM researchers are working on these issues. One prototype accelerator, with a fit-for-purpose integrated design tailored to specific values and requirements, supports specializations such as fraud detection and geospatial analysis. A second, with an innovative, brain-inspired approach to memory, is extremely energy efficient but currently supports only smaller models.

Near- and Long-Term Impacts of Hardware–Software Co-Design

Taylor described opportunities to reduce AI data center energy demands through effective hardware–software co-design. A long-term vision for co-design includes synergy across the entire AI design stack, from materials and physics to devices, integration, architectures, algorithms, and applications.11 While this process has been employed for adjacent AI layers, Taylor emphasized that an all-to-all, holistic co-design approach, with materials researchers, applications experts, and everyone in between collaborating on power, speed, and space, would have a large impact.

One co-design project, Threadwork, is a collaboration between Argonne, Northwestern University, and the University of Chicago that investigates optical interconnection applications in high-energy physics and neuromorphic materials and has developed devices with significantly reduced power needs.12

Hardware–software co-design can also have near-term impacts on efficiency. Hardware trends include innovating memory access to reduce data movement, using mixed- and low-precision computations, and considering biologically inspired hardware.13 On the software side, efforts such as LLM inference benchmarks explore different open source LLMs, AI accelerators, and inference frameworks for improved efficiency.14
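As a sketch of what such benchmarks report, the snippet below computes two common figures of merit, throughput and energy efficiency, from hypothetical measurements; the numbers are placeholders, not results from the cited work.

```python
def inference_efficiency(tokens_generated, wall_seconds, avg_power_watts):
    """Two figures of merit an inference benchmark might report:
    throughput (tokens/s) and energy efficiency (tokens/J)."""
    energy_joules = avg_power_watts * wall_seconds
    return tokens_generated / wall_seconds, tokens_generated / energy_joules

# Hypothetical placeholder measurements for one accelerator/framework pair.
tps, tpj = inference_efficiency(tokens_generated=50_000,
                                wall_seconds=120.0,
                                avg_power_watts=700.0)
print(f"throughput: {tps:.0f} tokens/s, efficiency: {tpj:.2f} tokens/J")
```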

___________________

11 C. Murray, S. Guha, D. Reed, et al., 2018, Basic Research Needs for Microelectronics: Report of the Office of Science Workshop on Basic Research Needs for Microelectronics October 23–25, 2018, Technical Report, Department of Energy, Office of Science, https://doi.org/10.2172/1616249.

12 Argonne National Laboratory, n.d., “Threadwork,” https://www.anl.gov/threadwork, accessed April 21, 2025.

13 N.P. Jouppi, D.H. Yoon, M. Ashcraft, M. Gottscho, T.B. Jablin, and G. Kurian, 2021, “Ten Lessons from Three Generations Shaped Google’s TPUv4i: Industrial Product,” Pp. 1–14 in 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA), https://doi.org/10.1109/ISCA52012.2021.00010.

14 K.T. Chitty-Venkata, S. Raskar, B. Kale, et al., 2024, “LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators,” arXiv:2411.00136, https://doi.org/10.48550/arXiv.2411.00136.

Suggested Citation: "8 Efficiency Through Technology Advancement: Hardware–Software Interactions." National Academies of Sciences, Engineering, and Medicine. 2025. Implications of Artificial Intelligence–Related Data Center Electricity Use and Emissions: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29101.

Moving Data More Efficiently

Sze discussed opportunities to co-design AI hardware and software to improve the efficiency of data movement. Data movement dominates AI energy consumption, far more than computing does.15 It is possible to reduce the amount of data movement through computing-in-memory strategies or data reuse, where data is moved once and used in multiple operations. It is also possible to reduce the energy that data movement requires by bringing chips or chiplets closer together, using 3D stacking to minimize the distance data must travel, or implementing optical interconnection or superconductors. However, these methods face a variety of challenges, including manufacturing costs, yield, robustness, cooling, and scaling.
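The imbalance can be made concrete with the widely quoted 45 nm estimates from the Horowitz talk cited below; the reuse model in the sketch is a simplified illustration, not taken from the presentation.

```python
# Rough 45 nm energy estimates widely quoted from Horowitz (ISSCC 2014):
FP32_MULT_PJ = 3.7     # 32-bit floating-point multiply
SRAM_READ_PJ = 5.0     # 32-bit read from a small (8 KB) on-chip SRAM
DRAM_READ_PJ = 640.0   # 32-bit read from off-chip DRAM

def energy_per_mac_pj(reuse):
    """Simplified tiling model: each operand is fetched once from DRAM
    into SRAM and then reused `reuse` times out of SRAM."""
    return FP32_MULT_PJ + 2 * SRAM_READ_PJ + 2 * DRAM_READ_PJ / reuse

for reuse in (1, 8, 64, 512):
    print(f"reuse x{reuse:>3}: ~{energy_per_mac_pj(reuse):7.1f} pJ per multiply")
# With no reuse, the two DRAM fetches (~1,280 pJ) dwarf the 3.7 pJ
# multiply; data reuse pushes the cost back toward the computation.
```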

Co-designing models and hardware could also reduce both the energy per computation and the amount of computation by reducing precision or numeric representation, shrinking model sizes through specialization or new model architectures, and exploiting sparsity, Sze said. Accomplishing these energy savings will often require hardware support, and reducing a model’s energy use must be balanced against maintaining accuracy. Considering workload demands from an applications perspective can also allow one to exploit redundancies, enabling another form of sparsity.

More specialized hardware can improve efficiency, but Sze noted that this may come with trade-offs in flexibility. Hard-coding AI models into hardware is often undesirable because models differ widely in weights and architectures. In addition, while hardware can be specialized in many ways, from instructions to dataflow and mapping, it is challenging to balance that specialization against future-proofing and unexpected limits on innovation.

A Decade of Efficiency Gains

Dally highlighted some of the history he shared in his keynote presentation regarding drivers of increasing efficiency in AI systems over the past decade. GPUs are increasingly energy efficient thanks to innovations in number representation, complex instructions, and sparsity. Dally commented that the first two methods are largely mined out and said that the next generation of efficiencies will come from better leveraging sparsity. In addition, he noted that large gains are possible from hardware and software improvements and specialized models with improved attention.

___________________

15 M. Horowitz, 2014, “1.1 Computing’s Energy Problem (and What We Can Do About It),” Pp. 10–14 in 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), https://doi.org/10.1109/ISSCC.2014.6757323.

Suggested Citation: "8 Efficiency Through Technology Advancement: Hardware–Software Interactions." National Academies of Sciences, Engineering, and Medicine. 2025. Implications of Artificial Intelligence–Related Data Center Electricity Use and Emissions: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29101.

These innovations can help fuel the “AI gold rush” and enable society to fully leverage its benefits.

PANEL DISCUSSION

In an open discussion, panelists examined opportunities for hardware efficiencies and their adoption timelines, considerations around operational versus embodied carbon, data center needs, and future directions.

Hardware Efficiencies and Adoption Timelines

Wu asked how long it would take to start seeing efficiency gains from the technological solutions the panelists identified during their opening remarks. Dally replied that it depends on the length of testing and software adaptation, but generally it takes 1–2 years for a new GPU to make an impact. Popovic noted that optical interconnection has been possible for more than a decade, but it was not used or useful until it gained high-volume capabilities, and its manufacturing ecosystem still needs 2–3 years to develop for it to have a large-scale impact. Dally agreed, noting that many new technologies start this way—several breakthroughs are needed before an innovation reaches a stage where it can be easily built; is energy efficient; and has an obvious use, a ready audience, and no viable alternatives.

Eilam noted that some of the solutions she named were longer term, radical, and potentially very rewarding and should be pursued despite the risks or trade-offs. In addition, hardware–software co-design will provide support for software efficiency gains and architecture improvements and could identify different neural network archetypes that are more suitable for different use cases and architecture designs. In the nearer term, she said that many solutions are already in use, such as reduced precision, alternative architectures, modular chiplets, and 3D stacking. Taylor agreed, adding that co-designing with materials scientists has long-term potential to greatly affect energy use and emissions.

Wu asked when software engineers can exploit the efficiency potential from hardware innovations. Dally replied that they can do so immediately, because AI systems are poised to quickly integrate modifications and new features. Eilam suggested that software engineers are also more amenable to AI adaptations because the timing is right—it has a potentially enormous payoff, and the limits of scaling laws have been reached.

Noting that AI is not yet profitable, Popovic said that hardware efficiencies are needed to enable developers to layer applications on top of AI models to make them more economically viable.

Suggested Citation: "8 Efficiency Through Technology Advancement: Hardware–Software Interactions." National Academies of Sciences, Engineering, and Medicine. 2025. Implications of Artificial Intelligence–Related Data Center Electricity Use and Emissions: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29101.

If that does not happen, he speculated, efficiency gains will be moot. Dally argued that this scenario was unlikely in light of current momentum toward monetizing AI.

A participant asked if there were memory or virtualization innovations in the pipeline. Dally replied that the question of how much memory to add to GPUs depends on a complicated balance of cost, usefulness, capacity, and access, and this calculation is re-optimized with every new generation. NVIDIA is considering putting memory directly on top of a GPU, bringing it closer and further reducing energy costs, but that approach would also limit capacity and could bring issues with power distribution and thermal regulation. Virtualization is less of a problem, however, as having 100 percent virtual memory in one space increases bandwidth, Dally said.

Eilam noted that IBM took the opposite path: its chip has only static RAM and no external memory, a fit-for-purpose design that has proven successful with smaller models. Taylor added that some scientific applications require significant memory and are moving toward surrogate models that change the ratio of computing to memory. “Memristors,” or memory resistors, could help navigate these different memory demands, she noted.

Ayse Coskun, Boston University, asked about power management. Dally agreed that there are opportunities to improve power management, many of which NVIDIA already uses. Its chips react quickly to power fluctuations by running at full speed, changing states to stay within a thermal envelope, and employing frequency-adaptive delay-locked loops. NVIDIA researchers are also investigating more efficient power converters that react quickly to load variation. Taylor added that, because each AI accelerator has its own power-monitoring library with distinct functionality, there are opportunities to create more uniform monitoring that could be reported to users.

A participant asked if it was possible to use AI to recode software in order to help free up data center capacity. Taylor replied that it is possible, noting that Argonne is working on using LLMs to translate its software, but researchers are encountering verification challenges, compiling errors, and, importantly, performance issues.

Reid Lifset, Yale University, asked about the potential for reuse of AI hardware. Dally replied that outdated hardware should be retired if the goal is to improve energy efficiency. While it could be resold, “it’s better to move on to the latest technology because it’s the most efficient,” he stated. Dally also confirmed that Jevons paradox, where use increases as efficiency rises and costs fall, is already happening in AI. Eilam suggested that scientists or innovative technology cannot solve Jevons paradox—that is a matter for government and policy—but they can focus on reducing the energy demands of computation.

Suggested Citation: "8 Efficiency Through Technology Advancement: Hardware–Software Interactions." National Academies of Sciences, Engineering, and Medicine. 2025. Implications of Artificial Intelligence–Related Data Center Electricity Use and Emissions: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29101.

Operational Versus Embodied Carbon

While many of the innovations panelists described in their remarks are relevant to reducing the carbon generated through AI system operations, Wu asked panelists to comment on the balance of operational versus embodied carbon and what energy demands might be anticipated with manufacturing new hardware and accelerators. Dally suggested that operational carbon deserves more of the focus, as embodied carbon accounts for only a small portion of a GPU’s life-cycle carbon impact. Sze countered that both areas need to be considered. For example, it is possible that more specialized hardware will have a shorter lifespan as it might not be able to support new models and thus require more frequent replacement. It is important to find the right balance.

Eilam added that it is important to look at life-cycle trade-offs when weighing embodied carbon—embedded redundancies have fewer errors and extended lifetimes but can increase manufacturing costs or emissions. For example, IBM’s explorations into perfluoroalkyl substance replacements cannot be fully electrified, making it harder to reduce embodied emissions. Andrew Grimshaw, Lancium, noted that some vendors publish information about embodied and operational carbon in their product documentation.

Data Center Needs

Panelists then turned to considerations around data center siting and associated needs, drivers, and impacts. Grimshaw suggested that data centers should be located where renewable energy is cheap and power usage effectiveness (PUE) would be ideal, instead of clustering in places like Virginia. Dally noted that some developers already use this approach; for example, many cryptocurrency data centers were built in Washington State to use excess hydropower. “I think it’s just a great concept to be putting the computer where the power is cheap, because it’s easier to move the bits around than it is to move the watts around,” he noted. In his view, however, PUE is not a good measure of efficiency, because optimizing for it encourages running GPUs at high temperature, which increases leakage current and hence power dissipation. Instead, he suggested measuring total power against total computing delivered, in operations per second (OPS) rather than watts.
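A toy comparison illustrates how the two metrics can disagree; all numbers are hypothetical illustrations, not data from the workshop.

```python
def site_metrics(name, it_kw, overhead_kw, delivered_petaops):
    """Compare PUE with useful operations per total facility watt.
    All inputs are hypothetical illustrations."""
    total_kw = it_kw + overhead_kw
    pue = total_kw / it_kw
    gops_per_watt = delivered_petaops * 1e15 / (total_kw * 1e3) / 1e9
    print(f"{name}: PUE {pue:.2f}, {gops_per_watt:.0f} GOPS per facility watt")

# Site A runs chips hotter: less cooling (better PUE) but leakage cuts
# delivered compute. Site B spends more on cooling and computes more.
site_metrics("Site A (hot, low PUE)  ", it_kw=1000, overhead_kw=100,
             delivered_petaops=800)
site_metrics("Site B (cool, high PUE)", it_kw=1000, overhead_kw=250,
             delivered_petaops=1100)
# Site A wins on PUE yet delivers fewer useful operations per watt,
# which is the gap the compute-per-watt framing is meant to expose.
```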

Laura Gonzalez Guerrero, Clean Virginia, asked what data centers can do today to avoid increasing fossil fuel use. Dally suggested moving data centers to west Texas, where they can access cheap energy. Wu pointed out that there are other considerations, and a participant listed the cost of land and equipment, ease of permitting, and latency concerns as key additional considerations.

Suggested Citation: "8 Efficiency Through Technology Advancement: Hardware–Software Interactions." National Academies of Sciences, Engineering, and Medicine. 2025. Implications of Artificial Intelligence–Related Data Center Electricity Use and Emissions: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29101.

Given how fast signals travel today, however, latency is perhaps less of an issue. Eilam agreed that data center location requires careful consideration, especially with regard to renewable energy availability and grid reliability.

Future Directions

To close the session, Wu asked each panelist to identify an important area of research and investment. Eilam expressed excitement about the future—and current—value of AI and suggested it could greatly accelerate scientific discovery—for example, by solving hard problems in medicine and materials science. She identified a need for more investment in hardware–software co-design, especially algorithm innovation in neural network architecture. AI also has the potential to accelerate renewable energy production and use if computing can fully leverage its flexibility.

Taylor expressed agreement regarding the importance of co-design and reiterated the need for a holistic examination and optimization of the full AI stack, from materials to applications. Dally named two areas: exploiting sparsity of activations in hardware design and, in software, replacing transformers’ reliance on quadratic complexity with a more efficient approach. Popovic highlighted the need for more investment in education. Sze echoed the importance of holistic co-design and suggested that different research communities, which are currently siloed, should work together to create a shared language for full system optimization. She also noted that it is important to understand how to use AI selectively, when the job or costs make it worthwhile.

Suggested Citation: "8 Efficiency Through Technology Advancement: Hardware–Software Interactions." National Academies of Sciences, Engineering, and Medicine. 2025. Implications of Artificial Intelligence–Related Data Center Electricity Use and Emissions: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29101.

This page intentionally left blank.

Suggested Citation: "8 Efficiency Through Technology Advancement: Hardware–Software Interactions." National Academies of Sciences, Engineering, and Medicine. 2025. Implications of Artificial Intelligence–Related Data Center Electricity Use and Emissions: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29101.
Page 72
Suggested Citation: "8 Efficiency Through Technology Advancement: Hardware–Software Interactions." National Academies of Sciences, Engineering, and Medicine. 2025. Implications of Artificial Intelligence–Related Data Center Electricity Use and Emissions: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29101.
Page 73
Suggested Citation: "8 Efficiency Through Technology Advancement: Hardware–Software Interactions." National Academies of Sciences, Engineering, and Medicine. 2025. Implications of Artificial Intelligence–Related Data Center Electricity Use and Emissions: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29101.
Page 74
Suggested Citation: "8 Efficiency Through Technology Advancement: Hardware–Software Interactions." National Academies of Sciences, Engineering, and Medicine. 2025. Implications of Artificial Intelligence–Related Data Center Electricity Use and Emissions: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29101.
Page 75
Suggested Citation: "8 Efficiency Through Technology Advancement: Hardware–Software Interactions." National Academies of Sciences, Engineering, and Medicine. 2025. Implications of Artificial Intelligence–Related Data Center Electricity Use and Emissions: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29101.
Page 76
Suggested Citation: "8 Efficiency Through Technology Advancement: Hardware–Software Interactions." National Academies of Sciences, Engineering, and Medicine. 2025. Implications of Artificial Intelligence–Related Data Center Electricity Use and Emissions: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29101.
Page 77
Suggested Citation: "8 Efficiency Through Technology Advancement: Hardware–Software Interactions." National Academies of Sciences, Engineering, and Medicine. 2025. Implications of Artificial Intelligence–Related Data Center Electricity Use and Emissions: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29101.
Page 78
Suggested Citation: "8 Efficiency Through Technology Advancement: Hardware–Software Interactions." National Academies of Sciences, Engineering, and Medicine. 2025. Implications of Artificial Intelligence–Related Data Center Electricity Use and Emissions: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29101.
Page 79
Suggested Citation: "8 Efficiency Through Technology Advancement: Hardware–Software Interactions." National Academies of Sciences, Engineering, and Medicine. 2025. Implications of Artificial Intelligence–Related Data Center Electricity Use and Emissions: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29101.
Page 80
Suggested Citation: "8 Efficiency Through Technology Advancement: Hardware–Software Interactions." National Academies of Sciences, Engineering, and Medicine. 2025. Implications of Artificial Intelligence–Related Data Center Electricity Use and Emissions: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29101.
Page 81
Suggested Citation: "8 Efficiency Through Technology Advancement: Hardware–Software Interactions." National Academies of Sciences, Engineering, and Medicine. 2025. Implications of Artificial Intelligence–Related Data Center Electricity Use and Emissions: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29101.
Page 82
Suggested Citation: "8 Efficiency Through Technology Advancement: Hardware–Software Interactions." National Academies of Sciences, Engineering, and Medicine. 2025. Implications of Artificial Intelligence–Related Data Center Electricity Use and Emissions: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29101.
Page 83
Suggested Citation: "8 Efficiency Through Technology Advancement: Hardware–Software Interactions." National Academies of Sciences, Engineering, and Medicine. 2025. Implications of Artificial Intelligence–Related Data Center Electricity Use and Emissions: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/29101.
Page 84
Next Chapter: Appendix A: Statement of Task
Subscribe to Email from the National Academies
Keep up with all of the activities, publications, and events by subscribing to free updates by email.