NVIDIA has put forward a key tool for data centers pursuing greater energy efficiency: an enhanced dashboard that tracks progress on real-world applications.
Energy efficiency is the amount of work done divided by the energy used. Applied to data centers, the picture is a bit more granular. The de facto standard, power usage effectiveness (PUE), compares a facility's total energy consumption with the energy used by its computing infrastructure. Over the last 17 years, PUE has driven the best operators to near-optimal levels, cutting the energy wasted on things such as power conversion and cooling to a minimum.
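The PUE ratio described above can be sketched in a few lines of Python. This is a minimal illustration, not an official calculator: the function name `pue` and the energy figures are hypothetical, chosen only to show the arithmetic.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power usage effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


# Hypothetical annual figures, for illustration only (not measured data).
example_pue = pue(total_facility_kwh=1_200_000, it_equipment_kwh=1_000_000)
print(f"PUE = {example_pue:.2f}")
```

A value of 1.0 would mean every kilowatt-hour entering the facility reaches the computing equipment; anything above that is overhead such as cooling and power conversion.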
Redefining Data Center Metrics for the AI Era
Addressing the Shortcomings
While PUE has long been the standard metric for data center efficiency, its relevance is in question in the age of generative AI. The dynamic workloads of modern data centers, especially those running AI, call for a more nuanced approach.
The main drawback of PUE is that it measures only a data center's energy consumption, not its output. That is like measuring an engine's fuel consumption without regard to the distance traveled, a critical gap when assessing efficiency.
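To make that blind spot concrete, here is a small sketch of two facilities with identical PUE but very different output. Everything in it is an assumption made for illustration: the `Facility` class, the use of "jobs completed" as a stand-in for useful work, and all of the numbers.

```python
from dataclasses import dataclass


@dataclass
class Facility:
    name: str
    total_kwh: float     # energy drawn by the whole facility
    it_kwh: float        # energy drawn by the computing equipment
    jobs_completed: int  # hypothetical stand-in for "useful work"

    @property
    def pue(self) -> float:
        return self.total_kwh / self.it_kwh

    @property
    def jobs_per_kwh(self) -> float:
        return self.jobs_completed / self.total_kwh


# Same energy profile, different output (illustrative numbers only).
a = Facility("A", total_kwh=1200.0, it_kwh=1000.0, jobs_completed=6000)
b = Facility("B", total_kwh=1200.0, it_kwh=1000.0, jobs_completed=24000)

for f in (a, b):
    print(f"{f.name}: PUE={f.pue:.2f}, jobs/kWh={f.jobs_per_kwh:.1f}")
```

Both facilities report the same PUE, yet B completes four times as much work per kilowatt-hour, a difference PUE alone cannot show.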
Recognizing these limitations, the industry has produced a proliferation of alternative standards, as highlighted in a comprehensive 2017 paper. These metrics target specific facets of data center operations, including cooling, water usage, security, and cost.
As data centers navigate the complexities of the AI era, their metrics will have to be redefined to capture efficiency and performance in ways that go beyond PUE.
Understanding Watts and Reimagining the Energy Metrics
The computer industry has long used watts to describe the energy profile of systems and processors. Watts indicate a system's input power at a given moment, but they do not capture how much energy a computer consumes overall or how efficiently it uses that energy.
A common misconception is that higher input power in watts means lower energy efficiency. In fact, newer systems and processors often do more work relative to the energy they use.
Metrics for data center energy efficiency should therefore focus on energy consumed, measured in kilowatt-hours or joules, rather than on instantaneous power. The key question is how well a data center converts energy into useful work, which simple watt measurements cannot answer.
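The distinction between power and energy comes down to simple arithmetic: energy in joules is average power in watts multiplied by runtime in seconds. The sketch below compares two hypothetical systems running the same job; the wattages and runtimes are invented for illustration and do not describe any real hardware.

```python
JOULES_PER_KWH = 3_600_000  # 1 kWh = 1000 W * 3600 s


def energy_joules(avg_power_watts: float, runtime_seconds: float) -> float:
    """Energy consumed = average power * time."""
    return avg_power_watts * runtime_seconds


# Hypothetical systems completing the same job.
older = energy_joules(avg_power_watts=500, runtime_seconds=3600)  # 1 hour
newer = energy_joules(avg_power_watts=1000, runtime_seconds=600)  # 10 minutes

print(f"older: {older / JOULES_PER_KWH:.3f} kWh")
print(f"newer: {newer / JOULES_PER_KWH:.3f} kWh")
```

Under these assumptions the newer system draws twice the power but finishes six times faster, so it uses one third of the energy for the same work.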
Performance Metrics Beyond MIPS and FLOPS
In the tech industry, metrics like MIPS (millions of instructions per second) and FLOPS (floating point operations per second) have long been the gold standard for measuring computational might. However, these abstract measures do not reflect real-world performance.
While computer scientists may care about lower-level tasks, end-users care more about tangible results than technicalities. “Useful work” is subjective, which calls for a paradigm shift in gauging performance.
Domain-specific benchmarks have become the norm in specialized applications, from AI-focused data centers to scientific supercomputing facilities. They capture the essence of real-world applications and reflect changes in technology and user needs.
Performance metrics must be updated as new technology brings new use cases. The recent addition of tests for generative AI models to MLPerf is a good example of this, showing a commitment to keeping benchmarks relevant.
Expert Insights
With today’s data centers achieving scores around 1.2 PUE, the metric “has run its course,” said Christian Belady, a data center engineer who had the original idea for PUE. “It improved data center efficiency when things were bad, but two decades later, they’re better, and we need to focus on other metrics more relevant to today’s problems.”
Looking forward, “the holy grail is a performance metric. You can’t compare different workloads directly, but if you segment by workloads, I think there is a better likelihood for success,” said Belady, who continues to work on initiatives driving data center sustainability.
Jonathan Koomey, a researcher and author on computer efficiency and sustainability, agreed.
“To make good decisions about efficiency, data center operators need a suite of benchmarks that measure the energy implications of today’s most widely used AI workloads,” said Koomey.
“Tokens per joule is a great example of what one element of such a suite might be,” Koomey added. “Companies will need to engage in open discussions, share information on the nuances of their own workloads and experiments, and agree to realistic test procedures to ensure these metrics accurately characterize energy use for hardware running real-world applications.”
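As a rough sketch of how a tokens-per-joule figure might be computed, the example below divides tokens generated by the energy consumed during the run. The function name, the input values, and the choice of what counts toward the power measurement (a single accelerator, a server, or a whole facility) are all assumptions; the source does not define a test procedure.

```python
def tokens_per_joule(tokens_generated: int,
                     avg_power_watts: float,
                     runtime_seconds: float) -> float:
    """Useful output (tokens) divided by energy consumed (joules)."""
    energy = avg_power_watts * runtime_seconds  # joules = watts * seconds
    if energy <= 0:
        raise ValueError("energy consumed must be positive")
    return tokens_generated / energy


# Hypothetical inference run, for illustration only.
score = tokens_per_joule(tokens_generated=1_200_000,
                         avg_power_watts=800,
                         runtime_seconds=600)
print(f"{score:.2f} tokens per joule")
```

Comparisons are only meaningful when the workload and the measurement boundary are the same on both sides, which is why the experts quoted above call for agreed test procedures.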
FAQs
1. Why do data centers need an upgraded dashboard for energy efficiency?
Data centers require a comprehensive tool to monitor and enhance their energy efficiency journey, one that showcases progress with real-world applications.
2. What is Power Usage Effectiveness (PUE), and why is it important?
PUE is a metric widely used to compare a data center’s total energy consumption to the energy used by its computing infrastructure. Over the years, it has driven efficient operations by minimizing energy wastage.
3. Why is PUE considered insufficient in the current AI era?
PUE solely measures energy consumption without accounting for a data center’s useful output, making it inadequate for assessing efficiency in the era of generative AI.
4. What alternatives to PUE exist for measuring data center efficiency?
Various standards exist, focusing on specific efficiency targets such as cooling, water use, security, and cost, providing a more holistic view of data center operations.
5. How does accelerated computing contribute to energy efficiency in data centers?
Accelerated computing, powered by GPUs, offers significant gains in energy efficiency by executing tasks faster and more efficiently than traditional CPUs, leading to substantial energy savings across various industries.