Hewlett Packard Enterprise (HPE) has partnered with the National Renewable Energy Laboratory (NREL), a unit of the Department of Energy, to develop AI and machine-learning systems that improve data-center energy efficiency.
The Department of Energy lab will provide HPE with multiple years' worth of historical data from sensors in its supercomputers and in its Energy Systems Integration Facility (ESIF) High-Performance Computing (HPC) Data Center, one of the world's most efficient data centers. NREL said the information will also help other organizations optimize their own operations.
The project, dubbed "AI Ops R&D collaboration," is expected to run for three years. NREL has already collected 16 terabytes of data from the ESIF data center, drawn from sensors in its supercomputers, Peregrine and Eagle, and in the facility itself. The project will use that data to train anomaly-detection models that predict and prevent issues before they occur.
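HPE and NREL have not described their models in detail, so the following is only a minimal illustration of what anomaly detection on sensor telemetry can look like. It swaps in a simple rolling z-score, not the project's method: each reading is compared against a trailing window, and values more than a threshold number of standard deviations from the window mean are flagged. The function name, its parameters and the sample inlet-temperature readings are all hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(readings, window=12, threshold=3.0):
    """Return indices of readings whose rolling z-score exceeds `threshold`.

    Hypothetical helper: each value is scored against the mean and
    population standard deviation of the `window` readings before it.
    """
    history = deque(maxlen=window)  # trailing window of recent readings
    anomalies = []
    for i, value in enumerate(readings):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            # Skip perfectly flat windows (sigma == 0) to avoid dividing by zero.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                anomalies.append(i)
        history.append(value)
    return anomalies


# Made-up rack inlet temperatures (deg C) with one sudden spike.
readings = [22.0, 22.1, 21.9, 22.0, 22.2, 22.1, 21.8,
            22.0, 22.1, 22.0, 21.9, 22.1, 27.5, 22.0]
print(detect_anomalies(readings))  # [12]
```

A production system would more likely learn seasonal patterns and correlations across many sensors, but the principle is the same: model "normal" from history and flag departures from it early enough to act.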
They aren't kidding about the energy efficiency of the data center, either. The ESIF had an average power usage effectiveness (PUE) of just 1.036, easily the lowest I have ever seen; until now, the best I'd seen was 1.15. ESIF also captures 97 percent of the waste heat from its supercomputers, which it uses to warm nearby office and lab space.
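To put those PUE figures in perspective: PUE is the ratio of total facility energy to the energy delivered to IT equipment, so anything above 1.0 is overhead for cooling, power conversion, lighting and so on. The worked arithmetic below uses an illustrative load of 1,000 kWh of IT energy, which is an assumed round number rather than an ESIF measurement.

```latex
\mathrm{PUE} = \frac{E_{\text{total}}}{E_{\text{IT}}}
\quad\Longrightarrow\quad
E_{\text{overhead}} = E_{\text{total}} - E_{\text{IT}} = (\mathrm{PUE} - 1)\,E_{\text{IT}}

\text{ESIF: } (1.036 - 1) \times 1000\ \text{kWh} = 36\ \text{kWh}
\qquad
\text{Previous best seen: } (1.15 - 1) \times 1000\ \text{kWh} = 150\ \text{kWh}
```

In other words, for the same IT workload, a PUE of 1.15 implies roughly four times the facility overhead of a PUE of 1.036.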
HPE and NREL say that, in early results, models trained on historical data were able to predict or identify events that had previously occurred in ESIF. The focus is on monitoring energy usage to optimize energy efficiency and sustainability as measured by key metrics such as PUE, water usage effectiveness (WUE), and carbon usage effectiveness (CUE).
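All three metrics share the same denominator, the energy consumed by IT equipment, which is what makes them comparable across facilities. The sketch below computes them following The Green Grid's definitions (PUE: total facility energy over IT energy; WUE: site water use in liters per IT kWh; CUE: kilograms of CO2-equivalent emissions per IT kWh). The `FacilityReadings` type, the `efficiency_metrics` function and the sample annual totals are hypothetical and are not ESIF figures.

```python
from dataclasses import dataclass


@dataclass
class FacilityReadings:
    """Hypothetical annual totals for one data center."""
    total_facility_kwh: float  # all energy entering the facility
    it_equipment_kwh: float    # energy delivered to IT equipment
    site_water_liters: float   # water consumed on site, e.g. for cooling
    co2_kg: float              # CO2-equivalent emissions from facility energy


def efficiency_metrics(r: FacilityReadings) -> dict:
    """Return PUE, WUE and CUE, each normalized by IT equipment energy."""
    if r.it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return {
        "PUE": r.total_facility_kwh / r.it_equipment_kwh,  # dimensionless, >= 1.0
        "WUE": r.site_water_liters / r.it_equipment_kwh,   # liters per IT kWh
        "CUE": r.co2_kg / r.it_equipment_kwh,              # kg CO2e per IT kWh
    }


# Illustrative numbers only, chosen to give a PUE matching ESIF's 1.036.
metrics = efficiency_metrics(FacilityReadings(
    total_facility_kwh=1_036_000,
    it_equipment_kwh=1_000_000,
    site_water_liters=500_000,
    co2_kg=400_000,
))
print(metrics)
```

Lower is better for all three; an AI Ops system of the kind described here would track them continuously rather than as annual totals.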
The project will have four key areas:
- Monitoring: Collect, process and analyze vast volumes of IT and facility telemetry from disparate sources, then apply algorithms to the data in real time.
- Analytics: Big-data analytics and machine learning will be used to analyze data from various tools and devices spanning the data-center facility.
- Control: Algorithms will be applied to enable machines to solve issues autonomously, intelligently automate repetitive tasks, and perform predictive maintenance on both the IT equipment and the data-center facility.
- Data-center operations: AI Ops will evolve into a validation tool for continuous integration (CI) and continuous deployment (CD) of core IT functions that span the modern data center.
HPE hopes that the software developed from the project will provide not just predictive analytics but also services in other key areas. It plans to offer a monitoring stack that collects the data so it can be analyzed in real time. It also wants to integrate its findings with its HPE High Performance Cluster Management (HPCM) system to provide complete provisioning, management, and monitoring for clusters scaling to 100,000 nodes.