
Addressing Key Data Center Challenges with Artificial Intelligence for Autonomous Cooling Optimization

7x24 Exchange 2022 Fall Magazine

By Linda Stern

As demand for data grows at an unprecedented rate, CEOs and CIOs are focusing on how to digitally transform operations to keep up. Artificial Intelligence (AI) will have a tremendous impact on data center management, productivity, and infrastructure.

One area where AI can immediately deliver real benefits is data center cooling and control. As demand for data grows, so does the need to better manage cooling conditions in data centers.

Digital transformation, 5G mobile networks, more and more connected devices, and a shift to more remote working have transformed data centers into critical lifelines for any organization, while introducing new challenges for data centers. Reliability is more critical than ever. Heat loads are rising and shifting. Infrastructure is being tested. Staff is stretched thin. Data centers must operate more efficiently to meet goals for carbon reduction and sustainability. Another challenge is determining how to scale to meet current demand and plan for capacity growth.

To address these challenges, autonomous cooling management is delivering significant benefits for many data center environments. And AI-based thermal optimization solutions are already proving valuable for autonomous operations with proven ROI.

Advanced solutions incorporating Artificial Intelligence (AI) deliver the strongest benefits: they enable data centers to continuously predict heat loads in real time and respond quickly to demand, applying the right amount of cooling when and where it is needed among racks. This eliminates the need for on-site staff to manage cooling manually, while optimizing performance, reducing risk, and increasing the reliability, efficiency, and sustainability of data centers.

From hyperscale and cloud-based, to enterprise data centers serving organizations of all types, autonomous cooling management supports ongoing operations with minimal staff by responding in real-time to maintain optimal temperatures at equipment levels (where it matters most), reacting appropriately to sudden or erratic shifts in demand, reducing the need for ad hoc cooling measures (like moving floor tiles), increasing energy efficiency by avoiding over-cooling, and providing insightful information for better cooling capacity management.

One factor that makes cooling control challenging is the mix of legacy equipment from various manufacturers found in many data centers, each with its own recommended standards that often call for more cooling than needed. Managing airflow is also complex, especially given the dynamic nature of data centers and the unpredictable demands of IT equipment.

Additionally, on average, 20% to 30% of servers in large data centers are unused or obsolete but still consume electricity. These ghost or zombie servers add to energy costs and create excess heat that drives higher demand for cooling. As a result, most data centers are overcooled, wasting valuable resources.

Thermal optimization is needed to maintain optimal temperatures, as well as to optimize energy use and personnel time so that resources are better managed. The goal is to optimize primary and secondary cooling cycles and temperatures in the usable “white space”. Data is collected and analyzed, and the performance of everything from the cooling tower to the rack is measured and approved, applying intelligence about the density of IT workloads to regulate cooling in a data center. This can apply to both chilled-water and air-cooling systems.

One approach focuses on chiller plants. Instead of pumping a constant volume of water through the chilled water system without regard for the varying load on the system, the goal is to control water flows and equipment to adapt to load changes. For example, Siemens’ Demand Flow solution focuses on optimizing every subsystem of a chiller plant, collecting and analyzing data to deliver the proper amount of chilled water to meet current cooling loads. Through variable-speed drives, high-accuracy precision instrumentation, and complex logic, Demand Flow dynamically adapts to load changes, maintaining optimal system performance. Optimizing water flow through chillers in near-real time produces significant energy savings and can increase the nominal tonnage of a typical plant by as much as 20%.
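
One reason variable-speed drives can save so much energy is the pump and fan affinity laws, under which shaft power scales roughly with the cube of speed, and therefore of flow. The sketch below only illustrates that relationship. The function name and the flow fractions are assumptions chosen for illustration and are not taken from Demand Flow.

```python
def relative_pump_power(flow_fraction: float) -> float:
    """Ideal affinity-law estimate: power scales with the cube of flow.

    Assumption: ignores static head and motor/drive losses, so real-world
    savings will be smaller than this idealized figure.
    """
    if not 0.0 <= flow_fraction <= 1.0:
        raise ValueError("flow_fraction must be between 0 and 1")
    return flow_fraction ** 3


# Illustrative flow fractions only, not measured plant data.
for fraction in (1.0, 0.8, 0.6):
    power = relative_pump_power(fraction)
    print(f"{fraction:.0%} flow -> {power:.1%} of full-speed power")
```

Under this idealized model, running a pump at 80% flow needs only about half of its full-speed power, which is why matching flow to load matters more than it might first appear.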

Thermal optimization focused on air flows and related equipment has also proven to be quite valuable. Simple solutions analyze data from air handling units and cooling equipment to provide reports and alerts for staff to evaluate. Other solutions include predictions and recommendations for staff to take actions. More advanced systems incorporating Artificial Intelligence (AI) offer significant benefits, continuously analyzing and learning from collected data to determine optimal settings.

Taking this further, AI-based autonomous systems can automatically implement air flow and temperature controls in real time, continuously optimizing cooling without requiring manual efforts from data center staff. For example, Siemens’ White Space Cooling Optimization (WSCO) solution is an autonomous system that uses AI and complex algorithms in real-time to continuously improve optimizations, predictions, and reactions to various scenarios, optimizing temperature control and cool air distribution to match IT load and implementing equipment controls.

White Space Cooling Optimization (WSCO) uses a dense network of sensors, cooling unit controls, and an AI engine to dynamically match facility cooling with real-time IT load. Sensors measure temperature where it matters most, at IT equipment air inlets, and send temperature data to an AI engine, which is hosted virtually or on on-site hardware. The AI engine maintains a real-time model of airflow throughout the facility and the impact of cooling on each sensor. Using predictive control algorithms to continuously optimize cooling, it measures heat load and cooling equipment efficiency, determines airflow influence and the best combination of cooling to ensure desired temperatures at each sensor, and sends commands to cooling units. The system works autonomously and in real time, with the AI engine continuously learning the effects of control actions. As demand on servers increases, it responds automatically, dynamically adjusting the number of units and airflow to match load requirements. Data center staff can manage the system and access performance and facility data through any building management system (BMS).
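
The sense-model-command loop described above can be sketched as a single control step. This is a deliberately simplified proportional rule, not Siemens’ algorithm: the influence weights, the gain, the speed limits, and all names are hypothetical, and a real predictive controller would be considerably more sophisticated.

```python
from typing import Dict


def control_step(
    inlet_temps: Dict[str, float],
    setpoint_c: float,
    influence: Dict[str, Dict[str, float]],
    fan_speeds: Dict[str, float],
    gain: float = 0.05,
    min_speed: float = 0.3,
    max_speed: float = 1.0,
) -> Dict[str, float]:
    """Return new fan speeds (as fractions of full speed) per cooling unit.

    Assumption: influence[unit][sensor] is a weight describing how strongly
    that unit cools that sensor location.
    """
    new_speeds = {}
    for unit, speed in fan_speeds.items():
        weights = influence[unit]
        total = sum(weights.values())
        if total == 0:
            error = 0.0
        else:
            # Influence-weighted average of (measured - setpoint).
            weighted = sum(w * (inlet_temps[s] - setpoint_c) for s, w in weights.items())
            error = weighted / total
        # Too warm (positive error) speeds the fan up; too cold slows it down.
        new_speeds[unit] = min(max_speed, max(min_speed, speed + gain * error))
    return new_speeds


# Hypothetical readings: rack_a runs warm, rack_b runs cool.
temps = {"rack_a": 27.0, "rack_b": 22.0}
influence = {
    "crac_1": {"rack_a": 0.9, "rack_b": 0.1},
    "crac_2": {"rack_a": 0.1, "rack_b": 0.9},
}
print(control_step(temps, 24.0, influence, {"crac_1": 0.6, "crac_2": 0.6}))
```

In this example the unit that mostly serves the warm rack speeds up while the unit that mostly serves the cool rack slows down, which is the basic behavior of matching cooling to where the load is.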

Siemens helps BMO Financial Group beat the heat with White Space Cooling Optimization
Siemens Canada and BMO’s Critical Environments group collaborated to identify efficiencies in energy consumption outside of BMO’s core IT operations. The two teams worked together to assess the site, model cost savings, then develop and implement the solution. This approach supported a reduction in energy consumption of 55% in the designated area of the data centre. This represents a significant reduction in operational carbon footprint.

The deployment of Siemens WSCO uses Machine Learning to dynamically monitor IT rack inlet temperatures and adjust Computer Room Air Conditioners (CRACs) to meet target temperatures with the lowest possible energy consumption. Using a network of sensors, WSCO collects temperature and air supply data in the Scarborough facility. Using AI, the system first learned how each CRAC influenced cooling operations, then applied the data to algorithms, adjusting CRAC airflow to maintain the correct temperature for each zone. The system went beyond alerting human operators to temperature fluctuations: it made the adjustments autonomously.

Cooling automation reduces the risk of a thermal outage and maintains consistent air temperatures among server racks and white space. Through energy optimization, air supply is matched to demand in real time, automatically responding to temperature changes and eliminating over-cooling and recirculation. At the same time, BMO’s Critical Environments Group collected actionable data to support future operational business decisions. WSCO was deployed in two data rooms at the SCC Data Centre, which house racks of critical IT loads and use multiple CRACs running 24 hours a day, seven days a week. After a learning period, the WSCO system was able to reduce the number of operating units by 64% for constant speed fans and lower the speed of variable speed units, delivering a total reduction in CRAC fan energy

In addition to these savings, the system was able to improve thermal control within the space. Siemens helped BMO’s Critical Environments Group secure utility rebates, resulting in a payback time of less than two years.
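
To show how a rebate shortens payback, here is a minimal simple-payback calculation. All figures are hypothetical placeholders and are not BMO’s or Siemens’ numbers. The formula is the usual simple payback: net cost divided by annual savings, with no discounting.

```python
def simple_payback_years(
    installed_cost: float,
    utility_rebate: float,
    annual_energy_savings: float,
) -> float:
    """Simple (undiscounted) payback: net project cost / yearly savings."""
    if annual_energy_savings <= 0:
        raise ValueError("annual_energy_savings must be positive")
    return (installed_cost - utility_rebate) / annual_energy_savings


# Hypothetical figures, chosen only to illustrate the effect of a rebate.
without_rebate = simple_payback_years(200_000, 0, 90_000)
with_rebate = simple_payback_years(200_000, 50_000, 90_000)
print(f"Payback without rebate: {without_rebate:.2f} years")
print(f"Payback with rebate:    {with_rebate:.2f} years")
```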

“Siemens and BMO’s Critical Environments Group advanced our sustainability objectives while improving our operational effectiveness—a true Win-Win.” – Eugene Murariu, Managing Director, Global Engineering, Corporate Real Estate, BMO Financial Group

To increase staff efficiency, AI-based autonomous cooling optimization systems minimize the need for staff on site, allowing valuable resources to be assigned to other critical tasks and limiting overall site access. While matching cooling requirements to IT load in real-time, they can self-correct and quickly scale to meet demand, eliminating the need for human intervention to physically address the tactical, complex, and time-consuming task of adjusting cooling systems.

For improved planning and budgeting, these systems should integrate easily with any building management system (BMS) and give managers access to real-time and trend-based insights. This brings new levels of control to tasks such as capacity planning, which identifies how much cooling is available and when and where to deploy new IT equipment. It can also help defer or reduce capital expenditures, since fewer cooling units are needed and cycles on existing equipment can be reduced, extending equipment life. Identifying poorly performing equipment helps managers plan when equipment needs to be retired, and identifying areas facing thermal risk helps ensure that high-value IT assets are not deployed there until problems are resolved.

Besides addressing today’s data center challenges—personnel management, operating costs, operational efficiency, and energy efficiency—autonomous thermal cooling solutions need to prepare operations to adopt other new AI applications and to scale for broad deployments, as AI offers promising solutions to improve operations over the long term. Many data centers are already successfully implementing AI with new thermal cooling solutions. One of the challenges is replicating and scaling AI pilots for enterprise-wide deployments. Most companies want to see how quickly AI can provide a return on investment.

As the case study shows, successful implementations of White Space Cooling Optimization are proving that truly autonomous cooling with Artificial Intelligence and an agnostic infrastructure design can be cost-effective and provide many benefits, optimizing today’s environments as well as ensuring a responsible and scalable solution to meet future demands.

Using a network of sensors, WSCO collects temperature and air supply data. Its AI engine applies the data to algorithms and calculates the required adjustments in airflow to maintain the correct temperature for each aisle of racks. It goes beyond alerting human operators to temperature fluctuations and automatically makes adjustments itself. By automating the control of cooling fans, it reduces the risk of a thermal outage and maintains consistent air temperatures among server racks and white space. It also reduces wasted energy by dynamically matching cooling to the IT load in real time, automatically responding to temperature changes and eliminating over-cooling. At the same time, it gives the management team critical data that supports future operational business decisions.

The backbone of WSCO consists of a wireless mesh system with a dense array of sensors and controllers that fuel the powerful analytics and machine learning process. Rack-level sensors capture both top and bottom rack temperatures while cooling unit controllers provide data such as supply and return temperatures, status, speed, and cooling unit power. This data helps identify cooling influences on the racks so that WSCO’s learning software can dynamically determine how much cooling is needed for ideal conditions.

Leveraging data collected by the sensor network, WSCO’s AI engine automatically creates a real-time model of the facility’s thermal environment. The AI engine maps influences and determines the precise cooling influence of every Computer Room Air Handling (CRAH) unit, both individually and collectively, at every spot across the data center. In most facilities, the AI engine is able to identify influence patterns and measure cooling influences in less than 24 hours.
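
One simple way to picture an influence measurement is a step test: change one cooling unit’s fan speed, wait for temperatures to settle, and see how much each sensor moved. The sketch below assumes that approach purely for illustration. The article does not say how WSCO’s AI engine derives its influence map, and the function and sensor names are hypothetical.

```python
from typing import Dict


def estimate_influence(
    baseline_temps: Dict[str, float],
    settled_temps: Dict[str, float],
    speed_delta: float,
) -> Dict[str, float]:
    """Estimate one unit's influence on each sensor from a step test.

    Returns degrees C of temperature change per unit of fan-speed change.
    Negative values mean the sensor got cooler when the fan sped up.
    """
    if speed_delta == 0:
        raise ValueError("speed_delta must be non-zero")
    return {
        sensor: (settled_temps[sensor] - baseline_temps[sensor]) / speed_delta
        for sensor in baseline_temps
    }


# Hypothetical readings before and after raising one unit's speed by 0.25.
before = {"rack_a": 26.0, "rack_b": 24.0}
after = {"rack_a": 24.5, "rack_b": 23.8}
for sensor, value in estimate_influence(before, after, 0.25).items():
    print(f"{sensor}: {value:.1f} degrees C per unit of fan speed")
```

Here the unit strongly influences rack_a and barely affects rack_b, which is the kind of pattern an influence map captures for every unit and sensor pair.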

The WSCO system then takes dynamic control of the cooling units—turning them on and off, and ramping fan speeds up and down—to meet pre-specified temperature settings in the most efficient manner possible. As the AI software learns the effects of control actions, it manipulates the cooling equipment by itself without staff intervention, automatically managing cooling and balancing airflow in critical areas in the data center hall.
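
Turning units off without breaching temperature limits can be sketched as a greedy search over a thermal model. The version below assumes each unit’s cooling effect at a sensor adds linearly, which is a strong simplification. It is not the WSCO algorithm, and every name and number is hypothetical.

```python
from typing import Dict, Set


def predict_temps(
    uncooled: Dict[str, float],
    effect: Dict[str, Dict[str, float]],
    running: Set[str],
) -> Dict[str, float]:
    """Predicted temperature per sensor, assuming effects add linearly.

    effect[unit][sensor] is the degrees C of cooling that unit provides there.
    """
    return {s: t - sum(effect[u][s] for u in running) for s, t in uncooled.items()}


def units_to_keep_running(
    uncooled: Dict[str, float],
    effect: Dict[str, Dict[str, float]],
    limit_c: float,
) -> Set[str]:
    """Greedily switch off units while every sensor stays at or below limit_c."""
    running = set(effect)
    while True:
        best = None  # (unit, resulting peak temperature)
        for unit in sorted(running):
            peak = max(predict_temps(uncooled, effect, running - {unit}).values())
            if peak <= limit_c and (best is None or peak < best[1]):
                best = (unit, peak)
        if best is None:
            return running  # no further unit can be switched off safely
        running.remove(best[0])


# Hypothetical room: two sensors, three constant-speed units.
uncooled = {"rack_a": 40.0, "rack_b": 38.0}
effect = {
    "crac_1": {"rack_a": 12.0, "rack_b": 2.0},
    "crac_2": {"rack_a": 3.0, "rack_b": 12.0},
    "crac_3": {"rack_a": 4.0, "rack_b": 4.0},
}
print(sorted(units_to_keep_running(uncooled, effect, limit_c=27.0)))
```

In this toy room the third unit is redundant and can be switched off, while the other two must stay on because each protects a different rack. A greedy search is not guaranteed to find the smallest possible set, but it keeps the idea easy to follow.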

WSCO solutions are infrastructure agnostic, so they are immediately compatible with existing systems with minimal on-site configuration and setup. The WSCO wireless architecture makes installation non-intrusive and flexible. Without the need to run hundreds of cables, even large sites can be up and running in a matter of weeks.

WSCO’s key success factor is its ability to dynamically match cooling to the IT load in real time. The AI engine uses real-time data to produce algorithms that predict the best level of cooling that will deliver the desired temperature at each sensor. And because most facilities are over-cooled, WSCO uncovers pre-existing redundancy and redirects airflow. It typically level-sets cooling capacity while delivering significant reductions in energy use.

Figure: WSCO’s impact on air cooling efficiency

Linda Stern is Marketing Manager, Data Centers at Siemens Smart Infrastructure. She can be reached at Linda.Stern@Siemens.com.
