By Adil Attlassy
Let’s start with a brief reality check. Historically (and ironically), the management of data centers has happened largely without the use of data – despite the disparate services and pieces of infrastructure that regularly collect considerable amounts of data. Maintenance, for example, is still all too often performed on an ad hoc basis and at the equipment level rather than managed remotely and informed by performance and health metrics.
Colocation providers have access to an almost overwhelming volume of data from which significant actionable insights and competitive advantage can be gained. Data center providers grappling with the trend towards commoditization, for example, are seeking ways to add value for customers through services and capabilities. Beyond market differentiation, combining human expertise with system-wide data to predict equipment failures before they occur offers practical value by dramatically reducing service-interrupting outages due to component failure.
Predictive and prognostic analytics offer a solution to these areas of friction and present a key opportunity to carve out greater efficiency in design and operations, as well as action on insights from the data to drive down service and maintenance costs.
WHY PREDICTIVE ANALYTICS?
In short, because predictive analytics are poised to solve real customer challenges (and the market agrees: predictive analytics is projected to grow from $4.6 billion USD in 2017 to $12.4 billion USD by 2022).
Cost overruns (spanning both capital and operating expenses) are often a top concern – whether from purchasing more infrastructure capacity than current needs justify, investing in redundancy strategies or simply funding routine operational maintenance. Analytics have the potential to substantially reduce upfront costs and longer-term investment in two ways: by providing visibility into current workloads and capacity (and outlining the future scale that will need to be supported), and by helping forecast when data center infrastructure will begin to degrade and truly need to be replaced, which avoids purchases too early in the lifecycle.
Beyond contributing to reduced capex spend and long-term investments, analytics can address a growing concern among data center providers and data center operators as infrastructure continues to age: providing visibility and improving asset performance for higher uptime and longer mean time between failures.
You might be thinking, “infrastructure is typically architected to be redundant as a means to avoid issues with aging.” But that’s precisely the problem – and the iceberg we’re heading towards, so to speak. Data centers do involve a large footprint of heavily redundant equipment – sometimes triple redundant – and that redundancy means the system as a whole never really fails. And when something never fails, how can you begin to predict when it might – and what to do if it does?
Consider this scenario. Let’s say a plant manager is running a pump in an industrial setting. The pump starts to get hot, then hotter, and then it starts to vibrate – and suddenly it fails. Yes, now there’s a problem that needs solving, but the manager also now has information about how that pump might fail in the future. In redundant systems, failure is never seen and thus is difficult to prepare for and prevent over time.
Similarly, as mentioned earlier, some systems eventually reach triple redundancy in an effort not to fail. Beyond blinding managers and operators to how an asset is truly performing underneath the layers of redundancy, the capital expense of redundancy adds up when you add ongoing maintenance costs on top. As the business of data centers ages and continues to expand at the same time, redundancy will become an issue that only analytics can address.
Ultimately, applying data-driven asset management, i.e. predictive analytics, will lower risk, optimize the asset lifecycle and contribute to a decrease in failures and interventions.
Many have tried this before, but getting to predictive analytics requires both data infrastructure and a systemic approach. The tools are now available to leverage extensive operational and asset domain expertise to annotate failure modes and define prognostic responses. From there, we can begin to progress towards more complex models like machine learning and AI. (It’s worth noting that, despite industry hype and expectations, the AI conversation today tends to get carried away. Given the lack of available data, AI in data centers doesn’t really exist at this stage.)
Data center managers and operators must work on the basics first to then deliver advanced analytics.
STARTING DOWN THE PATH
The challenge inherent in all of this lies in understanding just how to deploy the mountain of available data to improve operations, reduce cost, and simplify services. Modern data centers are complex systems, but they are often viewed and managed as individual assets. Certainly, analytics can still be generated on these siloed assets, but individual analytics don’t show cause and effect. For example, how does adjusting the ambient temperature impact the performance of the electrical infrastructure?
Without this type of correlated, system-wide insight, it’s incredibly difficult for today’s data center managers to know the cascading effect of any failure across the data center as a whole. Collective data, from as many data points as possible, will drive predictability and allow enough history to accumulate to build rules-based models.
Establishing the data infrastructure that will support data-driven asset management begins with the cloud. Next comes instrumenting the data center and ensuring the telemetry is in place to aggregate as much data as possible. Essentially, the result will be a registry of all the assets in one place.
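As a minimal sketch of what a cross-domain check might look like, the snippet below computes a Pearson correlation between ambient temperature and UPS electrical losses. The readings, the variable names and the choice of UPS losses as the electrical metric are all hypothetical, chosen only for illustration. A strong correlation alone doesn't prove cause and effect, but it can flag the relationships worth investigating at the system level.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical hourly readings (illustrative values, not real telemetry).
ambient_temp_c = [20.0, 21.0, 22.0, 23.0, 24.0, 25.0]
ups_loss_kw = [4.0, 4.1, 4.3, 4.4, 4.6, 4.7]

r = pearson(ambient_temp_c, ups_loss_kw)
print(f"correlation between ambient temperature and UPS losses: {r:.2f}")
```

In practice the two series would come from different systems (a building management system and the electrical monitoring system, say), which is exactly why the data needs to be aggregated in one place first.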
As we talk about collecting and processing more and more data, it’s important to pause and consider the role of security and the extent to which it can become an obstacle. Many data center infrastructure vendors go through rigorous cyber testing at the equipment level, so we would argue that a larger focus for building a secure data infrastructure should be around people and processes, instead – because most vulnerability falls in these areas.
PUTTING DATA INTO PRACTICE
So, how to begin putting data into practice? Do managers start with one domain at a time (electrical, mechanical, IT) or should it be approached as a full system strategy? Your journey towards predictive analytics should start with an FMEA (failure mode and effects analysis) to expose the areas that are critical to failure.
Think key inflection points like the batteries, the UPS switchgear, the mechanical elements and so on. Allow these areas – places where operators can benefit from insight into the cause of performance degradation – to drive the focus for the integration of analytics.
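One common way to turn an FMEA into a priority list is the risk priority number (RPN), the product of severity, occurrence and detection scores. The sketch below ranks a few failure modes that way. The assets, failure modes and scores are hypothetical placeholders, and the 1 to 10 scales are a widely used convention rather than anything specific to this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    asset: str
    mode: str
    severity: int    # 1 (negligible) to 10 (catastrophic)
    occurrence: int  # 1 (rare) to 10 (frequent)
    detection: int   # 1 (almost certain to be detected) to 10 (undetectable)

    @property
    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(modes):
    """Return failure modes ordered from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Hypothetical scores, for illustration only.
modes = [
    FailureMode("chiller pump", "bearing wear", 6, 5, 3),
    FailureMode("battery string", "capacity fade", 8, 6, 7),
    FailureMode("UPS switchgear", "breaker fails to trip", 9, 2, 8),
]

ranked = rank_by_rpn(modes)
for m in ranked:
    print(f"{m.rpn:4d}  {m.asset}: {m.mode}")
```

The highest-ranked items are the natural first candidates for instrumentation and analytics.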
It’s also critical to create a consistent asset model (all your asset information in one place!). In many data centers today, systems are viewed and managed in silos. The HVAC system, for example, is likely handled via a building management system. Likewise, the battery system is fed through an IT system while the switchgear may appear as part of the electrical system. That’s why a consistent asset model is so important. For predictive and prognostic system-level analytics to be truly impactful, all these disparate systems should be unified. Interestingly, FMEA is a methodology that’s been used successfully in aerospace for years. The practice looks at every component within the system and analyzes what impact the component would have in a particular failure mode within the system, what impact that failure may have on another component within the system, and what the effect would be on the overall system.
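To make the idea concrete, here is a minimal sketch of a unified asset registry in which each asset records the system it is managed in today and the assets it feeds. The asset names and the topology are hypothetical. A breadth-first walk over the registry then lists everything downstream of a failed component, which is the component-to-component question an FMEA asks.

```python
from collections import deque

# Hypothetical unified registry: asset id -> managing system and the assets it feeds.
ASSETS = {
    "utility_feed": {"system": "electrical", "feeds": ["switchgear_a"]},
    "switchgear_a": {"system": "electrical", "feeds": ["ups_a", "chiller_1"]},
    "battery_a": {"system": "IT", "feeds": ["ups_a"]},
    "ups_a": {"system": "electrical", "feeds": ["rack_row_1"]},
    "chiller_1": {"system": "BMS", "feeds": ["crah_1"]},
    "crah_1": {"system": "BMS", "feeds": ["rack_row_1"]},
    "rack_row_1": {"system": "IT", "feeds": []},
}


def downstream_effects(registry, failed_asset):
    """List every asset reachable from failed_asset, in breadth-first order."""
    if failed_asset not in registry:
        raise KeyError(f"unknown asset: {failed_asset}")
    visited = {failed_asset}
    affected = []
    queue = deque([failed_asset])
    while queue:
        current = queue.popleft()
        for dependent in registry[current]["feeds"]:
            if dependent not in visited:
                visited.add(dependent)
                affected.append(dependent)
                queue.append(dependent)
    return affected


for asset in downstream_effects(ASSETS, "switchgear_a"):
    print(asset, "managed in:", ASSETS[asset]["system"])
```

Note that this walk is a worst-case view: it ignores redundant paths, so an asset listed as affected may in reality keep running on its alternate feed. Modeling that redundancy explicitly would be the next refinement.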
The time is now to begin taking a data-driven asset management approach. Between commoditization of data center offerings and the need to add distinguishable value for customers, many providers are heading towards a key moment in time, and the market is ripe with opportunity for the integration of predictive analytics. With redundancy creating a false sense of security – not to mention performance degradation that’s passed on to customers – the ability to leverage analytics to reduce upfront costs, protect the longevity of existing investments and ultimately contribute to the decrease in failures and interventions will be key in the near term.
You can see how Schneider Electric and Compass Datacenters are working together to bring that solution to life.
Adil Attlassy is Chief Technical Officer for Compass Datacenters. He can be reached at firstname.lastname@example.org. Wendi Runyon is Vice President Strategy & Business Development for Schneider Electric. She can be reached at email@example.com.