Data center managers and operators know the looming threat of unnecessary downtime well. Despite technological advances in the IT infrastructure space, outages remain a common phenomenon. The Ponemon Institute placed the cost of a minute of downtime in the data center at $5,600 in 2010 and at $7,900 in 2013. That figure has now risen to an average of over $8,000 per minute. For the Fortune 100, unplanned downtime costs as much as $2.5 billion annually.
At ServerLIFT, we partner with data center industry leaders all over the world, of all different sizes and types. We are familiar with the seriousness of downtime and its potential ripple effect throughout your organization. It can impact everything from the safety of your workers to your budget, policies, compliance efforts, and beyond.
Entire business models are built upon the service and function of a data center—and for modern businesses, consistent data center uptime can mean the difference between success and failure.
As we put together this top 10 list for avoiding downtime in the data center, it became clear that many of these incidents are entirely preventable with the right preparation and awareness.
Use the hashtag #datacentersafety to continue the conversation on social media and weigh in.
#1: Update Legacy Hardware
We recommend keeping a comprehensive and routinely updated list of all hardware and software warranties and maintenance schedules. This helps prevent the use of hardware that is no longer supported by its manufacturer or is in dire need of an upgrade. While this can seem like a costly endeavor, it is far less expensive than the alternative.
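As a rough illustration of what such a list could look like in practice, here is a minimal Python sketch that flags hardware past its support date or nearing the end of its warranty. The `Asset` fields, the asset names, the dates, and the 90-day warning window are all hypothetical placeholders, not a description of any particular inventory tool.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Asset:
    name: str
    warranty_end: date  # last day of the manufacturer warranty (assumed field)
    support_end: date   # last day the manufacturer supports the model (assumed field)


def flag_assets(assets, today, warn_days=90):
    """Split assets into unsupported ones and ones whose warranty has
    expired or will expire within warn_days."""
    unsupported, expiring = [], []
    horizon = today + timedelta(days=warn_days)
    for asset in assets:
        if asset.support_end < today:
            unsupported.append(asset.name)
        elif asset.warranty_end <= horizon:
            expiring.append(asset.name)
    return unsupported, expiring


# Hypothetical inventory entries, for illustration only.
inventory = [
    Asset("rack-01-server-a", date(2026, 6, 30), date(2029, 6, 30)),
    Asset("rack-01-switch-b", date(2020, 1, 31), date(2023, 1, 31)),
    Asset("rack-02-ups-c", date(2025, 3, 15), date(2030, 3, 15)),
]

unsupported, expiring = flag_assets(inventory, today=date(2025, 1, 1))
print("Unsupported:", unsupported)
print("Warranty expired or expiring soon:", expiring)
```

Running a check like this on a schedule, rather than on demand, is one way to keep the list "routinely updated" without relying on someone remembering to look.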
#2: Comply with Industry, Regional, and National Standards
Making the effort to comply with any and all standards indicates an organization focused on safety and reliability. These standards are maintained by data center staff and monitored via internal and independent external audit. (As a side note, ServerLIFT complies with several global certifications and standards in order to serve clients in any region or industry.)
#3: Establish and Maintain a Backup Fuel Source
While fuel and energy sources tend to vary by region, all can be impacted by climate, architecture, capacity, and maintenance failures. Establishing and maintaining a backup fuel source is a key step toward eliminating unplanned downtime.
#4: Equipment Service Checks
In the same vein, equipment service checks are required for all redundant and backup systems. Backup generator systems are just one example. We recommend the creation of a preventative maintenance and testing schedule for all relevant infrastructure. We provide onsite maintenance and repair services for the same reason.
And don’t stop there. Make sure that all equipment service checks are properly documented and stored for future reference, so that your maintenance processes are both dependable and scalable.
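One simple way to tie the testing schedule to the documented checks is to compute each system's next due date from its last recorded check. The sketch below assumes a hypothetical service log with made-up system names, dates, and intervals; real intervals should come from your manufacturer guidance and internal policy.

```python
from datetime import date, timedelta

# Hypothetical service log: system -> (last documented check, interval in days)
service_log = {
    "backup-generator": (date(2025, 1, 10), 30),
    "ups-battery-string": (date(2024, 11, 1), 90),
    "fuel-tank-inspection": (date(2024, 6, 1), 180),
}


def overdue_checks(log, today):
    """Return (system, due_date) pairs whose next check fell before today,
    ordered from the longest overdue to the most recent."""
    overdue = []
    for system, (last_check, interval_days) in log.items():
        due = last_check + timedelta(days=interval_days)
        if due < today:
            overdue.append((system, due))
    return sorted(overdue, key=lambda item: item[1])


for system, due in overdue_checks(service_log, today=date(2025, 2, 1)):
    print(f"{system}: check was due {due.isoformat()}")
```

Because the due date is derived from the last documented check, a check that was performed but never recorded still shows up as overdue, which reinforces the documentation habit.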
#5: Upskill Staff
Your staff is already trained on their day-to-day responsibilities—but do they have a set of best practices in place to avoid downtime? Have they been given the skills and tools needed to optimize efficiency, respond in a timely fashion, and protect equipment during an outage?
Education is a preventative measure with considerable impact. Most of your worst-case scenarios have probably already occurred in another data center, to a business partner, etc. Learn from your peers and use their experiences to upskill your own staff. Training and certification programs are key, especially when it comes to preventing outages due to human error.
#6: Refine Data Center Procedures
Document and standardize existing data center procedures, and then implement periodic review and retraining on those procedures. Run mock drills with your team so that when an incident occurs, “muscle memory” takes over and response time to the incident is minimized.
#7: Evaluate Scheduling
Evaluate the way that you structure your staff and how many operators you have on a shift. Make sure each operator receives adequate time off and is getting appropriate amounts of rest. As a data center manager, you may not be able to eliminate human error—but you can certainly mitigate the contributing factors, such as fatigue and stress due to improper staffing.
Budget pressures may make it difficult to adjust schedules and consider changes. This can be addressed, however, in a number of different ways. We put together some budgeting advice here.
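To make the rest-time review above more concrete, here is a small Python sketch that scans a shift roster for operators with too little rest between consecutive shifts. The roster, the operator names, and the 11-hour minimum are assumptions chosen for illustration; the right threshold depends on your own policies and local regulations.

```python
from datetime import datetime, timedelta

MIN_REST = timedelta(hours=11)  # assumed threshold; set per your own policy

# Hypothetical roster: (operator, shift start, shift end)
shifts = [
    ("alice", datetime(2025, 3, 3, 7), datetime(2025, 3, 3, 19)),
    ("alice", datetime(2025, 3, 4, 7), datetime(2025, 3, 4, 19)),
    ("bob", datetime(2025, 3, 3, 19), datetime(2025, 3, 4, 7)),
    ("bob", datetime(2025, 3, 4, 13), datetime(2025, 3, 4, 23)),
]


def short_rest_gaps(roster, min_rest=MIN_REST):
    """Return (operator, rest) pairs where the rest between two
    consecutive shifts is shorter than min_rest."""
    by_operator = {}
    for operator, start, end in roster:
        by_operator.setdefault(operator, []).append((start, end))

    flagged = []
    for operator, worked in by_operator.items():
        worked.sort()  # order each operator's shifts by start time
        for (_, prev_end), (next_start, _) in zip(worked, worked[1:]):
            rest = next_start - prev_end
            if rest < min_rest:
                flagged.append((operator, rest))
    return flagged


for operator, rest in short_rest_gaps(shifts):
    print(f"{operator}: only {rest} of rest between shifts")
```

A report like this does not fix understaffing on its own, but it gives a manager specific evidence to bring into a scheduling or budget discussion.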
#8: Look to the Future
Make your data center operations scalable and future-proof. This means ensuring all current technologies are being leveraged properly and purchases are considered carefully. Room for growth must be assessed pragmatically and accounted for in your budget. This will allow future capacity needs to be met.
#9: Continuous Improvement of Cybersecurity
This is a daily task, not a set-it-and-forget-it exercise: keep track of the latest malware and ransomware threats. Hackers share data with each other on security penetration, which means your team must do the same. Encryption, analytics for identification of suspicious patterns, and adherence to new privacy regulations can all make your data safer (and less prone to downtime in the process).
#10: Prepare for Natural Disasters
While your data center may not sit in a hot zone for earthquakes or tsunamis, even a major storm can do damage. Establish automatic resilience testing to reduce costly outages. Once emergency procedures are vetted and put into place, staff can be trained and certified.
Data center downtime does not have to be an inevitable risk. By building in regular safety checks, putting the right systems into place, and guiding employees through potential scenarios, a business can eliminate many—if not all—of the common risk factors. Our white paper The Data Center Safety Guidebook can also help!
If there is another item you would add to this list, share this article and use the hashtag #datacentersafety to join the discussion. Join the Data Center Safety Group on LinkedIn to keep up with the latest trends.