Lessons to be Learned From the Recent Dropbox Outage

Very few of us in the web hosting industry will ever have the need to scale to the level that services like Dropbox do. With that said, when a service the size of Dropbox makes a misstep that leads to an outage, it is worth paying attention to the causes and impact to see if there are any potential lessons to be learned.

On January 10, Dropbox went offline. Users weren’t able to sync their folders, and thus they couldn’t access their files on many devices. The service was down for much of Friday evening, and users had trouble accessing their files throughout the weekend.

Of course, the media was full of speculation about potential causes for the outage, with many focusing on a possible DDoS attack. On the following Monday, Dropbox released a statement that went into detail about the causes of the outage, which dismissed the idea of an attack by hackers and instead blamed a faulty update process.

On the day of the outage, Dropbox was running a scheduled OS upgrade. As you can imagine, upgrading the thousands of servers that Dropbox uses is no easy task. Much of the process is automated with scripts, and according to Dropbox's statement, a bug in one of those scripts caused the downtime.

The key lesson here, as detailed by Head of Infrastructure at Dropbox, Akhil Gupta, is that if you are going to do an upgrade, you need to be absolutely certain what state the server is in. To prevent the same mistake from happening again, Dropbox implemented an extra level of checks, so that the server will verify its own state before carrying out commands, rather than blindly executing incoming instructions regardless of what it is doing when it receives them.
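To make the idea of a server verifying its own state more concrete, here is a minimal Python sketch. Dropbox has not published its implementation, so everything in it is an assumption made for illustration: the `Server` fields, the `UnsafeStateError` exception, and the function names are all hypothetical. It only shows the general pattern of checking local state before acting on an incoming command.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical model of a machine's self-reported state."""
    name: str
    has_active_data: bool
    in_maintenance: bool


class UnsafeStateError(Exception):
    """Raised when a command arrives while the server is in the wrong state."""


def verify_safe_to_reinstall(server: Server) -> None:
    # The machine checks its own state instead of trusting the caller.
    if server.has_active_data:
        raise UnsafeStateError(f"{server.name}: still holds active data")
    if not server.in_maintenance:
        raise UnsafeStateError(f"{server.name}: not drained for maintenance")


def handle_reinstall_command(server: Server) -> str:
    """Run the (simulated) reinstall only if the local check passes."""
    try:
        verify_safe_to_reinstall(server)
    except UnsafeStateError as err:
        return f"refused: {err}"
    # A real agent would start the OS install here; this sketch only reports.
    return f"reinstalling {server.name}"


if __name__ == "__main__":
    idle = Server("db-spare-01", has_active_data=False, in_maintenance=True)
    live = Server("db-live-07", has_active_data=True, in_maintenance=False)
    print(handle_reinstall_command(idle))
    print(handle_reinstall_command(live))
```

The point of the design is that the refusal happens on the receiving side. Even if the script that selects machines is wrong, a server that is still serving data declines the command instead of executing it blindly.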

It is not mentioned in the post-mortem of the incident, but the outage could probably have been avoided with more rigorous testing. The Dropbox outage is a reminder of what can happen when a business is rapidly scaling its infrastructure. Scaling becomes the primary goal, and testing falls by the wayside to some degree.

A more rigorous approach to testing and verifying automation scripts might have caught the “subtle bug” before it wreaked havoc.

About Graeme Caldwell: Graeme works as an inbound marketer for InterWorx, a revolutionary web hosting control panel for hosts who need scalability and reliability. Follow InterWorx on Twitter at @interworx, like them on Facebook, and check out their blog at http://www.interworx.com/community.

 

