Facebook issues ‘mea culpa’ for the global outage: the mistake was our own engineering

The outage was “caused not by malicious activity, but by our mistake,” Santosh Janardhan, vice president of infrastructure at Facebook Inc., said in a post on the company’s blog.

The outage, which lasted seven hours, began around 12:40 pm (Brasilia time), while Facebook engineers were performing routine maintenance at one of the company’s data centers, Janardhan said.

To assess Facebook’s network capacity, engineers issued a command that inadvertently disconnected all of Facebook’s data centers from the company’s network. This led to a cascading failure that took Facebook’s properties off the internet.

Once the data centers were offline, the servers that use the domain name system (DNS) to direct internet traffic went down as well.

DNS is what browsers and cell phones use to find Facebook’s services on the internet, and without it, it is “impossible for the rest of the internet to find our servers,” Janardhan said.

The DNS failure also disabled the internal tools that would have let Facebook engineers restore service remotely, forcing the engineering team to travel to the data centers in person and reboot systems on site.

That slowed the recovery. “It’s hard to get in, and once inside, the hardware and routers are designed to be difficult to modify, even when you have physical access to them,” Janardhan said. “So it took longer to activate the secure access protocols needed to get people on site and able to work on the servers,” he explained.

Photo: Solen Feyissa/Unsplash