Joe Tidy

Cybersecurity Reporter, BBC News

12 October 2021

Photo caption: Facebook founder and CEO Mark Zuckerberg apologized for the company’s latest service interruption (Credit: Getty Images)

I doubt that Mark Zuckerberg reads the comments people leave on his Facebook posts.

But if he did, it would take him approximately 145 days, without sleep, to navigate the barrage of comments left for him after he apologized for the collapse of services last week.

“Sorry for today’s interruption,” the Facebook founder and CEO posted after Facebook, WhatsApp and Instagram went offline for nearly six hours.

Facebook blamed routine maintenance work for the outage: its engineers issued a command that inadvertently disconnected Facebook’s data centers from the rest of the internet.

About 827,000 people responded to Zuckerberg’s apology.

Reactions were mixed. Some were humorous: “It was terrible, I had to talk to my family,” commented an Italian user. Others were confused: “I took my phone to the repair shop thinking it was broken,” wrote someone from Namibia.

And, of course, some were upset and angry: “You can’t shut everything down at the same time. The impact is unprecedented,” a Nigerian businessman posted. Another user, in India, demanded compensation for the disruption to his business.

What is clear now, if it wasn’t already obvious, is how dependent billions of people have become on these services, not just for fun but also for essential business and communication.

What is also clear is that this is far from a one-off situation: experts suggest that widespread interruptions are becoming more frequent and disruptive.

“One of the things we’ve seen in recent years is an increasing reliance on a small number of networks and companies to deliver large amounts of internet content,” says Luke Deryckx, technical director of Down Detector, an online platform that provides users with real-time status information for various websites and services.

“When one or more of them has a problem, it affects not just them, but hundreds of thousands of other services,” he says.

Facebook, for example, is now used to log in to a variety of other services and devices, such as smart TVs.

“And then we end up having these episodes,” says Deryckx. “Something happens and we all look at each other like, ‘Well, what are we going to do?’”

Photo caption: Many companies now rely heavily on Facebook services, such as WhatsApp and Instagram, to keep in touch with customers (Credit: Getty Images)

Deryckx and his team at Down Detector monitor web services and websites for outages. He says that widespread outages affecting key services are becoming more frequent and more severe.

“When Facebook has a problem, it has a big impact on the internet, but also on the economy and ultimately on society. Millions, or potentially hundreds of millions, of people are just sitting around waiting for a small team in California to fix something. It’s an interesting phenomenon that has grown in the last two years.”

Significant outages

October 2021: A “configuration error” took down Facebook, Instagram and WhatsApp for almost six hours. Other sites, such as Twitter, were also disrupted by a surge in visits to their apps.

July 2021: More than 48 services, including Airbnb, Expedia, Home Depot and Salesforce, were down for about an hour after a Domain Name System (DNS) bug at content delivery company Akamai. The company had suffered a similar outage a month earlier.

June 2021: Amazon, Reddit, Twitch, GitHub, Shopify, Spotify and several news sites were down for about an hour after a customer of cloud computing service provider Fastly accidentally triggered a previously unknown bug.

December 2020: Gmail, YouTube, Google Drive and other Google services went down simultaneously for about 90 minutes. The company said it had encountered an “internal storage quota problem.”

November 2020: A technical issue at one of the Amazon Web Services facilities in Virginia, USA, affected thousands of third-party online services for several hours, primarily in North America.

March 2019: Facebook, Instagram and WhatsApp all went down or were severely disrupted for about 14 hours after a “server configuration change”. A few other sites that use Facebook for logins, including Tinder and Spotify, were also affected.

Inevitably, at some stage during a major outage, people fear that it is the result of some kind of cyber-attack.

Photo caption: Experts say internet services have become too centralized (Credit: Getty Images)

But experts suggest that, for the most part, it is a more mundane case of human error, compounded, they say, by the way the internet is held together by a complicated set of outdated systems.

During the Facebook outage, experts joked on Twitter that some of the causes of the downtime are “older than the Spice Girls” and “designed on a napkin”.

Researcher Bill Buchanan agrees with this characterization: “The internet is not the large-scale distributed network that DARPA (Defense Advanced Research Projects Agency), the original architect of the internet, tried to create, one that could withstand an attack on any part of it.”

“The protocols it uses are basically just the ones that were designed when we connected to mainframe computers from dumb terminals (as terminals with limited functionality are called). A single flaw in its core infrastructure can cause everything to fall apart.”

Photo caption: Reaction to the interruption of Facebook services was mixed: some users resorted to good humor; others were angry (Credit: Getty Images)

Professor Buchanan says improvements can be made to make the internet more resilient, but that many of its fundamentals are here to stay, for better or worse.

“Usually the systems work and you can’t just turn off certain internet protocols for a day and try to redo them,” he says.

Rather than trying to rebuild the systems and structure of the internet, Professor Buchanan says he believes we need to improve the way we use it to store and share data, or we will be at greater risk of mass outages in the future.

He argues that the internet has become too centralized, that is, a space in which a lot of data comes from a single source. This trend needs to be reversed with systems that have multiple nodes, he explains, so that no single failure can stop a service from functioning.
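As a rough illustration of the multi-node idea, and not something attributed to Professor Buchanan or to any company named in this article, the minimal Python sketch below tries several copies of a service in turn, so that one unreachable node does not stop a request from being answered. The node names and the `fake_fetch` function are hypothetical stand-ins for real network calls.

```python
from typing import Callable, Iterable


class AllNodesFailed(Exception):
    """Raised when every node in the list failed to answer."""


def fetch_with_failover(nodes: Iterable[str], fetch_from: Callable[[str], str]) -> str:
    """Return the first successful response, trying each node in order."""
    errors = []
    for node in nodes:
        try:
            return fetch_from(node)
        except ConnectionError as exc:
            # This node is unreachable: record why and move on to the next one.
            errors.append(f"{node}: {exc}")
    raise AllNodesFailed("; ".join(errors))


# Hypothetical stand-in for a network call: one node is down, the others are up.
DOWN = {"node-a.example"}


def fake_fetch(node: str) -> str:
    if node in DOWN:
        raise ConnectionError("unreachable")
    return f"response from {node}"


result = fetch_with_failover(
    ["node-a.example", "node-b.example", "node-c.example"], fake_fetch
)
print(result)  # prints "response from node-b.example"
```

With a single source, an outage at that source is an outage of the whole service. In this sketch the request fails only if every node in the list is unreachable at the same time.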

There is some reason for hope. While significant internet disruptions affect users’ lives and businesses, they can also ultimately help improve the resilience of the network and the services connected to it.

For example, the American magazine Forbes estimates that Facebook lost US$66 million (R$365 million) during the six-hour interruption, as advertisers suspended their campaigns or left the site. A loss on this scale will likely focus the minds of senior executives on preventing it from happening again.

“They lost a lot of money that day, not just in stock price but also in operating income,” says Deryckx.

“And if you look at the outages caused by content delivery networks like Fastly and Cloudflare, they’ve also lost a lot of customers to the competition. So I think these carriers are doing everything they can to keep things online,” he concludes.