WFH made Facebook’s outage worse with around 75% of its 60,000 workforce not in the office to fix it

Facebook’s WFH policy made 7-hour outage worse: 75% of its 60,000 workforce were still not in the office to fix it as insiders say blackout disabled building security passes and internal comms

Monday’s outage caused all of Facebook’s services to go offline for several hours
Staff were unable to communicate with each other or access the office buildings
An insider said working from home made it more difficult to fix the issues
Facebook has 60,000 staff worldwide and is operating at 25% office capacity
Mark Zuckerberg plans to carry on WFH and delayed full reopening until 2022




Facebook insiders claim the tech giant’s seven-hour outage that caused all of its services to go offline was exacerbated by employees working from home as staff were locked out of remote messaging systems and company buildings.

The social media company, which has been leading the charge for post-pandemic remote working, had all the systems needed to fix the issue disabled, from digital engineering tools to messaging services and even key fobs at the Menlo Park headquarters in California.

One source said remote staff were left unable to communicate with each other to fix the issue, while a data centre in Santa Clara, where the servers are located, was short-staffed due to Covid restrictions and WFH.

Jonathan Zittrain, director of Harvard’s Berkman Klein Centre for Internet and Society, said: ‘Facebook basically locked its keys in its car.’

The tech giant has around 60,000 employees globally and announced in May that it would be operating with a 25 per cent capacity in its offices once it reopened in July.

They planned to have 50 per cent of the workforce in the office by September and 100 per cent in October, but a surge in Delta infections caused bosses to push back the return date until next year. 

Facebook explained that Monday’s problem was caused by a faulty update that was sent to their core servers which effectively disconnected them from the internet.

Engineers were rushed to the company’s data centres in Santa Clara to reset the servers manually, but it took until 5.45pm Eastern Time (10.45pm GMT) for them to be reconnected due to the ‘logistical challenge’ of employees who could offer assistance being cut off and at home.

Employees at the company’s Menlo Park, California, campus had trouble entering buildings because the outage had rendered their security badges useless, while other staff already inside the buildings were locked out of conference rooms, forcing them to communicate via text messages and Outlook emails. 

New York Times reporter Sheera Frenkel said: ‘Was just on phone with someone who works for FB who described employees unable to enter buildings this morning to begin to evaluate extent of outage because their badges weren’t working to access doors.’ 

Meanwhile an insider said on Reddit: ‘There are people now trying to gain access to the peering routers to implement fixes, but the people with physical access is separate from the people with knowledge of how to actually authenticate to the systems and people who know what to actually do, so there is now a logistical challenge with getting all that knowledge unified.

‘Part of this is also due to lower staffing in data centres due to pandemic measures.’ 


A person claiming to be a Facebook employee said on Reddit that high numbers of staff working from home made the problem worse. The account was later deleted 

Employees use Facebook’s own services to communicate with each other, and its internal messaging platform, Workplace, was also down, leaving many unable to do their jobs or discuss how to fix the issue while working from home.

Facebook engineers said in a statement: ‘The underlying cause of this outage impacted many of the internal tools and systems we use in our day-to-day operations, complicating our attempts to quickly diagnose and resolve the problem.

‘Our engineering teams have learned that configuration changes on the backbone routers that coordinate network traffic between our data centers caused issues that interrupted this communication. 

‘This disruption to network traffic had a cascading effect on the way our data centers communicate, bringing our services to a halt.’ 

Kevin Collier, an NBC news reporter, said: ‘Don’t yet know exactly what’s behind the DNS issue that’s knocked Facebook/Instagram/WhatsApp offline, but it’s really bad. 

‘Pretty much everything that runs through those three companies are inaccessible. Employees can’t even enter conference rooms because they’re IoT (internet of things)!’

Facebook has 47 locations across North America but many are smaller data sites, while 15,000 people, around a quarter of the total workforce, are based in the Menlo Park headquarters.

Mark Zuckerberg has pledged to move to a working from home setup within the coming years and predicts that as much as half of the workforce will be remote within the next five to ten years.

The CEO said he would start ‘aggressively opening up remote hiring’, telling the Verge: ‘We’re going to be the most forward-leaning company on remote work at our scale.


WHAT IS THE DOMAIN NAME SYSTEM AND HOW DOES IT WORK? 

The Domain Name System, or DNS, is the directory of the internet.

Whenever you click on a link, send an email or open a mobile app, one of the first things that has to happen is that your device looks up the address of a domain.

There are two sides of the DNS network: the authoritative side, which holds the address records for the domains behind webpages and other content, and the resolver side, which looks up those records on behalf of devices that are trying to access this content.

Every domain needs to have an authoritative DNS provider, servers which store DNS records. Amazon, Cloudflare and Google are among the bigger names in authoritative DNS server provision. 

On the other side of the DNS system are resolvers. Every device that connects to the Internet needs a DNS resolver. 

By default, these resolvers are automatically set by whatever network you’re connecting to. 

So, for most Internet users, when they connect to an ISP, or a WiFi hot spot, or a mobile network, the network operator will dictate what DNS resolver to use.

The problem is that these DNS services are often slow and don’t respect your privacy. 

What many Internet users don’t realise is that even if you’re visiting a website that is encrypted, indicated by the green padlock in your browser’s address bar, that doesn’t keep your DNS resolver from knowing the identity of all the sites you visit. 

That means, by default, your ISP, every WiFi network you’ve connected to, and your mobile network provider have a list of every site you’ve visited while using them. 
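As a rough illustration of the two sides described above, the Python sketch below models an authoritative server that holds records and a resolver that asks it for an address. It is a toy, not how any real DNS software works: the class names, the example domain and the address are all made up for illustration, and no real network traffic is involved. The second lookup loosely mimics an authoritative server dropping off the internet.

```python
# Toy model of the two sides of DNS. All names, the domain and the address
# below are hypothetical and exist only for this illustration.


class AuthoritativeServer:
    """Stores the DNS records for the domains it is responsible for."""

    def __init__(self, records):
        self.records = dict(records)
        self.reachable = True  # set to False to simulate being cut off

    def query(self, domain):
        if not self.reachable:
            raise ConnectionError("authoritative server unreachable")
        # Returns None when the server has no record for the domain.
        return self.records.get(domain)


class Resolver:
    """Looks up addresses on behalf of a device."""

    def __init__(self, authoritative):
        self.authoritative = authoritative

    def lookup(self, domain):
        try:
            return self.authoritative.query(domain)
        except ConnectionError:
            # With no answer from the authoritative side, the name
            # cannot be turned into an address.
            return None


server = AuthoritativeServer({"example.com": "192.0.2.10"})
resolver = Resolver(server)

print(resolver.lookup("example.com"))  # answered while the server is reachable

server.reachable = False  # simulate the server vanishing from the internet
print(resolver.lookup("example.com"))  # no answer, so the site appears offline
```

In this simplified picture the website's own machines could be running normally, yet devices would still be unable to find them, because the lookup step fails first.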


‘We need to do this in a way that’s thoughtful and responsible, so we’re going to do this in a measured way. But I think that it’s possible that over the next five to 10 years — maybe closer to 10 than five, but somewhere in that range — I think we could get to about half of the company working remotely permanently.’ 

Facebook shares plunged 5 per cent during Monday’s outage, wiping some $48billion off its value, although the slide started before the tech problems, in part due to a whistleblower accusing the company of putting profits before safety in a 60 Minutes program broadcast on Sunday night.

It marks the firm’s second-worst day on the markets ever.

In addition to the stock market slide, Facebook likely missed out on at least $67million in direct revenue and possibly as much as $102million during the outage – based on average hourly earnings across 2020 and projections of its 2021 hourly earnings from Q1 and Q2 results.   

It is also estimated the company lost as much as $545,000 in US ad revenue an hour during the outage.

Zuckerberg’s own stake in Facebook fell by an estimated $7billion.

Facebook was already in the throes of a separate major crisis after whistleblower Frances Haugen, a former Facebook product manager, provided The Wall Street Journal with internal documents that exposed the company’s awareness of harms caused by its products and decisions. 

Haugen went public on CBS’s ‘60 Minutes’ program on Sunday and is scheduled to testify before a Senate subcommittee on Tuesday.

Haugen had also anonymously filed complaints with federal law enforcement alleging that Facebook’s own research shows how it magnifies hate and misinformation and leads to increased polarization. The research also showed that the company was aware that Instagram can harm teenage girls’ mental health.

The Journal’s stories, called ‘The Facebook Files,’ painted a picture of a company focused on growth and its own interests over the public good. Facebook has tried to play down the research. 

Former Deputy Prime Minister Nick Clegg, the company’s vice president of policy and public affairs, wrote to Facebook employees in a memo Friday that ‘social media has had a big impact on society in recent years, and Facebook is often a place where much of this debate plays out.’  
