There has recently been a prominent example of how damaging a serious IT outage can be. The hours-long interruption in service that Facebook (and its other platforms, Instagram and WhatsApp) recently suffered made news around the world. It cut off social networks, friends, relatives, lovers and businesses. Only Twitter saw the funny side.
The root cause is still the subject of some speculation, and we have no information on it beyond what has been published on the Internet. What is clear, however, is how disruptive and damaging an outage can be, however it is caused. Facebook became the news as its share price fell almost 6%, leaving Mark Zuckerberg an estimated $7 billion out of pocket. That’s a sizeable amount, but the price has already partly rebounded, so he’s unlikely to starve!
The prevailing theory is that the outage was caused by a remote administrator updating the BGP routing configuration. The change removed the old configuration, which disabled routing, but the new configuration could not then be applied because the work was being done remotely and the connection had disappeared along with the routes. As a result, Facebook’s application servers and DNS hosts became unreachable, and remote staff could not connect in to fix the problem. Reportedly, someone who knew what they were doing had to physically get to site and reconfigure the routers to bring the environment back up.
Setting aside the vulnerability of IT systems to human error, and the difficulties and weaknesses of routing configurations and DNS, what can the rest of us learn from the disruption caused by the outage of such critical social infrastructure?
A worst case scenario for many businesses, not just Facebook, is a complete loss of service. Facebook’s business model is totally reliant on online access and the Internet. Many other businesses don’t consider themselves to be as exposed to that kind of failure, but the reality is that in a digital world even a small outage can have a hugely disruptive effect.
This can be caused by misconfiguration or human error (as was perhaps the case for Facebook), an oversight, a physical failure or a deliberate act. The cause, as always, is much easier to pinpoint after the fact.
We have seen similar implications in non-IT businesses too: oil pipeline operators, food manufacturers and healthcare providers whose businesses have suffered major outages as a result of ransomware attacks. Their reliance on IT, even though they trade in the physical world, meant that services and their delivery were similarly affected. This shows that no company can afford an IT outage, no matter how it is caused. Network misconfiguration is just one cause of failure; ransomware is another, and one that has in recent times become more common than the kind of calamitous event we saw in the social media world last week.
What the Facebook event shows is not how to avoid downtime, outages and blackouts. Instead, it shows how small episodes that can seem almost trivial can give rise to enormous consequences.
You can’t avoid all risks. Whether it’s a network administrator changing routes or a user opening a malicious email attachment, people make mistakes. If, as the mathematician Lorenz proposed, a butterfly flapping its wings can result in a tornado, it’s important that early signs of risk are acknowledged as part of your risk management process.
We can learn about the risks of changing BGP configurations from Facebook; or when it comes to ransomware, learn how to reduce the risk of becoming infected. In both instances, however, effective mitigation strategies that prevent a risk or contain its impact are key to lessening the potential effect across an entire enterprise.
A backup router configuration strategy might have helped Facebook (if the backups had been easily accessible). Although, to be fair, massive online businesses like Facebook typically have huge backup data centres available to provide resilience and mitigation against catastrophic events.
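To make the idea of a fallback configuration concrete, here is a minimal Python sketch of an "apply, check, roll back" pattern. It is an illustration only and does not describe how Facebook manages its network. The `Router` class, the `advertised_routes` key and the `has_routes` health check are hypothetical names invented for this example, and the device is simulated in memory.

```python
import copy
from typing import Callable


class Router:
    """Hypothetical in-memory stand-in for a network device (not a real API)."""

    def __init__(self, config: dict) -> None:
        self.config = config


def apply_with_rollback(
    router: Router,
    new_config: dict,
    still_reachable: Callable[[Router], bool],
) -> bool:
    """Apply new_config, then restore the previous config if the check fails."""
    # Keep a known-good copy before touching anything.
    known_good = copy.deepcopy(router.config)
    router.config = copy.deepcopy(new_config)

    # Assumed health check: in this sketch it only inspects the config.
    if still_reachable(router):
        return True

    # The change broke reachability, so put the known-good config back.
    router.config = known_good
    return False


def has_routes(router: Router) -> bool:
    """Hypothetical check: the device still advertises at least one route."""
    return len(router.config.get("advertised_routes", [])) > 0


router = Router({"advertised_routes": ["192.0.2.0/24"]})
# A faulty change that would withdraw every route is rejected and reverted.
applied = apply_with_rollback(router, {"advertised_routes": []}, has_routes)
print(applied, router.config)
```

The important design point is that the rollback has to run on, or next to, the device itself. If it depends on a remote session that the change has just cut off, the backup is exactly as unreachable as the router.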
For many other failure scenarios, however, backups are an important part of a Plan B. Loss or corruption of data can render even a fully working, internet-connected server inoperative. In the event of hardware failure, ransomware, theft, deliberate misuse or vandalism, it’s often the presence or absence of backups that makes the biggest difference.
In some ransomware attacks, where the decryption process has been absent, unworkable or too slow, backups have provided the road to recovery. Colonial Pipeline found that, and so did Maersk when it was hit by NotPetya. Maersk only managed to get its systems back because of a single domain controller, located in a remote office in Ghana, which had been offline because of a local power outage and so escaped the infection. Incredibly, it was this sole surviving copy of the user and system Active Directory (which was ultimately flown back to head office) that enabled the recreation of the Maersk Windows domain.
We’ve seen lots of significant system outages in the past, resulting from numerous causes, and Facebook is just the most recent high-profile “victim”. We also know that such disruptive events can stem from something as small as a butterfly flapping its wings.
Effective risk management means dealing with these small early signs and, where they can be foreseen, having controls in place. Every company can learn something about network support and administration from the Facebook experience, and in the same way every company can learn something about ransomware from Colonial Pipeline and about the importance of backups from Maersk.
You do have to sweat the small stuff!