How a single software update could cause IT chaos across the world

The world as we know it is increasingly dependent on digital connectivity that, for the most part, operates silently and invisibly in the background. So how could a single software update take down half the internet?

The global IT outage on July 19 is a stark reminder of our vulnerability to technological failure. Caused by a single faulty software update from cybersecurity firm CrowdStrike, it had a disastrous impact on airlines, media, banks and retailers worldwide, particularly those using Microsoft Windows operating systems.

This incident, described as the “largest IT outage in history,” is a reminder of the vast web of IT interconnections that sustain our digital infrastructure – and the potentially far-reaching consequences if something goes wrong.

What started as airport delays has morphed into widespread flight cancellations. The failure of aviation systems is not only disrupting flight schedules but also affecting global supply chains that rely on air freight, demonstrating the multifaceted nature of modern IT ecosystems. Meanwhile, countless TV and radio stations have had their broadcasts interrupted, and supermarkets and banks have ground to a halt.

Preliminary analysis suggests the chaos stemmed from a software update to CrowdStrike’s Falcon Sensor security software that was deployed to Microsoft Windows operating systems. Employees at companies running CrowdStrike were met with a “blue screen of death” (an error message indicating that the system had crashed) when they tried to log in.

In addition to exposing the hidden web of dependencies that sustain our digital society and economy, the outage also highlighted the geopolitical dimensions of these dependencies. Countries with strong ties to Microsoft and CrowdStrike felt the heaviest impact, but companies in countries like China, with their relatively isolated and controlled IT infrastructures, appear to have been less affected.


Given the increasing geopolitical tensions of recent years, China and a growing number of other countries have actively developed their own cybersecurity measures and digital infrastructures, which may have mitigated the impact of this incident.

China’s focus on using indigenous technology and reducing its reliance on foreign technology may also have contributed to the reduced impact on its systems. The incident serves as a stark reminder that technological dependencies can translate into geopolitical vulnerabilities, with state authorities increasingly having to consider not only the economic but also the strategic and geopolitical implications of their IT alliances.




Recovery and implications

How affected industries have handled this crisis reflects both the strengths and the vulnerabilities of their own security and disaster recovery strategies. The primary problem has been identified and reportedly fixed. The slow recovery process will demonstrate the significant challenges of restoring service continuity across our complex, deeply interconnected digital ecosystems.

It is particularly surprising that, despite many lessons learned in the past, such as TSB’s 2018 IT migration disaster that affected millions of the UK bank’s customers, there has been no phased software rollout.

The lack of this step, a basic but crucial strategy in IT management, exposed the vulnerability of systems that many considered robust. It also raised serious questions about the resilience of both the Windows operating systems and CrowdStrike’s cybersecurity measures that were supposed to protect them.
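For readers unfamiliar with the idea, a phased rollout can be sketched in a few lines of Python. This is only an illustration of the general technique, not a description of how CrowdStrike or any other vendor deploys updates. The function name `phased_rollout`, the stage fractions, the failure threshold and the `apply_update` callback are all hypothetical.

```python
from typing import Callable, Sequence


def phased_rollout(
    hosts: Sequence[str],
    stages: Sequence[float],
    apply_update: Callable[[str], bool],
    max_failure_rate: float = 0.01,
) -> tuple[list[str], bool]:
    """Update growing fractions of a fleet, halting if a stage fails too often.

    `stages` holds cumulative fractions of the fleet, e.g. (0.01, 0.1, 1.0).
    `apply_update` is a hypothetical callback that returns True when a host
    is healthy after the update. Returns the hosts that received the update
    and whether the rollout reached the whole fleet.
    """
    updated: list[str] = []
    done = 0
    for fraction in stages:
        # Each stage reaches at least one new host, never more than the fleet.
        target = min(len(hosts), max(done + 1, int(len(hosts) * fraction)))
        batch = hosts[done:target]
        if not batch:
            continue
        failures = sum(0 if apply_update(host) else 1 for host in batch)
        updated.extend(batch)
        done = target
        # Stop before the next, larger stage if this one looks unhealthy.
        if failures / len(batch) > max_failure_rate:
            return updated, False
    return updated, done == len(hosts)
```

With a fleet of 1,000 machines and stages of 1%, 10% and 100%, an update that crashes every machine it touches would be stopped after the first 10 machines in this sketch, rather than reaching all 1,000.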

Furthermore, the episode highlighted the strategic risks of relying on a single source of technology. This global outage demonstrated the importance of diverse technology alliances to enhance national security and economic stability, while also raising concerns about the potential for hostile states to exploit such vulnerabilities. This incident will add a new layer of urgency to international cybersecurity collaborations and policy interventions.

As services begin to stabilize and resume, this outage should be a wake-up call for IT professionals, business leaders, and policymakers. The urgent need to reassess and even revise existing cybersecurity strategies and IT management practices is clear. Making systems resilient enough to withstand large-scale disruptions must be a priority.

The global IT outage is a timely reminder and a pivotal moment for discussions about digital resilience and the future of technology management at the enterprise, infrastructure, and policy levels.

What about AI?

There’s one more thing we don’t know the answer to yet: If a single software glitch can bring down airlines, banks, retailers, media companies and more around the world, are our systems ready for AI?

Perhaps we should invest more in improving the reliability and methodology of software, rather than releasing chatbots too quickly. An unregulated AI industry is a recipe for disaster, especially in a world of growing geopolitical tensions.

While it’s essential to embrace emerging technologies like AI or blockchain, we also need to get the basics right. Cybersecurity operators need to ensure that core IT management and maintenance practices are strong, reliable, and can handle anything from a cybersecurity attack to a simple software update.

The lessons learned from this incident will undoubtedly influence future IT infrastructure development and crisis management strategies.

This article is republished from The Conversation under a Creative Commons license. Read the original article.

Feng Li is not an employee of, an advisor to, an owner of stock in, or a recipient of funding from any company or organization that would benefit from this article, and has disclosed no relevant affiliations beyond his academic appointment.
