IN scenes that could be the basis for future disaster movies, the world was shaken by the fallout of an IT blunder. Some industries, such as travel, were greatly affected as airports came to a standstill, leaving thousands of flights grounded and many more passengers stranded.
As of July 23, KLIA2 had borne the brunt of the global IT outage, with its systems reportedly restored within four days of the incident. Elsewhere, airports are either back in business, slowly recovering or still affected.
However, it was not just airports that were hit. Other industries such as banks, schools, retail businesses and railways around the world were all affected to varying degrees. Analysts claim the repercussions of the IT outage could continue to affect entire supply chains for weeks.
So, what brought down these industries? Here is what is known.

Cybersecurity failure
The roots of the chaos can be traced to July 19, when US cybersecurity firm CrowdStrike released a sensor configuration update for its Falcon programme, which is widely installed on Windows hosts and, to a lesser extent, on Mac and Linux.
CrowdStrike specialises in ransomware, malware and internet security products for businesses and large corporations. Its Falcon sensor is a cybersecurity programme that provides partially automated protection from malware, antivirus support, incident response and other security features.
The company claimed that, as Falcon is a cloud-based technology, updates are applied to it regularly and automatically multiple times a day, but what was supposed to be a normal update on that fateful Friday had a coding error.
It sent millions of Windows computers worldwide into the infamous “Blue Screen of Death” (BSoD), while Mac and Linux computers were not affected.
Coding and logic error
Like the human body attempting to fight off an infection, the affected computers then fell into a reboot loop, with each restart hitting the same error and triggering another BSoD. The error itself, which CrowdStrike calls a “logic error”, was a bug that resulted from a coding mistake.
The Falcon programme works by hooking into the Microsoft Windows operating system (OS) as a Windows kernel process. The process has high privileges, which give Falcon the ability to monitor operations across the OS in real time.
The flawed update was contained in what CrowdStrike refers to as “channel files”, which provide configuration updates for behavioural protections. The July 19 channel file 291 was an update meant to improve how Falcon evaluates named pipe execution on Microsoft Windows.
With channel file 291, CrowdStrike inadvertently introduced a logic error affecting the programme’s version 7.11 and above, causing the Falcon sensor to crash and, with it, the Windows systems into which it was integrated.
A regular person might say: “Well, then just delete the file.”
That is easier said than done because, as mentioned above, the BSoD loop did not allow computers to boot up normally.
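The reboot loop can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the `Machine` class, the `faulty_channel_file` name and the `restart_until_stable` helper are hypothetical and do not represent CrowdStrike's or Microsoft's actual code. It shows why restarting alone cannot help: the faulty file stays on disk, so every boot crashes until the file is removed by some other route.

```python
from dataclasses import dataclass, field


class KernelCrash(Exception):
    """Stands in for a crash in kernel-level code (a BSoD in this toy model)."""


@dataclass
class Machine:
    # Files that persist on disk across reboots (hypothetical names).
    disk: set = field(default_factory=set)

    def boot(self) -> None:
        # The sensor loads on every boot and reads its configuration files;
        # in this model a faulty file crashes the whole system.
        if "faulty_channel_file" in self.disk:
            raise KernelCrash("sensor crashed while reading configuration")


def restart_until_stable(machine: Machine, max_attempts: int = 5) -> int:
    """Return how many boots crashed before a clean start (at most max_attempts)."""
    crashes = 0
    for _ in range(max_attempts):
        try:
            machine.boot()
            return crashes
        except KernelCrash:
            crashes += 1  # the file is still on disk, so the next boot fails too
    return crashes


pc = Machine(disk={"faulty_channel_file"})
print(restart_until_stable(pc))          # 5: every attempt crashes
pc.disk.discard("faulty_channel_file")   # manual fix, done outside the normal boot
print(restart_until_stable(pc))          # 0: boots cleanly
```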

Untangling the mess
The only way to delete the file is to boot into the system manually, which requires an IT administrator, and that is for just one Windows computer.
Imagine a corporation with over a thousand Windows computers affected by CrowdStrike’s update. Now imagine hundreds of corporations, companies and industries all over the world, with complex IT infrastructures and encrypted drives.
This is why the IT outage lasted several days even though CrowdStrike took less than two hours on July 19 itself to release a fix.
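The manual clean-up an administrator had to perform on each machine can be sketched as a short script. The directory and file-name pattern below are assumptions drawn from the widely reported workaround, not from this article, and the `remove_channel_files` helper is hypothetical. It only lists matching files unless `dry_run` is switched off, and it would have to be run from a recovery or safe-mode environment, since the affected computer could not boot normally.

```python
from pathlib import Path

# Assumptions (not stated in the article): location and naming of the
# faulty channel file, as given in the widely reported workaround.
DEFAULT_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
DEFAULT_PATTERN = "C-00000291*.sys"


def remove_channel_files(directory: Path = DEFAULT_DIR,
                         pattern: str = DEFAULT_PATTERN,
                         dry_run: bool = True) -> list[Path]:
    """Find files matching the pattern; delete them only if dry_run is False."""
    if not directory.is_dir():
        return []  # nothing to do if the sensor directory is absent
    matches = sorted(directory.glob(pattern))
    if not dry_run:
        for path in matches:
            path.unlink()  # remove the faulty file so the next boot can succeed
    return matches


if __name__ == "__main__":
    for found in remove_channel_files():
        print(f"would delete: {found}")
```

Multiplied across thousands of machines, many of which also need a disk-encryption recovery key before the drive can be read, even a step this small adds up to days of work.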
In Malaysia, the outage reportedly affected only KLIA2, where passengers were forced to check in manually for their flights.
Experts have now called out the hazards of cloud-based tech and automated software updates, given the global nature of cloud services.
Is this a one-off or an ominous warning that future errors could have more dire consequences? The incident certainly highlights mankind’s over-dependence on technology and how a small error could lead to chaos on a global scale. Many industries incurred losses due to this IT outage, but will the consequences be worse in future?