Phew. Friday (7/19) was a doozy for WTG and many other Microsoft/CrowdStrike partners and customers. We sincerely hope that everyone who was impacted has moved beyond the incident and continues to leverage one of the top-tier EDR platforms for exceptional cybersecurity protection on their systems. There have been numerous reports of threat actors taking advantage of the situation, either through email/support scams or by targeting unprotected systems. Beyond the specifics of this incident itself, the situation brought to light other issues that commonly surface in circumstances like this. After all, for an incident like this (however novel), a post-incident review or “lessons learned” analysis is one of the most powerful steps in the NIST IR framework itself. Here are a few post-incident thoughts that IT leaders may want to contemplate:
- Where are the operating system (“OS”) vendors in this? What is their responsibility to ensure a stable operating platform that can be easily recovered from this or any other type of incident? In 2024, should “manual” intervention be required because a system won’t boot? What is the role of the OS OEM in hardening systems so they are more resilient to any update and have the intelligence to self-protect and self-heal from an incident like this? Do OS vendors need to be more protective of the kernel and the boot process itself?
- Why were some organizations able to recover by Friday morning or mid-day, while others, five days later, are still recovering? A workaround and “fix” were provided rapidly. Understandably, parts of the process required manual intervention (see the sketch after this list for what that per-machine step looked like); even so, the operational question remains: why were some organizations (of all shapes and sizes) able to recover relatively quickly (however painful it was), while others still haven’t recovered? What does this tell us about modern IT Ops and planning for disaster?
- Speaking of IT Ops: was this incident in anyone’s Business Continuity Plan (“BCP”)? Had anyone considered a widespread incident (not necessarily ransomware) that might require manual intervention on computer systems? More specifically, does your organization have a BCP that covers an incident like this, especially one requiring manual intervention? In a world of SaaS and cloud-delivered apps and services, are your key processes survivable, however painfully, without a computer system or network? Do you have “paper processes” for core functions such as scheduling? It’s clear that some organizations were more resilient than others, and it stands to reason that their BCPs probably contemplated this type of event.
- When did we drift away from Release Management (“RM”)? It was commonplace to have DEV/TEST/QA/PILOT/PROD environments for everything: testing code and testing updates alike. Have we let go of too much RM control, especially with SaaS and cloud-delivered apps and services? One could rationalize that the reason this update went “everywhere” at once was to protect all systems equally against zero-days. In my humble opinion, that’s just rationalizing an unfortunate situation. Customers should always have the choice to stage, control, and otherwise manage any updates to their systems.
- “A steady hand at the tiller” – with modern media hysteria being what it is, it’s compelling and easy for board members, CxOs, directors, practitioners, and consultants to get wrapped up in a single incident. When does perception trump reality? The reality is that in 2024, even with advanced EDR-type protection, the global economy is looking at nearly $50 billion in expenses related to ransomware alone (never mind other breaches). CrowdStrike had a nearly impeccable record up until now, and arguably still has an impeccable record for malware/ransomware defense. Should we be “happy” that this was really a once-in-a-decade event (or perhaps a once-ever one)? In fact, the sheer number of organizations impacted goes to show how effective and popular CrowdStrike is as a platform (are they a victim of their own success?). Strong advice for leaders: keep that “steady hand at the tiller.” Organizations thrive when leadership remains informed and makes data-driven, unemotional decisions instead of “knee-jerk” reactions, even after very impactful incidents such as this one (and we are not understating its impact). This, of course, is true of any incident.
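To make the “manual intervention” mentioned above concrete: the widely published workaround was to boot each affected Windows host into Safe Mode or the Windows Recovery Environment, remove the faulty channel file, and reboot normally. The Python sketch below is purely illustrative of that single per-machine step, assuming the publicly documented file location and name pattern; it is not an official remediation tool, and in practice BitLocker recovery keys and console access were often the real bottleneck.

```python
# Illustrative sketch only: the kind of per-machine cleanup step responders
# scripted or performed by hand after booting an affected Windows host into
# Safe Mode or the Recovery Environment. Path and pattern reflect the widely
# published workaround; verify against current vendor guidance before use.
import glob
import os

DRIVER_DIR = os.path.expandvars(r"%WINDIR%\System32\drivers\CrowdStrike")
PATTERN = "C-00000291*.sys"  # channel file pattern named in the public workaround


def remove_bad_channel_files(dry_run: bool = True) -> None:
    """Find and (optionally) delete channel files matching the published pattern."""
    matches = glob.glob(os.path.join(DRIVER_DIR, PATTERN))
    if not matches:
        print("No matching channel files found; nothing to do.")
        return
    for path in matches:
        if dry_run:
            print(f"Would delete: {path}")
        else:
            os.remove(path)
            print(f"Deleted: {path}")


if __name__ == "__main__":
    # Default to a dry run; only disable it after confirming the host is affected.
    remove_bad_channel_files(dry_run=True)
```

Multiply that one small step by thousands of endpoints, many of them BitLocker-encrypted or remote, and the variance in recovery times starts to make sense: organizations with out-of-band management, accessible recovery keys, and rehearsed procedures could automate it, while others touched machines one at a time.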
Again, we hope you’ve recovered from this incident and that it’s far in the rear-view mirror. Microsoft and CrowdStrike, among others, remain important partners to WTG and our customers. We stand by our relationships with both, just as we stand by our relationships with our other vendor partners in the marketplace, so our customers have best-of-breed choice and can make informed, data-driven decisions on what fits their organization best. Stay vigilant, my friends!