Lost in the Cyber Data Breach Noise

One of the biggest challenges with news articles is understanding the severity of a data breach, how it affects consumers, and what actions are needed. Hundreds of definitions of “data breach” are in circulation, and it’s extremely difficult to know which one a given report is using, let alone build consistency around the meaning. Technical standards such as COBIT and the ISO/IEC 27000 series add still more definitions to the mix. The lack of a universal, clear definition creates confusion.

You might be asking yourself: am I compromised? Do I need to change my passwords? Let’s examine the 2018 Marriott breach, when news broke that over 500 million “records” were compromised. “Record” is the most widely used term for evaluating and comparing the scale of a data breach, yet it, too, has no clear definition in the industry. Passwords? Credit cards? Passport numbers? It was highly unlikely that all of those records contained sensitive Personally Identifiable Information (PII) or Payment Card Industry (PCI) data, some of the more damaging and valuable pieces of information.

“Data breach” is neither a technical term nor a regulated one. In the United States, notification laws are set at the state level, and each state has its own definitions and data criteria. Outside the U.S., most notification requirements are set at the country level (Australia, for example), and some regulations extend across multiple countries (such as the EU’s GDPR).

Some industries require reporting when the accessed data involves certain types of sensitive information. In general, that data falls into one or more of three categories: PII, PCI, and Protected Health Information (PHI). But even a term such as PII can be ambiguous across jurisdictions. In the EU, for example, it can extend to information such as religious beliefs or political opinions.


One data, two data, red data, blue data

Imagine a hacker infiltrated your system and stole thousands of lines of source code. Surprisingly, in many cases that theft would not need to be reported to authorities. Although intellectual property and trade secrets are valuable and sensitive to a company, this data does not fall into a reportable class. Regardless, its theft could have a huge impact on a company’s bottom line and investor confidence. Insurance coverage for this kind of loss is also largely non-existent, because the value of what was lost is so difficult to quantify.

Speaking of data, usernames and passwords also do not fall into clear-cut categories because, in isolation, they are not associated with a specific person and are not considered PII under most definitions. However, under most insurance policies, their theft will still trigger a breach, since breaches in insurance terms go beyond sensitive information to include, for example, the cost of cleaning up the infection and the business interruption associated with it. Moreover, if the stolen passwords were not hashed and salted, the effort required to crack them is trivial, increasing both the exposure of consumers, who may be entirely unaware, and the company’s potential liability.

What about breaches that result in funds being transferred directly? For example, was the CFO tricked into wiring a payment externally, or did a malware infection result in an unauthorized money transfer? Such an event might trigger a cyber insurance policy, but it will more likely trigger a crime insurance policy. Either way, the results are not always clear.
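The difference salting makes can be illustrated with a short sketch (the iteration count, salt size, and passwords here are illustrative choices, not taken from any particular system):

```python
import hashlib
import os

def hash_password(password: str, salt: bytes = None) -> tuple:
    """Derive a salted, deliberately slow hash using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # a unique random salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return salt, digest

# Unsalted fast hash: every user with this password gets the same
# digest, so one precomputed ("rainbow table") lookup cracks them all.
weak = hashlib.sha256(b"hunter2").hexdigest()

# Salted derivation: the same password yields a different digest for
# each user, and every attacker guess costs 600,000 hash iterations.
salt_a, hash_a = hash_password("hunter2")
salt_b, hash_b = hash_password("hunter2")
print(hash_a != hash_b)  # True: different salts, different digests
```

Given a user’s stored salt, the defender can still verify a login quickly, but an attacker who steals the password table must pay the full iteration cost for every guess against every account.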
 
As a result, insurance companies are likely to hold more breach data than reporting agencies do. And of course, all of these scenarios assume you know what was stolen. Sometimes, especially with prolonged breaches, you cannot be sure.


What goes on behind the headlines

In general, data breaches are only reported when the company discovers there was unauthorized access to its data, and even then the breach still isn’t always reported. When malicious actors gain access to corporate networks, the average time from infection to detection is 197 days. That means data could have been stolen and used for many months before anyone knows it is gone. Some companies may not report a breach at all if, for example, it surfaced through a vulnerability disclosure program. Generally, the longer a malicious actor is in a network, the more data is at risk, and the less evidence survives. Consider computer system logs: many are rotated after a certain period because of disk space constraints, and old records are expunged. The same goes for backup images of the data. What if an attacker created a fake user account for persistent access? That account may go undetected if the supporting evidence was erased by log rotation.
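This evidence-loss problem can be sketched in a few lines (the retention window, account name, and log entries are invented for illustration):

```python
from collections import deque

# A log store that keeps only the most recent N entries, as a
# size-capped rotation policy might.
RETENTION = 5
log = deque(maxlen=RETENTION)

log.append("day 1: new account 'svc_backup' created")  # the key evidence
for day in range(2, 9):
    log.append(f"day {day}: routine activity")

# After rotation, the creation event is gone: the suspicious account
# still exists, but nothing on record explains where it came from.
print(any("svc_backup" in entry for entry in log))  # False
```

The longer the dwell time relative to the retention window, the more of the attacker’s earliest (and often most revealing) activity has already been overwritten by the time responders arrive.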

Smaller breaches, like stolen laptops or email accounts, are usually reported faster but with less detail, as the compromised data is limited to the device or account. Full-disk encryption, for example, works well for laptops that are shut down after use; it doesn’t matter whether you use AES-128, Twofish, or another strong cipher. But what happens when a laptop is stolen while the user is logged in? The decryption keys are then held in memory, and even with some of the best protections available, a well-planned operation can defeat them.

In some cases, even incident responders have trouble determining what data was taken. Sophisticated malicious actors cover their tracks, delete logs, and encrypt the data they take out of the network. If they learn they have been discovered, they can cover their tracks even further, complicating reporting. After all, if you don’t know what data was taken, how are you supposed to report a breach accurately?
There are dramatic differences in how lost or stolen data is used, depending on who accessed it. In threat intelligence, not every actor is motivated by financial gain, so depending on what was stolen, the data may never surface publicly. Financial crime syndicates, black hats, nation states, and hacktivists are a few types of hacking groups, with motivations ranging from financial gain to information gathering. And tracking these actors is hard. There is almost never a clear path to full attribution for a hack, and pinning precise timing on specific users takes extensive cooperation between investigators. There are also instances of planted false flags, especially in the case of nation states.

Understanding breach headlines can be a challenging task. Too often, we simply take record counts as the be-all and end-all for comparing breaches, and the lack of a unified standard for publicly reporting a breach makes it incredibly difficult to compare them side by side. At Guidewire Cyence™ Risk Analytics, we know the importance of breaking down this barrier, and we use advanced tools such as AI to help cut through the noise. But for now, we humans will have a hard time deciphering the headlines until we reach a unified standard for reporting.
 

Matthew Honea
Director of Cyber
Guidewire 
mhonea@guidewire.com