I don’t remember which podcast it was or who said it, but “Garbage In, Gospel Out” is so true, especially when talking about Cyber Threat Intelligence. I’ve talked a little about this before, both in conference talks and in Validate Data Before Sharing.
But here it is, three years later, and the problem remains. I’m willing to say it is getting worse. We’re not running the full life cycle, whether it’s Intelligence or Incident Response. We get to the collection phase and call it done. NixIntel has a good post on that on their blog.
Too many analysts still tend to run a software-based tool, usually a sandbox, without understanding the results it presents. The tool has done the data collection and maybe some light processing. It hasn’t analyzed the data. There may be some context, but a lot is missing.
The tool is being used as an easy button to speed up the work. That speedup is a good thing, but one has to understand which part of the life cycle is getting the speed boost. It is data collection, not data analysis. Unfortunately, the analysis step appears to be getting skipped. The tool’s output, which is only data collection, gets injected into security blocklists, causing legitimate services to get blocked.
Some people might want to argue that it is a localized problem. But there is evidence otherwise. Back when I wrote Validate Data Before Sharing, I referred to the community version of ThreatConnect’s entry on American Express’ aexp[.]com. The page has since been removed from ThreatConnect, or at least it doesn’t show up in my view.
But the problem still happens.
Things that shouldn’t be getting blocked are still getting blocked because they show up in a tool somewhere. That data gets shared without being analyzed or validated. Malicious URL testing and U.S. Cert are just two unrelated recent examples.
Malicious URL testing
The cloud-based, corporate-level email security gateway at $dayjob has malicious URL protection built into the offering. The gateway searches the body of incoming emails for URLs and rewrites them so they get tested in the provider’s cloud sandbox environment. Since $dayjob implemented the feature, I’ve seen the gateway’s rewrite host on several external blocklists. It usually gets removed relatively quickly, but not always.
About two weeks ago, uBlock Origin started displaying its blocked site page because of one of the third-party lists I use. When I contacted the list’s maintainers about the false positive, they didn’t seem all that concerned. They replied that my initial report didn’t provide enough information, even though I had explained what the host was, how it was used, and who owned it, and that they wouldn’t be removing it for “privacy reasons.”
But how does that host keep getting on blocklists?
Most likely, overworked and undertrained analysts (I don’t blame the analysts) are running reported phishing emails through automated sandboxes and then sharing their “findings.” That means dumping the malware collection report into any community sharing site they can, pushing out Indicators of Compromise (IOCs) so others can stay safe from the bad things on the internet.
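Here’s a rough illustration of how a rewritten URL can turn into a bad indicator. It’s a minimal Python sketch that assumes a made-up rewrite format: the gateway host `urlprotect.gateway.example` with the original URL percent-encoded in a `u` query parameter. Real gateways use their own vendor-specific formats. Pulling hosts straight out of the email returns the gateway’s host; undoing the rewrite first returns the host the phish actually points to.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical rewrite format for illustration only: the gateway host and the
# query parameter holding the original URL are assumptions, not a real vendor's.
REWRITE_HOST = "urlprotect.gateway.example"
REWRITE_PARAM = "u"


def naive_hosts(urls):
    """Hosts as a quick extraction sees them: whatever appears in the email."""
    return {urlparse(u).hostname for u in urls}


def unwrap(url):
    """Return the original URL if the gateway rewrote it, else the URL unchanged."""
    parsed = urlparse(url)
    if parsed.hostname == REWRITE_HOST:
        # parse_qs percent-decodes the values, so this is the original URL.
        values = parse_qs(parsed.query).get(REWRITE_PARAM)
        if values:
            return values[0]
    return url


def unwrapped_hosts(urls):
    """Hosts after undoing the gateway rewrite."""
    return {urlparse(unwrap(u)).hostname for u in urls}


if __name__ == "__main__":
    email_urls = [
        "https://urlprotect.gateway.example/?u=https%3A%2F%2Fphish.example.net%2Flogin",
    ]
    print(naive_hosts(email_urls))      # {'urlprotect.gateway.example'}
    print(unwrapped_hosts(email_urls))  # {'phish.example.net'}
```

Even the unwrapped host is still just collected data. It needs analysis before it goes anywhere near a blocklist.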
U.S. Cert, same problem
Even U.S. Cert suffers from the problem; their Alert AA20-225 is a great example. It covered a COVID-related phish claiming to be from the Small Business Administration. Their original posting from August 12, 2020, contained a list of IOCs in STIX format. But U.S. Cert removed the IOCs from the page on August 14, 2020, although copies of those indicators are still available on other sites and can be found through Google. I suspect the IOCs were not validated before sharing, which led to the later removal. For example, one of those IOCs was for a TransUnion subsidiary’s website used for data analytics.
To see how something like this could happen, look at the phishing report from the online sandbox tool AnyRun. When the phishing page loaded in the browser, the browser generated additional network traffic. That traffic ended up in the malware sample report, and one of those hosts was the TransUnion subsidiary’s analytics domain.
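One small piece of validation that would catch this kind of noise: check the hosts a sandbox observed against domains you already know are legitimate before anything gets shared. This is a minimal Python sketch, not a complete process. The allowlist entries are hypothetical placeholders, and matching a host to an allowlisted domain by suffix is a simplifying assumption.

```python
# Hypothetical allowlist: domains already verified as legitimate (analytics,
# CDNs, your own email gateway). Each team would maintain its own.
KNOWN_BENIGN = {"analytics.example.com", "cdn.example.net"}


def is_known_benign(host, allowlist=KNOWN_BENIGN):
    """True if host is an allowlisted domain or a subdomain of one."""
    host = host.lower().rstrip(".")
    # Matching on "." + domain keeps "evilanalytics.example.com" from matching.
    return any(host == d or host.endswith("." + d) for d in allowlist)


def triage(observed_hosts, allowlist=KNOWN_BENIGN):
    """Split sandbox-observed hosts into candidates and known-benign noise.

    Candidates are NOT finished IOCs; they still need an analyst.
    """
    candidates, benign = [], []
    for host in sorted(set(observed_hosts)):
        if is_known_benign(host, allowlist):
            benign.append(host)
        else:
            candidates.append(host)
    return candidates, benign


if __name__ == "__main__":
    observed = ["phish.example.org", "tag.analytics.example.com", "cdn.example.net"]
    candidates, benign = triage(observed)
    print("needs analysis:", candidates)  # ['phish.example.org']
    print("known benign:", benign)        # ['cdn.example.net', 'tag.analytics.example.com']
```

The split doesn’t turn the candidates into IOCs. It only keeps the obvious noise out of the report; the candidates still need analysis before they get shared.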
Infosec, Cybersecurity, or whatever it’s being called this week needs to stop confusing the reports our tools give us for actual analysis. All we’re getting is a report of the data collected. There may be some light processing, but the data isn’t analyzed.
Until we stop confusing collection reports for analysis, all we’re ending up with is Garbage In, Gospel Out.