Validate data before sharing.

I’m going to have to add a couple more slides to my Threat Intelligence: From Zero to Basics deck. But I told GrrCON that I would have an updated deck from Circle City Con anyway.

Over the last two weeks I’ve seen some stuff shared publicly in Threat Intelligence Platforms that really shouldn’t have been. The data wasn’t valid at the time of sharing.

1. The first instance: a big malware outbreak was going on, internet wide. I got an email from an external Threat Vendor ($threatvendor) saying that they had a list of Indicators of Compromise (IOCs) for the big malware du jour ($bmdj). Because we were looking at $bmdj at work, I went to the $threatvendor portal to check out the IOCs.

One of the IOCs $threatvendor had was from another $user. This $user uploaded the IOCs they got from their customer. $user did say “this has not been validated, but our customer is impacted by $bmdj, and gave us this url”. It read as $bmdj_tor_hash[.]onion

Here are the problems with that: as we saw with another $bmdj, that outbreak and a smaller one had their Indicators mixed together because both were happening at the same time. Also, at the time of the post to $threatvendor’s portal, $user was the only one saying that $bmdj was using .onion addresses other than the tor_hash[.]onion address.

Now $user did do right by saying that it wasn’t validated, but $threatvendor isn’t a trust community. What $user should have done was ask a trust community or other impacted customers if they were seeing $bmdj_tor_hash[.]onion addresses in the same format, and then put it into the $threatvendor portal after validation. I have since seen some news coverage mentioning $bmdj_tor_hash[.]onion, but I wonder if it comes from independent verification, or if the $threatvendor / $user IOC entry was the source for the news story.
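The "ask around first" step can be sketched in a few lines of Python. This is only an illustration, not how any real platform works: the `Report` class, its fields, the `minimum=2` threshold, and the sample values are all made up for the example. The idea is that two reports that trace back to the same origin count as one, which is exactly the worry with news stories that may be repeating a single portal entry.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Report:
    indicator: str  # the IOC as reported (kept defanged)
    reporter: str   # who passed it along
    origin: str     # where the reporter got it from (hypothetical field)

def independent_origins(reports, indicator):
    """Distinct origins behind an indicator; repeats of one origin collapse."""
    return {r.origin for r in reports if r.indicator == indicator}

def is_corroborated(reports, indicator, minimum=2):
    """True only if at least `minimum` separate origins reported the indicator."""
    return len(independent_origins(reports, indicator)) >= minimum

reports = [
    Report("example_hash[.]onion", "user", "customer-a"),
    # A news story repeating the portal entry adds a reporter, not an origin.
    Report("example_hash[.]onion", "news-site", "customer-a"),
]

print(is_corroborated(reports, "example_hash[.]onion"))  # False: one origin only
```

Two origins is an arbitrary bar. The point is that the count should be of independent sightings, not of how many times the same claim has been repeated.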

2. The second instance: My team is reviewing a possible new Threat Data feed. This one also shares community input and has an API to integrate into our Threat Intelligence Platform. We were given a list of IOCs. The first one on that list was a domain name associated with a large credit card company.

The person who reported it saw it as part of a phishing message. That was their whole reason for posting it as an IOC. And because it met the right score, it was on our test list.

The IOC was a false positive. There wasn’t enough context to understand how the phish interacted with the credit card domain. Now, it could have been an automated tool that saw it and tagged the domain as bad. But it is still tied to that $user, and it is bad data. Also, maybe we shouldn’t let our tools auto-upload to sharing sites without an analyst reviewing the data first.

More: The “phishing domain” is the domain the credit card company uses for their business side, while their brand name is what is used for their customers to access the web page. 30 seconds with whois showed that the “phishing domain” was registered by the credit card company, and checking the credit card company registrations one could see the point of contact addresses had the format of “user at ‘phishing domain’ ” for all their domains.

Anyone using an API and pulling IOCs of “BAD” things is going to be blocking legitimate emails from the company. Filtering didn’t help in this case. The API had a couple of ways to filter, including using a sub-set of a “Top 1 Million” list, and the domain still showed up on the test list. All because the data wasn’t validated before it was shared.
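A gate in front of any auto-upload could look something like this minimal Python sketch. Everything in it is an assumption for illustration: the function names, the status strings, and the `corp-card.example` domain are hypothetical, and the allowlist stands in for whatever an analyst has already confirmed (for example through a whois lookup). Note that no branch shares anything automatically; the best outcome is "go to an analyst".

```python
def belongs_to(domain, allowlist):
    """True if domain equals, or is a subdomain of, an allowlisted domain."""
    domain = domain.lower().rstrip(".")
    return any(domain == good or domain.endswith("." + good) for good in allowlist)

def triage(domain, context, allowlist):
    """Decide what happens to a candidate IOC before it is shared."""
    if belongs_to(domain, allowlist):
        return "hold: domain belongs to a known-legitimate organisation"
    if not context.strip():
        return "hold: no context explaining why this is bad"
    return "queue: ready for analyst review"

# Hypothetical allowlist of domains already confirmed as legitimate.
known_good = {"corp-card.example"}

print(triage("mail.corp-card.example", "seen in a phishing message", known_good))
print(triage("unknown-site.example", "", known_good))
print(triage("unknown-site.example", "hosted the credential form", known_good))
```

The subdomain check is deliberately strict about the dot, so `notcorp-card.example` would not be treated as belonging to `corp-card.example`.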

3. Related to 2, as part of the review, we ran a hunt team exercise. I had to go “enrich” (add more context) the IOCs for the Incident Detection and Response (IDR) team. The IDR team searched the SIEM and chased down any matches to those IOCs.

To try to weed out invalid data, we were looking at data that was put into the platform by a user account that claimed to be the $data_vendor_research_team. One of the IOC sets picked was for an APT group. The analyst comments tying domain names to the APT were based on one correlation: a single shared data point.

Viewing the IOCs on the $datavendor site, the context by their research analysts said: ‘This is APT $group_number, because it uses the same ($ip_address | $dns_nameserver) as listed in $APT_Group_Report.’

So in this case, all because a domain name shared either the same ip address or nameserver as an APT group, they have to be the same APT group. None of the IOCs said it shared both IP Address and nameserver. Just that they had one piece of some shared infrastructure.

Thankfully the $data_vendor_research_team linked to the report in question. Looking up the IP Address and Nameservers pointed back to hosting providers. So because these domains went to companies who sell and provide dedicated, hybrid, virtual servers and hosting platforms, they must be part of the APT. I’m not saying that there wasn’t at least one provider hosting multiple phishing domains, but that was a pretty weak reason to say all the IOCs belonged to the APT.

By that logic, if a drug dealer got his morning espresso at $coffee_chain, and Bob was getting coffee at the same $coffee_chain, Bob is a drug dealer too. Because Bob went into a coffee shop that served a drug dealer at some point in the past.
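To make the point concrete, here is a rough Python sketch of a stricter rule: infrastructure that belongs to a hosting provider doesn't count as overlap, and one shared data point is never enough. The function names, the `minimum=2` threshold, and the sample addresses (documentation-range IPs and `.example` names) are all assumptions for illustration, not anything $datavendor actually does.

```python
def meaningful_overlap(candidate, known, shared_hosting):
    """Attributes the candidate shares with known infrastructure,
    ignoring values that belong to shared hosting providers."""
    return {(kind, value) for (kind, value) in candidate & known
            if value not in shared_hosting}

def supports_attribution(candidate, known, shared_hosting, minimum=2):
    """Require several non-hosting overlaps before linking to a group."""
    return len(meaningful_overlap(candidate, known, shared_hosting)) >= minimum

# Infrastructure listed in a (hypothetical) APT report.
apt_infra = {("ip", "203.0.113.10"), ("ns", "ns1.hosting.example")}
# Values that resolve back to a hosting provider serving many customers.
hosting = {"203.0.113.10", "ns1.hosting.example"}
# A domain that happens to sit on the same provider IP.
bob = {("ip", "203.0.113.10"), ("ns", "ns1.other.example")}

print(supports_attribution(bob, apt_infra, hosting))  # False
```

Even two overlaps is a low bar. The sketch only shows that shared hosting should be discounted before any counting starts.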

In all three of these examples, the data wasn’t validated before being shared. All of it needed more enrichment (research) to confirm that it really was what they said it was. Instead, it was shared as-is and wasted people’s time.

The Verification Handbook is a really good book on validating events in real time as they are happening. It is written for journalists, but the core concepts will carry over to doing Threat Intelligence.

If you’re going to share your Intelligence, make sure the intel is validated first.
