Researchers devised a method to unmask malware’s use of TLS without decrypting the data flow. The technique relies on analysis of observable data features.
A team of security researchers at Cisco has demonstrated that malware communicating over TLS can be detected, and then blocked, without decrypting the traffic.
The researchers, Blake Anderson, Subharthi Paul and David McGrew, detailed their method in a paper titled “Deciphering Malware’s use of TLS (without Decryption).” Their approach rests on the observation that malware leaves recognisable footprints in network traffic, even when that traffic is protected by TLS.
The approach is significant because security solutions would not need to decrypt the traffic before inspecting it.
The experts analysed thousands of malware samples belonging to 18 malware families (including Bergat, Deshacop, Dridex, Dynamer, Kazy, Parite, Razy, Zedbot and Zusy), and tens of thousands of malicious connections out of the millions of encrypted flows captured from an enterprise network.
The experts used deep packet inspection only to identify the clientHello and serverHello messages and the TLS version used in the connection; the encrypted user data itself was not processed by the network equipment.
“In this paper, we focus on TLS encrypted flows over port 443 to make the comparisons between enterprise TLS and malicious TLS be as unbiased as possible,” states the paper. “To determine if a flow was TLS, we used deep packet inspection and a custom signature based on the TLS versions and message types of the clientHello and serverHello messages. In total, we found 229,364 TLS flows across 203 unique ports, and port 443 was by far the most common port for malicious TLS. Although the diversity of port usage in malware was great, these diverse ports were relatively uncommon.”
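To make the idea of a signature "based on the TLS versions and message types" more concrete, here is a minimal Python sketch. It is not the researchers' signature, which the article does not reproduce. It only checks the standard TLS record header (content type 0x16 for handshake, a known record-layer version) and the handshake type byte (1 for clientHello, 2 for serverHello). The function names `classify_hello` and `is_tls_flow` are hypothetical.

```python
import struct

# Constants from the TLS record and handshake layers.
HANDSHAKE_CONTENT_TYPE = 0x16
CLIENT_HELLO = 0x01
SERVER_HELLO = 0x02
# Record-layer versions for SSL 3.0 through TLS 1.2.
KNOWN_VERSIONS = {0x0300, 0x0301, 0x0302, 0x0303}


def classify_hello(payload: bytes):
    """Return 'clientHello', 'serverHello', or None for a TCP payload.

    Hypothetical helper: inspects only the first six bytes of the payload.
    """
    # 5-byte record header plus the 1-byte handshake message type.
    if len(payload) < 6:
        return None
    content_type, version, _length = struct.unpack("!BHH", payload[:5])
    if content_type != HANDSHAKE_CONTENT_TYPE or version not in KNOWN_VERSIONS:
        return None
    handshake_type = payload[5]
    if handshake_type == CLIENT_HELLO:
        return "clientHello"
    if handshake_type == SERVER_HELLO:
        return "serverHello"
    return None


def is_tls_flow(first_client_payload: bytes, first_server_payload: bytes) -> bool:
    """Treat a flow as TLS when both directions open with the expected hello."""
    return (classify_hello(first_client_payload) == "clientHello"
            and classify_hello(first_server_payload) == "serverHello")
```

Because a check of this kind reads only the unencrypted handshake header, it works on any port and never touches application data.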
The traffic analysis allowed the researchers to attribute encrypted connections to specific malware families. They also highlighted the accuracy of the technique, which relies on flow-based features to distinguish between malicious code belonging to different families.
“Finally, we show how we can perform family attribution given only network based data. This problem is positioned as a multi-class classification problem where each malware family has its own label,” continues the paper. “We identify families who use identical TLS parameters, but can still be accurately classified because their traffic patterns with respect to other flow-based features are distinct. We also identify subfamilies of malware that cannot be distinguished from one another with only their network data. We are able to achieve an accuracy of 90.3% for the family attribution problem when restricted to a single, encrypted flow, and an accuracy of 93.2% when we make use of all encrypted flows within a 5-minute window.”
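The article does not say how the flows within a 5-minute window are combined, so the following Python sketch is only one plausible reading. It assumes a classifier has already produced per-flow family probabilities, then sums them per host and per 5-minute window and picks the family with the highest total. The function `attribute_by_window`, the input layout and the summing rule are all assumptions for illustration.

```python
from collections import defaultdict

WINDOW_SECONDS = 300  # the 5-minute window mentioned in the paper


def attribute_by_window(flow_predictions):
    """Combine per-flow family probabilities per (host, 5-minute window).

    flow_predictions: iterable of (host, start_time_seconds, {family: probability}).
    Returns {(host, window_index): family with the highest summed probability}.
    The summing rule is an assumption, not the paper's documented method.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for host, start_time, probs in flow_predictions:
        key = (host, int(start_time // WINDOW_SECONDS))
        for family, p in probs.items():
            totals[key][family] += p
    return {key: max(families, key=families.get)
            for key, families in totals.items()}
```

A usage example with made-up probabilities (illustrative values, not results from the paper):

```python
predictions = [
    ("10.0.0.5", 10.0, {"Dridex": 0.6, "Zusy": 0.4}),
    ("10.0.0.5", 120.0, {"Dridex": 0.3, "Zusy": 0.7}),
    ("10.0.0.5", 200.0, {"Dridex": 0.2, "Zusy": 0.8}),
    ("10.0.0.5", 400.0, {"Dridex": 0.9, "Zusy": 0.1}),
]
labels = attribute_by_window(predictions)
```

Here the first flow taken alone would be labelled Dridex, but the evidence pooled across the first window favours Zusy, which shows how several flows together can correct a single-flow decision.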
The researchers used custom software able to extract the data features of interest from live traffic or packet capture files. The application used flow metadata (bytes in and out, packets in and out, network port numbers, and flow duration), the sequence of packet lengths and times, the byte distribution, and TLS header information.
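As a rough illustration of how such observable features could be turned into a numeric vector for a classifier, consider the Python sketch below. The `Flow` container, its field names, and the choice to keep only the first ten packet lengths and times are assumptions made for this example; the article does not describe the exact encoding the researchers used, and TLS header fields are left out for brevity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Flow:
    # Hypothetical container; field names are illustrative only.
    bytes_in: int
    bytes_out: int
    packets_in: int
    packets_out: int
    src_port: int
    dst_port: int
    duration: float
    packet_lengths: List[int]
    inter_arrival_times: List[float]
    payload: bytes


def byte_distribution(payload: bytes) -> List[float]:
    """Relative frequency of each of the 256 byte values in the payload."""
    counts = [0] * 256
    for b in payload:
        counts[b] += 1
    total = len(payload)
    return [c / total for c in counts] if total else [0.0] * 256


def pad(values, size):
    """Truncate or zero-pad a sequence to a fixed length."""
    return (list(values)[:size] + [0] * size)[:size]


def feature_vector(flow: Flow, max_packets: int = 10) -> List[float]:
    """Concatenate metadata, packet lengths/times and byte distribution."""
    metadata = [flow.bytes_in, flow.bytes_out, flow.packets_in,
                flow.packets_out, flow.src_port, flow.dst_port, flow.duration]
    return (metadata
            + pad(flow.packet_lengths, max_packets)
            + pad(flow.inter_arrival_times, max_packets)
            + byte_distribution(flow.payload))
```

With the default of ten packets, each flow yields a vector of 7 + 10 + 10 + 256 = 283 values, none of which requires decrypting the payload.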
What are the test results?
The researchers obtained positive results by applying machine learning to the flow analysis, reaching an accuracy of 90.3% on the family attribution problem when using a single encrypted flow.
“We are able to achieve an accuracy of 90.3% for the family attribution problem when restricted to a single, encrypted flow, and an accuracy of 93.2% when we make use of all encrypted flows within a 5-minute window,” the paper concludes.