Abstract:
Despite the increasing botnet threat, research in the area of botmaster traceback is
limited. The four main obstacles are 1) the low-traffic nature of the bot-to-botmaster link;
2) chains of “stepping stones”; 3) the use of encryption along these chains; and 4) mixing
with traffic from other bots. Most existing traceback approaches can address one or two
of these issues, but no single approach can overcome all of them.
Our early work focused on hijacking and sniffing botnet C&C traffic, especially when it
is encrypted. We successfully executed MITM attacks on both IRCS- and HTTPS-based
botnets. We further developed a kernel-level approach for obtaining millisecond-precision
timing in a virtual machine environment, allowing us to run time-based watermarking
code on virtual machines.
The major contribution of this work is a novel flow watermarking technique to address all
four traceback obstacles simultaneously. Our approach allows us to uniquely identify and
trace any IRC-based botnet flow even if 1) it is encrypted (e.g., via SSL/TLS); 2) it
passes through multiple intermediate stepping stones; and 3) it is mixed with other botnet traffic.
Our watermarking scheme relies on adding whitespace padding characters to outgoing
IRC messages at the application layer. This produces specific differences in lengths
between randomly chosen pairs of messages in a network flow. As a result, our
watermarking technique requires only a few dozen packets to be effective. To the best of
our knowledge, this is the first approach that has the potential to allow real-time
botmaster traceback across the Internet.
We empirically validated the effectiveness of our botnet flow watermarking approach
with live experiments on PlanetLab nodes and public IRC servers on different continents.
We achieved a virtually 100% detection rate for watermarked (encrypted and unencrypted)
IRC traffic with a false positive rate on the order of 10^-5. Due to the message queuing and
throttling functionality of IRC servers, mixing chaff with the watermarked flow does not
significantly impact the effectiveness of our watermarking approach.