dc.description.abstract |
Cyber attacks are becoming more prevalent and increasingly threaten devastating
consequences. Many of these attacks use multiple distributed devices controlled by
a single person or group from a remote location; such attacks are commonly referred to as botnet
attacks. Botnet detection is challenging for several reasons. First, botnets
use covert communication methods and actively attempt to disguise their traffic.
Second, the command and control (C&C) of these devices may not come from a single
source but instead from peer-to-peer (P2P) bot communication. Third, network traffic is
inherently noisy and high-dimensional, both in the continuous nature of the data and in
the number of variables. Finally, massive botnet data collections are generally incomplete,
and real-world data are difficult to obtain. These factors complicate botnet data
analytics with well-known approaches, such as time series analysis techniques.
Recent results in topological data analysis (TDA) have shown great promise in analyzing
noisy, large-scale, and incomplete time series data sets. This thesis explores using TDA
persistence landscapes (PL-TDA) to transform a multi-attribute time series into a single-attribute
time series, which can then be analyzed using existing time series and data mining
techniques. We first perform a robustness analysis of existing PL-TDA computational
methods in the presence of noise. We then propose an algorithm based on plane-sweeping
methods that reduces PL-TDA runtime. This algorithm builds on another result that
gives a linear-time approach to finding the top landscapes that appear when PL-TDA
is computed. Next, we show how to implement a PL-TDA processing pipeline for
botnet data. Finally, we show that our new algorithms maintain accuracy
while reducing runtime. This work gives future network and systems researchers
a new technique for network traffic analysis that captures the data's
inherent topological properties. |
en_US |