Time Series Analysis for Botnet Detection

Authors

Henderson, Taylor

Abstract

Cyber attacks are becoming more prevalent and increasingly threaten to have devastating consequences. Many of these attacks, commonly referred to as botnet attacks, use multiple distributed devices controlled remotely by a single person or group. Botnet detection is challenging for several reasons. First, botnets use covert communication measures and actively attempt to mask their communication. Second, the command and control (C&C) of these devices may not come from a single source but instead from peer-to-peer (P2P) bot communication. Third, network traffic is inherently noisy and high-dimensional, both because the data is continuous and because of the number of variables. Finally, large botnet data collections are generally incomplete, and real-world data is hard to find. These factors complicate botnet data analytics with well-known approaches such as time series analysis. Recent results in topological data analysis (TDA) have shown great promise in analyzing noisy, large-scale, and incomplete time series data sets. This thesis explores using TDA persistence landscapes (PL-TDA) to transform a multi-attribute time series into a single-attribute time series, which can then be analyzed using existing time series and data mining techniques. We first perform a robustness analysis of existing PL-TDA computational methods in the presence of noise. We then propose a plane-sweep algorithm that decreases PL-TDA runtime. This algorithm builds on a separate result: a linear-time approach to finding the top landscapes that appear when PL-TDA is computed. Following that, we show how to implement a processing pipeline for PL-TDA on botnet data. Finally, we show that our new algorithms maintain accuracy while decreasing runtime.
This work gives future network and systems researchers a new technique for analyzing network traffic that captures its inherent topological properties.

Keywords

Time series analysis, Topological data analysis, Botnet detection, Representative pattern matching, Dimensionality reduction
