Spam, Phishing, and Fraud Detection Using Random Projections, Adversarial Learning, and Semi-Supervised Learning

DeBarr, David Douglas

Spam, Phishing, and Fraud Detection Using Random Projections, Adversarial Learning, and Semi-Supervised Learning

Files

DeBarr_gmu_0883E_10471.pdf (1.86 MB)

Date

2013-08

Authors

DeBarr, David Douglas

Abstract

Spam, phishing, and fraud detection are security applications that impact most people. Challenges for building spam, phishing, and fraud detection models include difficulty in obtaining annotated data, increased computational complexity for robust detection methods, annotation errors, and changes in the underlying data distribution. We address the above challenges as follows. Clustering and active learning are combined to make efficient use of annotated data, yielding state of the art spam detection performance using only 10% of the annotated data employed by previously published methods. Social Network Analysis (SNA) reputation features for mail transfer agents are introduced to evaluate paths from sender to receiver, increasing the detection rate by 70% (with the same false positive rate) for state of the art spam detection. Random projections with boosting achieve state of the art spam detection with a 75% reduction in computational cost for message classification. The Randomized Hough Transform - Support Vector Machine updates training set annotations, increasing the (precision, recall) F measure by 9.3% compared to a state of the art method for handling adversarial noise. Spectral clustering of URL n-grams and transductive semi-supervised learning are used to increase the detection rate by 100% (doubling the detection rate with the same false positive rate) for state of the art phishing detection under adversarial modification of message text. Reputation and similarity features are used to enhance the ability to withstand changes in underlying data distributions, producing a 13.5% increase in cost savings for state of the art fraud detection.

Keywords

Computer science, Statistics, Applied mathematics, Adversarial learning, Fraud detection, Machine learning, Random projections, Security, Spam detection

URI

https://hdl.handle.net/1920/8802

Collections

College of Engineering and Computing

Full item page

Spam, Phishing, and Fraud Detection Using Random Projections, Adversarial Learning, and Semi-Supervised Learning

Files

Date

Authors

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

Description

Keywords

Citation

URI

Collections