Mason Archival Repository Service

Spam, Phishing, and Fraud Detection Using Random Projections, Adversarial Learning, and Semi-Supervised Learning

dc.contributor.advisor Wechsler, Harry
dc.contributor.author DeBarr, David Douglas
dc.creator DeBarr, David Douglas en_US
dc.date.accessioned 2014-08-28T03:17:39Z
dc.date.available 2014-08-28T03:17:39Z
dc.date.issued 2013-08 en_US
dc.identifier.uri https://hdl.handle.net/1920/8802
dc.description.abstract Spam, phishing, and fraud detection are security applications that affect most people. Challenges in building spam, phishing, and fraud detection models include the difficulty of obtaining annotated data, the increased computational complexity of robust detection methods, annotation errors, and changes in the underlying data distribution. We address these challenges as follows. Clustering and active learning are combined to make efficient use of annotated data, yielding state-of-the-art spam detection performance using only 10% of the annotated data employed by previously published methods. Social Network Analysis (SNA) reputation features for mail transfer agents are introduced to evaluate paths from sender to receiver, increasing the detection rate by 70% (at the same false positive rate) for state-of-the-art spam detection. Random projections with boosting achieve state-of-the-art spam detection with a 75% reduction in computational cost for message classification. The Randomized Hough Transform - Support Vector Machine updates training set annotations, increasing the (precision, recall) F-measure by 9.3% compared to a state-of-the-art method for handling adversarial noise. Spectral clustering of URL n-grams and transductive semi-supervised learning are used to increase the detection rate by 100% (doubling the detection rate at the same false positive rate) for state-of-the-art phishing detection under adversarial modification of message text. Reputation and similarity features are used to enhance the ability to withstand changes in underlying data distributions, producing a 13.5% increase in cost savings for state-of-the-art fraud detection.
dc.format.extent 208 pages en_US
dc.language.iso en en_US
dc.rights Copyright 2013 David Douglas DeBarr en_US
dc.subject Computer science en_US
dc.subject Statistics en_US
dc.subject Applied mathematics en_US
dc.subject Adversarial Learning en_US
dc.subject Fraud Detection en_US
dc.subject Machine Learning en_US
dc.subject Random Projections en_US
dc.subject Security en_US
dc.subject Spam Detection en_US
dc.title Spam, Phishing, and Fraud Detection Using Random Projections, Adversarial Learning, and Semi-Supervised Learning en_US
dc.type Dissertation en
thesis.degree.level Doctoral en
thesis.degree.discipline Computer Science en
thesis.degree.grantor George Mason University en
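
Illustrative note: the abstract credits random projections with reducing the cost of message classification. The sketch below shows only the general idea of a Gaussian random projection, which maps high-dimensional feature vectors to a lower-dimensional space while roughly preserving distances. It is not the dissertation's implementation; the function names, the dimensions (2,000 and 200), and the synthetic term-count vectors are all assumptions made for illustration, and the boosting step is omitted.

```python
import math
import random


def random_projection_matrix(n_features, n_components, seed=0):
    """Return an n_components x n_features Gaussian matrix.

    Entries are N(0, 1) scaled by 1/sqrt(n_components) so that squared
    distances are preserved in expectation.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_components)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_features)]
            for _ in range(n_components)]


def project(matrix, vector):
    """Map a high-dimensional vector into the lower-dimensional space."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical data: two sparse 2,000-dimensional binary term vectors.
rng = random.Random(1)
doc_a = [rng.choice([0, 0, 0, 1]) for _ in range(2000)]
doc_b = [rng.choice([0, 0, 0, 1]) for _ in range(2000)]

# Project from 2,000 features down to 200 (an assumed target size).
R = random_projection_matrix(n_features=2000, n_components=200)
low_a, low_b = project(R, doc_a), project(R, doc_b)

original_distance = euclidean(doc_a, doc_b)
projected_distance = euclidean(low_a, low_b)
ratio = projected_distance / original_distance  # expected to be close to 1

print(f"original:  {original_distance:.2f}")
print(f"projected: {projected_distance:.2f}")
print(f"ratio:     {ratio:.2f}")
```

A classifier trained on the projected vectors works with one tenth as many features in this toy setup, which is the kind of saving a projection step is meant to provide; the actual reduction reported in the abstract depends on the dissertation's own data and method.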

