Mason Archival Repository Service

Using Multi-Task Learning For Large-Scale Document Classification


dc.contributor.advisor Rangwala, Huzefa
dc.contributor.author Naik, Azad
dc.creator Naik, Azad
dc.date 2013-05-03
dc.date.accessioned 2013-09-13T15:06:59Z
dc.date.available 2013-09-13T15:06:59Z
dc.date.issued 2013-09-13
dc.identifier.uri https://hdl.handle.net/1920/8479
dc.description.abstract Multi-Task Learning (MTL) involves learning multiple tasks jointly. It seeks to improve the generalization performance of each task by leveraging the relationships among the tasks. MTL extends Single-Task Learning (STL), the setting most widely used in classification: in STL, each task is treated as independent and learnt in isolation, whereas in MTL, multiple tasks are learnt simultaneously by exploiting task relatedness. The main intuition is that the training signal present in related tasks can help each task learn a better model, which also makes it possible to learn good models from fewer labeled examples. In this thesis, we focus on improving classification performance for a large document database whose categories are organized as a hierarchy. To do so, we develop an MTL-based model for this (source) database that uses an external hierarchical database to facilitate classification. We use logistic regression models for the multiple classification tasks and a k-nearest neighbor (kNN) approach to find similarities between the classes of the two hierarchical databases; these kNN similarities define the task relationships. Experiments on a sampled DMOZ dataset compare the performance of MTL with STL, Semi-Supervised Learning (SSL), and Transfer Learning (TL). We also use random projections to improve runtime performance with minimal effect on classification accuracy.
dc.language.iso en en_US
dc.subject Multi-Task Learning en_US
dc.subject classification en_US
dc.subject model selection en_US
dc.subject logistic regression en_US
dc.subject random projection (hashing) en_US
dc.title Using Multi-Task Learning For Large-Scale Document Classification en_US
dc.type Thesis en
thesis.degree.name Master of Science in Computer Science en_US
thesis.degree.level Master's en
thesis.degree.discipline Computer Science en
thesis.degree.grantor George Mason University en
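The abstract combines three ingredients: random projections (hashing) to shrink the feature space, kNN similarity between the classes of a source and an external hierarchy to decide which tasks are related, and logistic regression classifiers learnt jointly. The sketch below is a minimal, hypothetical illustration of how these pieces could fit together; it is not the thesis's implementation. The abstract does not say how task relatedness enters the objective, so the coupling used here is an assumption: one-vs-rest logistic loss per class plus lam times the squared distance between the weight vectors of each related pair. Signed feature hashing (one common hashing-based projection) is likewise an assumed choice, and the names `hashed_vector`, `knn_related`, and `train_mtl` and all hyperparameters are hypothetical.

```python
import hashlib
import math

# Hypothetical sketch: feature hashing, kNN task matching, and coupled
# one-vs-rest logistic regression. Not the thesis's actual code.


def hashed_vector(tokens, dim=64):
    """Signed feature hashing: project a bag of tokens into `dim` buckets."""
    vec = [0.0] * dim
    for tok in tokens:
        h = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16)
        sign = 1.0 if (h >> 64) & 1 else -1.0  # a separate hash bit picks the sign
        vec[h % dim] += sign
    return vec


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def knn_related(source_centroids, external_centroids, k=1):
    """Map each source class to its k most cosine-similar external classes."""
    related = {}
    for name, c in source_centroids.items():
        ranked = sorted(external_centroids,
                        key=lambda e: cosine(c, external_centroids[e]),
                        reverse=True)
        related[name] = ranked[:k]
    return related


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_mtl(tasks, pairs, dim, lam=0.1, lr=0.1, epochs=200):
    """Batch gradient descent on
    sum_t logistic_loss(w_t) + lam * sum_{(s, t) in pairs} ||w_s - w_t||^2.

    tasks: {task: [(x, y), ...]} with y in {0, 1}.
    pairs: [(s, t), ...] names of related tasks (assumed coupling).
    No bias term, for brevity; append a constant 1.0 feature to x if needed.
    """
    w = {t: [0.0] * dim for t in tasks}
    for _ in range(epochs):
        grad = {t: [0.0] * dim for t in tasks}
        for t, data in tasks.items():  # per-task logistic-loss gradient
            for x, y in data:
                p = sigmoid(sum(wi * xi for wi, xi in zip(w[t], x)))
                for i in range(dim):
                    grad[t][i] += (p - y) * x[i]
        for s, t in pairs:  # coupling term pulls related weights together
            for i in range(dim):
                d = w[s][i] - w[t][i]
                grad[s][i] += 2 * lam * d
                grad[t][i] -= 2 * lam * d
        for t in tasks:
            for i in range(dim):
                w[t][i] -= lr * grad[t][i]
    return w
```

In use, both source and external classes would be registered as tasks, with pairs built from the kNN output, e.g. `[(s, e) for s, es in related.items() for e in es]`. In the toy call `train_mtl({"a": [([1.0, 0.0], 1), ([0.0, 1.0], 0)], "b": [([1.0, 0.0], 1)]}, [("a", "b")], dim=2)`, task `b` has no negative example, yet it still acquires a negative weight on the second feature through its coupling to task `a`. Trained alone, that weight would stay at zero.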

