Using Multi-Task Learning For Large-Scale Document Classification

Date

2013-09-13

Authors

Naik, Azad

Abstract

Multi-Task Learning (MTL) learns multiple tasks jointly. It seeks to improve the generalization performance of each task by exploiting the relationships among the tasks. MTL extends Single-Task Learning (STL), the approach most widely used in classification. In STL, each task is treated as independent and learned on its own, whereas in MTL, multiple tasks are learned simultaneously by exploiting task relatedness. The main intuition is that the training signal present in related tasks can help each task learn a better model. MTL also makes it possible to learn better models from fewer labeled examples.

In this thesis, we focus on improving classification performance for a hierarchically categorized database that archives a large number of documents. We develop an MTL-based model for this database (the source) that uses an external hierarchical database to facilitate classification of the source. We use logistic regression for the multiple classification tasks and a k-nearest-neighbor (kNN) approach to find similarities between the classes of the two hierarchical databases; the kNN step defines the task relationships. Experiments on a sampled DMOZ dataset compare the performance of MTL with STL, Semi-Supervised Learning (SSL), and Transfer Learning (TL). We also use random projections to achieve better runtime performance with minimal effect on classification accuracy.
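The abstract names the building blocks but not the exact formulation, so the following is only a minimal sketch of one plausible reading, not the thesis's code. It assumes: one one-vs-rest logistic-regression task per class in each database; task links found by kNN over class centroids using cosine similarity; a hypothetical coupling penalty that pulls each source task's weights toward its linked external tasks while both sets are trained jointly by gradient descent; and a Gaussian random projection standing in for the random projection (hashing) step. The function names (`projection_matrix`, `knn_task_links`, `train_mtl`), the penalty, the hyperparameters, and the synthetic data are all illustrative assumptions, not DMOZ.

```python
import numpy as np

def projection_matrix(d, k, seed=0):
    """Gaussian random projection d -> k; entries ~ N(0, 1/k) approximately preserve distances."""
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, 1.0 / np.sqrt(k), size=(d, k))

def l2_normalize(M):
    return M / np.linalg.norm(M, axis=1, keepdims=True)

def class_centroids(X, y, n_classes):
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def knn_task_links(src_centroids, ext_centroids, k):
    """Assumed task relation: for each source class, the k external classes
    whose centroids are most cosine-similar."""
    sim = l2_normalize(src_centroids) @ l2_normalize(ext_centroids).T
    return np.argsort(-sim, axis=1)[:, :k]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mtl(Xs, Ys, Xe, Ye, links, lam=0.1, lr=0.5, epochs=300):
    """Jointly trains one-vs-rest logistic regression for source and external classes.
    Hypothetical coupling penalty: lam/2 * ||Ws[s] - We[e]||^2 for every linked pair (s, e).
    With lam=0 the two problems decouple into ordinary single-task training."""
    Ws = np.zeros((Ys.shape[1], Xs.shape[1]))
    We = np.zeros((Ye.shape[1], Xe.shape[1]))
    for _ in range(epochs):
        Gs = (sigmoid(Xs @ Ws.T) - Ys).T @ Xs / len(Xs)  # log-loss gradient, source tasks
        Ge = (sigmoid(Xe @ We.T) - Ye).T @ Xe / len(Xe)  # log-loss gradient, external tasks
        for s, linked in enumerate(links):
            for e in linked:
                diff = Ws[s] - We[e]
                Gs[s] += lam * diff   # pull source task toward linked external task
                Ge[e] -= lam * diff   # and the external task toward the source task
        Ws -= lr * Gs
        We -= lr * Ge
    return Ws, We

# Synthetic demo (hypothetical data, not DMOZ): both "databases" share 3 topics.
rng = np.random.default_rng(1)
d, n_cls, k_proj = 50, 3, 20
means = 2.0 * rng.normal(size=(n_cls, d))

def sample(n_per):
    X = np.vstack([means[c] + rng.normal(size=(n_per, d)) for c in range(n_cls)])
    return X, np.repeat(np.arange(n_cls), n_per)

Xs, ys = sample(5)     # few labeled source documents per class
Xe, ye = sample(100)   # many labeled external documents per class
Xt, yt = sample(50)    # held-out source test documents
R = projection_matrix(d, k_proj)  # one shared projection for all data
Xs, Xe, Xt = [l2_normalize(M @ R) for M in (Xs, Xe, Xt)]

links = knn_task_links(class_centroids(Xs, ys, n_cls), class_centroids(Xe, ye, n_cls), k=1)
W_src, _ = train_mtl(Xs, np.eye(n_cls)[ys], Xe, np.eye(n_cls)[ye], links)
acc = np.mean(np.argmax(Xt @ W_src.T, axis=1) == yt)
print(links.ravel(), acc)
```

Setting `lam=0.0` in `train_mtl` decouples the tasks and yields a single-task baseline for comparison. The exhaustive similarity matrix in `knn_task_links` is fine for a toy example but would likely need an approximate nearest-neighbor index at the scale of a full document hierarchy.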

Keywords

Multi-Task Learning, Classification, Model selection, Logistic regression, Random projection (hashing)