Unsupervised Cross-Domain and Cross-Lingual Methods for Text Classification, Slot-Filling, and Question-Answering

Date

2021

Abstract

Transfer learning has revolutionized modern machine learning systems by enabling knowledge gained from solving one problem to be reused for another. It has also made it possible to build models that generalize beyond the distributions they are trained on. This dissertation explores and presents novel techniques for two transfer learning problems (cross-domain and cross-lingual) in Natural Language Processing, for the tasks of text classification, slot-filling, and question-answering. The work is particularly motivated by scenarios such as crisis management, in which labeled data from a new domain or language cannot easily be obtained to train models during an ongoing crisis. A cross-domain setup (also known as domain adaptation) adapts a model trained on one domain (e.g., tweets posted during Hurricane Harvey) to another (e.g., tweets posted during Hurricane Florence). A cross-lingual setup adapts a model trained on one language (e.g., English) to another (e.g., Hindi). Essentially, our goal is to bring seemingly dissimilar distributions into a comparable representation for the task at hand, so that a model trained on data from one domain or language generalizes to another.

We make three contributions to domain adaptation, focusing on English text data: (a) We show that machine learning architectures that ensure sufficient diversity generalize better. For text classification, this is achieved by enforcing orthogonality constraints within and across attention-based neural models, in a fully unsupervised manner that, unlike traditional methods, requires no unlabeled data from the target domain. (b) For text classification in low-resource scenarios (e.g., crisis tweets) with multiple domains and multiple tasks, a setup that combines domain discrimination with a few internal layers shared across tasks generalizes well to an unseen domain. (c) For generative question-answering, we propose an adversarial method that masks domain-specific words and regenerates them with a sequence-to-sequence language model trained on unlabeled target data; the purpose of this approach is to construct pseudo-labeled target data from the labeled source data. An illustrative sketch of the orthogonality idea in (a) follows this abstract.

We also make three contributions to cross-lingual learning: (a) For text classification, we show that an attention-realignment method that forces the model to distinguish task-specific from language-specific words improves cross-lingual performance. (b) For joint learning of intent prediction and slot-filling (intent being a sentence-level label and slots being word-level labels), randomized switching of phrases in a sentence into various other languages is shown to generalize well to unseen languages. (c) Finally, we enhance modern multilingual language models with the ability to classify transliterated text.

The practical implications of our work are demonstrated on Twitter posts collected during various natural disasters spanning different languages. Because our models generalize across domains and languages, they can be deployed immediately to aid emergency services in extracting relevant information during crisis events. Toward this goal, and to design models that can explain their predictions for crisis management, we also explore model interpretability. Furthermore, for some of our tasks, we release newly labeled crisis datasets for the research community.
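As a concrete illustration of the diversity-through-orthogonality idea in contribution (a) for domain adaptation, the sketch below shows one common way to penalize overlap between attention-head representations. It is a minimal sketch, not the dissertation's exact formulation: the function name orthogonality_penalty, the 0.01 penalty weight, and the PyTorch framing are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def orthogonality_penalty(head_outputs: torch.Tensor) -> torch.Tensor:
    """Encourage diversity across attention heads.

    head_outputs: tensor of shape (num_heads, hidden_dim), one context
    vector per attention head. The penalty is the squared Frobenius norm
    of the off-diagonal part of the cosine-similarity (Gram) matrix,
    which is zero exactly when the heads are pairwise orthogonal.
    """
    normed = F.normalize(head_outputs, dim=-1)            # unit-length rows
    gram = normed @ normed.t()                            # (num_heads, num_heads)
    off_diag = gram - torch.eye(gram.size(0), device=gram.device)
    return (off_diag ** 2).sum()

# Illustrative usage: add the penalty to the task loss with a small weight.
heads = torch.randn(4, 256, requires_grad=True)           # 4 hypothetical heads
task_loss = torch.tensor(0.0)                             # placeholder task loss
loss = task_loss + 0.01 * orthogonality_penalty(heads)
loss.backward()
```

In this framing, the penalty term is simply added to whatever supervised loss the classifier uses, pushing different attention heads to focus on distinct aspects of the input rather than collapsing onto the same features.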
