Optimal Integration of Machine Learning Models: A Large-Scale Distributed Learning Framework with Application to Systematic Prediction of Adverse Drug Reactions




Ngufor, Che G




In real-world applications, information from multiple sources such as humans, experts, agents, or classifiers often needs to be integrated to support a decision-making system. One popular approach in machine learning is to combine these sources through an ensemble learning method. Ensemble learning has been shown to provide appealing solutions to many complex and challenging problems in machine learning. These include, for example, learning under non-standard conditions such as learning from large volumes of data, learning in the presence of uncertainty, learning from data streams, and learning when the concept to be learned drifts over time. Although a considerable amount of research has been done on ensemble learning in recent years, many open issues and challenges remain. This thesis explores three major challenges in this research area. First, the development of techniques that scale up to large and possibly physically distributed databases. Second, the construction of exact or approximately exact global models from distributed heterogeneous datasets with minimal data communication while preserving the privacy of the data. Third, efficient learning from modern large-scale datasets, which are often characterized by noisy data points, unlabeled or poorly labeled examples, sample bias, and missing values.



Statistics, Mathematics, Computer science, Active learning, Adverse drug reactions, Ensemble learning, MapReduce, Machine learning, Variational Bayesian methods