Publication: Predicting Alzheimer’s Disease from miRNA Sequence and Expression Data with Machine Learning
Authors
Monserrate, Sydney
Abstract
Approximately 6.5 million people in the United States, most of whom are 65 years of age or older, have been diagnosed with Alzheimer's Disease (AD). Diagnosing AD is notoriously difficult because the disease can progress before the onset of cognitive impairment, and the physiological changes in AD brains are largely observable only in post-mortem studies. AD screening has been bolstered by novel biomarkers, including expression profiles of exosomal and circulating miRNAs. Although relatively new to biological studies, these miRNAs have become a focal point because of their widespread availability in bodily fluids and their potential use in disease diagnostics. The purpose of our study was to investigate the utility of machine learning (ML) for predicting AD-associated outcomes from miRNA sequence and expression data. We performed machine learning with the Orange Data Mining platform, which allowed us to quickly prototype various models and assess their performance numerically and graphically. To make use of miRNA sequence data, we employed a k-mer bag-of-words model to quantify subsequences within miRNAs and predict whether miRNAs are involved in AD pathways. We found that a random forest model provided the best predictions, with an accuracy of 0.772 and an area under the receiver operating characteristic curve (AUROC) of 0.813. Interestingly, out of all k-mers, we found that those rich in purines are the most predictive of miRNA association with AD. As a second modelling effort, we analyzed a previously published dataset [Ludwig et al. (2019) Machine Learning to Detect Alzheimer’s Disease from Circulating Non-Coding RNAs. Genom. Proteom. Bioinform. 17(4): 430-440] that measured miRNA expression in AD patients and healthy individuals. A random forest model produced an accuracy of 0.786 and an AUROC of 0.862, approximately reproducing the published results.
We explored whether the likelihood that a miRNA is associated with AD-related pathways can serve as an additional selection criterion for miRNA expression profile analyses, and we discuss the broader applications of our machine learning models in AD diagnostics. Ultimately, we believe our machine learning models will be useful for determining whether new miRNA sequences are likely to be involved in AD and for pre-selecting miRNAs as biomarkers for expression profile analysis, which could be used as a diagnostic tool.
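The k-mer bag-of-words representation mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the workflow used in the study, which was built in Orange Data Mining; the function names (`kmer_vocabulary`, `kmer_counts`, `bag_of_words`, `purine_fraction`), the choice of k = 2, and the toy sequences are all hypothetical and chosen only to show the idea of turning each sequence into a vector of overlapping subsequence counts.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"  # assumption: sequences are upper-case RNA strings


def kmer_vocabulary(k):
    """Return all 4**k possible k-mers in a fixed lexicographic order."""
    return ["".join(p) for p in product(ALPHABET, repeat=k)]


def kmer_counts(sequence, k):
    """Count overlapping k-mers in a single sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def bag_of_words(sequences, k):
    """Map each sequence to a count vector over the k-mer vocabulary."""
    vocab = kmer_vocabulary(k)
    rows = []
    for seq in sequences:
        counts = kmer_counts(seq, k)
        rows.append([counts.get(kmer, 0) for kmer in vocab])
    return vocab, rows


def purine_fraction(kmer):
    """Fraction of purine bases (A or G) in a k-mer."""
    return sum(base in "AG" for base in kmer) / len(kmer)


# Toy sequences invented for illustration; they are not data from the study.
toy_sequences = ["UGAGGUAG", "AAGGCC"]
vocab, matrix = bag_of_words(toy_sequences, k=2)
```

Each row of `matrix` is a fixed-length feature vector, so a table like this could be passed, together with AD-association labels, to a classifier such as a random forest. The `purine_fraction` helper is one simple way to ask whether the most informative k-mers are purine-rich.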