Prediction of Chemical Activity against Various Disease-Related Targets with Machine Learning Methods




High-throughput screening (HTS) technologies have led to the accumulation of large amounts of biological data for a broad range of chemical compounds. The main goal of this study is to build machine learning models of the activity of chemical compounds against various disease-related targets, including cytochrome P450 3A4 (CYP3A4), estrogen receptor 1 (ESR1), adrenoceptor alpha 1A (ADRA1A), and the opioid receptors (OPRs) mu (OPRM), kappa (OPRK), and delta (OPRD). The training sets consist of ~3,000 investigational and approved drugs for animal and human use with experimentally determined in vitro activity. For the CYP3A4, ESR1, and ADRA1A targets, the compounds are represented by both their bioactivity and their structural features. These models were validated on an internal test set, and the best-performing models achieved an AUC-ROC of 0.90 (ESR1), 0.89 (ADRA1A), and 0.81 (CYP3A4). For the OPRM, OPRK, and OPRD targets, the compounds are represented only by their structural features, and the best-performing models have an AUC-ROC above 0.85. This approach produced robust OPR prediction models that can be applied to prioritize, from large libraries, compounds that match or exceed the analgesic properties of opioids but have lower addiction potential. The models identified several novel potent compounds as activators/inhibitors of OPRs, and these were confirmed experimentally. This work may open new ways to address important biomedical and public health needs.
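To make the reported metric concrete, the following is a minimal sketch of how an AUC-ROC value can be computed from a model's predicted scores on a held-out test set. It is not the authors' code: the function name `auc_roc` and the toy labels and scores are hypothetical and chosen only for illustration. The value is the probability that a randomly chosen active compound is scored higher than a randomly chosen inactive one, with ties counted as one half.

```python
def auc_roc(labels, scores):
    """AUC-ROC computed as the Mann-Whitney pairwise-ranking statistic.

    labels: 1 for active compounds, 0 for inactive compounds.
    scores: model-predicted activity scores, higher meaning more likely active.
    """
    actives = [s for y, s in zip(labels, scores) if y == 1]
    inactives = [s for y, s in zip(labels, scores) if y == 0]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive compound")

    wins = 0.0
    for a in actives:
        for i in inactives:
            if a > i:
                wins += 1.0   # active correctly ranked above inactive
            elif a == i:
                wins += 0.5   # a tie counts as half a correct ranking
    return wins / (len(actives) * len(inactives))


# Toy test set (hypothetical values, not data from the study).
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
toy_auc = auc_roc(toy_labels, toy_scores)
print(f"AUC-ROC = {toy_auc:.3f}")
```

An AUC-ROC of 0.5 corresponds to random ranking and 1.0 to perfect separation of actives from inactives, which is the scale on which the values of 0.81 to 0.90 above should be read.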