Classification and Prediction of Antimicrobial Peptides Using N-gram Representation and Machine Learning

Date

2020

Abstract

Current antibiotic treatments for infectious diseases are rapidly losing effectiveness as the organisms they target develop drug resistance. In the United States alone, antibiotic-resistant bacterial infections result in more than 35,000 deaths annually and cause far greater morbidity. A promising alternative to current antibiotic treatments is antimicrobial peptides (AMPs), short strings of amino acid residues that are able to inhibit the propagation of pathogens. Correctly identifying AMPs from their sequence features remains a subject of active investigation. In this dissertation, we explored features of AMP sequences using reduced amino acid alphabets and machine learning algorithms. Sequence patterns and sequence composition were represented by vectors of N-gram frequencies, where N-grams are substrings of length N. Machine learning (ML) models were used to differentiate between AMPs and non-AMPs, and to classify AMPs by target pathogen class. These models demonstrated performance comparable to or exceeding that of many state-of-the-art models based on more complex peptide descriptors. Peptide representation based on reduced alphabets and N-gram frequencies can be used to design novel AMPs that target specific pathogens, which may provide a pathway to alternatives to antibiotic treatments. This work opens opportunities for collaboration with wet lab researchers who can test the designed AMPs in an experimental setting. N-gram, a new publicly available application for representing peptides using N-grams and reduced amino acid alphabets, is available at http://www.binf.gmu.edu/mothman/N-gram-Classification-Application/
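
To make the representation described in the abstract concrete, the following is a minimal sketch of mapping a peptide onto a reduced alphabet and computing its N-gram frequency vector. The four-group alphabet (`REDUCED`) and the function names are illustrative assumptions; they are not taken from the dissertation or from the N-gram application, which may use different groupings and normalization.

```python
from collections import Counter
from itertools import product

# ASSUMPTION: a hypothetical 4-letter reduced alphabet grouping the 20
# standard amino acids by rough physicochemical class. The dissertation's
# actual reduced alphabets are not given in the abstract.
REDUCED = {
    "H": "AVLIMFWC",  # hydrophobic
    "P": "STNQYGP",   # polar / small
    "+": "KRH",       # basic
    "-": "DE",        # acidic
}
AA_TO_GROUP = {aa: group for group, aas in REDUCED.items() for aa in aas}


def reduce_sequence(seq):
    """Rewrite a peptide using the reduced-alphabet symbols."""
    return "".join(AA_TO_GROUP[aa] for aa in seq.upper())


def ngram_frequencies(seq, n=2):
    """Return relative frequencies of every possible N-gram over the
    reduced alphabet, so all peptides share one fixed-length vector."""
    reduced = reduce_sequence(seq)
    grams = [reduced[i:i + n] for i in range(len(reduced) - n + 1)]
    counts = Counter(grams)
    total = len(grams)
    vocab = ["".join(p) for p in product(sorted(REDUCED), repeat=n)]
    return {g: (counts[g] / total if total else 0.0) for g in vocab}


if __name__ == "__main__":
    # Hypothetical short peptide used only for illustration.
    features = ngram_frequencies("KLAKLAK", n=2)
    print({g: round(f, 3) for g, f in features.items() if f > 0})
```

Vectors of this kind can be stacked into a feature matrix and passed to any standard classifier. With a reduced alphabet of size k, the vector has k^N entries, which keeps the feature space small compared with N-grams over all 20 amino acids.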
