Machine Learning Models of B-cell and T-cell Epitopes Using Sequence and Structure Information




Sewsankar, Kiran




Epitopes, the regions of antigens that are recognized by the immune system, have garnered considerable scientific interest in recent years because of their potential to guide the development of novel medical countermeasures. One example is epitope-based vaccines, which can be both safer and more efficacious than those currently available. Innovative vaccines are urgently needed to keep pace with the ever-changing landscape of global infectious disease. The drive towards creating new vaccines and treatments is critical to preserving world health and will be aided by studying epitopes. Epitopes play an important role in the immune response: they are recognized by B-cells and T-cells and are the sites of antibody binding. Identifying epitopes can therefore help researchers better understand how foreign disease agents cause illness and how the host immune system reacts against them. Traditional methods for identifying epitopes have centered on experimental structural studies, including X-ray crystallography and NMR techniques, which are time-consuming and costly. Thus, bioinformatics and computational approaches have been explored to facilitate the epitope identification process. In this work, models of B-cell and T-cell epitopes were developed to assist in the prediction and classification of potential, not yet validated epitope protein sequences. Specifically, machine learning algorithms were trained on a diverse set of representative epitope and non-epitope feature vectors to reliably predict epitope sequences and residues. The feature vectors were composed of sequence-derived features based on reduced amino acid alphabets and n-grams, and structure-derived features based on Delaunay tessellation and amino acid propensity scores. Feature vectors were constructed according to the specific problem at hand, either linear or conformational epitope prediction.
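As an illustration of the sequence-derived features mentioned above, the following minimal sketch maps a peptide onto a reduced amino acid alphabet and computes normalized n-gram frequencies. The six-group alphabet, the lowercase group letters, and the function names `reduce_sequence` and `ngram_features` are assumptions made for this example only; the alphabets and n-gram sizes actually used in this work are not specified here.

```python
from collections import Counter
from itertools import product

# Hypothetical six-group reduced alphabet (an assumption for illustration,
# not the grouping used in this work). Lowercase letters name the groups
# so they are not confused with one-letter amino acid codes.
REDUCED_GROUPS = {
    "h": "AVLIMC",  # hydrophobic / aliphatic
    "a": "FWYH",    # aromatic
    "p": "STNQ",    # polar, uncharged
    "b": "KR",      # basic (positively charged)
    "n": "DE",      # acidic (negatively charged)
    "g": "GP",      # glycine and proline
}

# Invert the grouping: amino acid letter -> reduced-alphabet letter.
RESIDUE_TO_GROUP = {
    aa: group for group, members in REDUCED_GROUPS.items() for aa in members
}


def reduce_sequence(seq):
    """Rewrite a protein sequence in the reduced alphabet.

    Residues outside the 20 standard amino acids are mapped to 'x'.
    """
    return "".join(RESIDUE_TO_GROUP.get(aa, "x") for aa in seq.upper())


def ngram_features(seq, n=2):
    """Return normalized n-gram frequencies over the reduced alphabet.

    The vector has len(REDUCED_GROUPS) ** n entries in a fixed, sorted
    order. N-grams containing the unknown symbol 'x' are ignored.
    """
    reduced = reduce_sequence(seq)
    vocab = ["".join(p) for p in product(sorted(REDUCED_GROUPS), repeat=n)]
    counts = Counter(reduced[i:i + n] for i in range(len(reduced) - n + 1))
    total = sum(counts[gram] for gram in vocab)
    return [counts[gram] / total if total else 0.0 for gram in vocab]


if __name__ == "__main__":
    peptide = "AKDEFG"  # made-up peptide, not taken from any database
    print(reduce_sequence(peptide))
    print(ngram_features(peptide, n=2))
```

A fixed-length vector of this kind could be concatenated with structure-derived descriptors before being passed to a classifier; reducing the alphabet keeps the n-gram vocabulary small (36 bigrams here instead of 400 for the full 20-letter alphabet).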
The epitope sequence and structure data were obtained from publicly available databases, and several machine learning algorithms, including Random Forest, Gaussian Naïve Bayes, and Support Vector Machine, were applied to the descriptor space. The best-performing epitope prediction models trained here can be used to identify unknown epitope sequences or residues, thereby reducing the search space for candidate epitopes that will form the basis for the development of new vaccines and other medical countermeasures. The models are incorporated into TESSETOPE V1.0, a free, web-accessible API for epitope prediction, available at



Bioinformatics, Epitope, Epitope prediction, Epitope-based vaccines, Immunoinformatics, Machine learning