Network Analysis of Correlated Mutations in Influenza

Date

2017

Authors

Yallapragada, Uday B

Abstract

Influenza A virus (IAV) is remarkably adept at surviving in human populations. IAV thrives even among populations with widespread access to vaccines and antiviral drugs, and it continues to be a major cause of morbidity and mortality. Correlated mutations are an important factor in IAV’s evolution and are critical for host adaptation and pathogenicity. The large sets of publicly available IAV sequences, combined with the virus’s rapid and complex evolutionary dynamics, present both interesting opportunities and unique challenges for analyzing correlated mutations in influenza proteomes. In this work, we performed a comprehensive analysis of correlated mutations in IAV using a network theory approach in which the residues of each protein act as nodes in a graph and edges are created based on inter-residue correlated mutations. Our approach used the ‘maximal information coefficient’ (MIC) to compute correlations between residues, and we created an edge between two nodes if their MIC exceeded a threshold. We built a modular and robust pipeline and applied it to multiple datasets belonging to the H1N1, H3N2, H5 and H7N9 subtypes. We studied the structural dynamics of IAV sub-systems based on the topological properties of their networks, which led to several important conclusions. For each network, we identified the nodes with the highest degree along with the edges and triplets with the strongest weights. To contextualize our results, we performed entropy analysis to gain a global view of sequence variation and computed solvent accessibility profiles to identify statistical differences in correlation profiles between surface and buried residues. We computed residue co-occurrence counts to understand the internal mechanics behind MIC. Additionally, we applied our pipeline to gradually increasing datasets of human H1N1 and human H3N2 from the past 10 years and elucidated their evolutionary patterns. As part of our overall pipeline, we took specific measures to eliminate phylogenetic and stochastic background noise.
We created a web application that allows users to explore the results of our analysis and to search for correlated mutations.
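
The network construction and entropy analysis described above can be illustrated with a minimal sketch. It is not the author's pipeline: the toy alignment, the pairwise scores (standing in for precomputed MIC values), the threshold of 0.5, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    h = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return h + 0.0  # normalizes -0.0 to 0.0 for fully conserved columns

def build_network(scores, threshold):
    """Keep residue pairs whose score exceeds the threshold as weighted edges.

    `scores` maps (i, j) position pairs to a correlation score; here the
    values are assumed to be precomputed MIC values in [0, 1].
    """
    return {pair: w for pair, w in scores.items() if w > threshold}

def node_degrees(edges):
    """Count how many edges touch each residue position."""
    degrees = Counter()
    for i, j in edges:
        degrees[i] += 1
        degrees[j] += 1
    return degrees

# Hypothetical four-sequence alignment; each string is one sequence.
alignment = ["MKTA", "MRTA", "MKSA", "MRSA"]
columns = list(zip(*alignment))
entropies = [column_entropy(col) for col in columns]

# Hypothetical pairwise scores between residue positions (not real MIC output).
scores = {
    (0, 1): 0.05, (0, 2): 0.10, (0, 3): 0.65,
    (1, 2): 0.82, (1, 3): 0.40, (2, 3): 0.91,
}
edges = build_network(scores, threshold=0.5)  # threshold chosen arbitrarily
degrees = node_degrees(edges)
strongest = max(edges, key=edges.get)  # edge with the largest weight

print("entropy per column:", entropies)
print("edges:", edges)
print("highest-degree nodes:", degrees.most_common(2))
print("strongest edge:", strongest)
```

In this sketch, fully conserved columns have zero entropy, and the degree counts and strongest edge correspond to the topological summaries the abstract mentions. Computing actual MIC values would require a dedicated implementation, which is outside the scope of this illustration.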

Keywords

Bioinformatics, Biology, Computer science, Classification of Influenza Sequences, Correlated Mutations, Influenza, Network Analysis, Ngrams, Proteins
