LNCRNAKB: A COMPREHENSIVE KNOWLEDGEBASE OF LONG NON-CODING RNAS

Date

2019

Abstract

High-throughput technologies such as next-generation sequencing have allowed genomic structure to be interrogated at high resolution and scale. This includes long non-coding RNAs (lncRNAs), a class of non-protein-coding transcripts that range from 200 nucleotides to 100 kb in length (approximately 10 kb on average). Estimates of the number of lncRNA annotations in humans range from 20,000 to 100,000. Several databases exist for the annotation of human lncRNAs, and most of them are available through searchable web interfaces. Our objective was to identify current and new lncRNA databases, download and inspect their latest annotations, integrate this information into a single resource, and create the most comprehensive, up-to-date knowledge base encompassing data from all major resources. Specifically, we provide a “one-stop shop” in which users can search for lncRNAs by any keyword, e.g. genomic location, gene name or gene type. LncRNA annotations are commonly used as references for quantifying expression and identifying differentially expressed genes and transcripts in RNA-seq experiments. We used Genotype-Tissue Expression (GTEx) project RNA-seq data from 9,425 samples sequenced across 31 normal human solid-organ tissues to quantify all the lncRNAs in our knowledge base. We analyzed the RNA-seq data using a custom pipeline and created a comprehensive tissue-specific expression body map of human lncRNAs. Compared with protein-coding genes, whose function can often be inferred from primary sequence, the sequence-function relationship of lncRNAs is poorly understood. Beyond understanding and improving lncRNA annotations, we therefore sought to predict and determine the molecular, biological and disease functions of lncRNAs. We positionally classified all lncRNAs and predicted their coding potential using a machine learning approach.
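To illustrate what positional classification involves, the following Python sketch assigns an lncRNA to a class according to how it overlaps protein-coding genes. The class names (intergenic, antisense, sense_overlapping, intronic), their priority order, the `Transcript` type and the `classify` function are hypothetical assumptions made for this example; they are not the categories or code used to build the knowledge base. Coordinates are assumed to be 1-based and inclusive.

```python
from dataclasses import dataclass, field


@dataclass
class Transcript:
    """Hypothetical minimal transcript model (1-based, inclusive coordinates)."""
    name: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"
    exons: list = field(default_factory=list)  # list of (start, end) tuples


def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def exons_overlap(a, b):
    """True if any exon of transcript a overlaps any exon of transcript b."""
    return any(overlaps(s1, e1, s2, e2) for s1, e1 in a.exons for s2, e2 in b.exons)


def classify(lnc, coding_genes):
    """Assign a hypothetical positional class to one lncRNA.

    Assumed priority: antisense exon overlap > same-strand exon overlap
    > intronic (inside a gene body, no exon overlap) > intergenic.
    """
    # Protein-coding genes whose bodies overlap the lncRNA locus, on either strand
    hits = [g for g in coding_genes
            if g.chrom == lnc.chrom and overlaps(lnc.start, lnc.end, g.start, g.end)]
    if not hits:
        return "intergenic"
    exon_hits = [g for g in hits if exons_overlap(lnc, g)]
    if any(g.strand != lnc.strand for g in exon_hits):
        return "antisense"
    if exon_hits:
        return "sense_overlapping"
    return "intronic"


if __name__ == "__main__":
    gene = Transcript("GENE_A", "chr1", 1000, 5000, "+", [(1000, 1500), (4000, 5000)])
    examples = [
        Transcript("LNC_1", "chr1", 2000, 3000, "+", [(2000, 3000)]),
        Transcript("LNC_2", "chr1", 1200, 1300, "-", [(1200, 1300)]),
        Transcript("LNC_3", "chr1", 10000, 12000, "+", [(10000, 12000)]),
        Transcript("LNC_4", "chr1", 4500, 4800, "+", [(4500, 4800)]),
    ]
    for lnc in examples:
        print(lnc.name, classify(lnc, [gene]))
```

The linear scan over all genes keeps the sketch short; at genome scale an interval index (for example an interval tree per chromosome) would be the more practical choice.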
Using whole-genome sequencing (WGS) genotype data from the GTEx project, we also identified lncRNAs regulated by genetic variants in cis. We performed mRNA-lncRNA co-expression network analysis and identified co-expression gene modules involved in known biological processes, thereby inferring potential functions of lncRNAs. Overall, our aim was to functionally annotate and characterize the lncRNAs in our knowledge base and to provide a comprehensive resource for the research community.
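The idea behind inferring lncRNA function from co-expression can be illustrated with a deliberately simplified Python sketch: genes whose expression profiles across samples have an absolute Pearson correlation at or above a threshold are linked, and the connected components of the resulting graph are treated as modules. An lncRNA that shares a module with mRNAs of known function becomes a candidate for a related function ("guilt by association"). The 0.8 threshold, the `pearson` and `coexpression_modules` functions and the toy expression values are hypothetical. This single-linkage grouping is far cruder than dedicated network methods such as WGCNA and is not the pipeline used in this work.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def coexpression_modules(expr, threshold=0.8):
    """Group genes into modules: connected components of the |r| >= threshold graph.

    expr maps gene name -> list of expression values, one per sample.
    Returns a sorted list of sorted module member lists.
    """
    genes = list(expr)
    parent = {g: g for g in genes}  # union-find forest

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    # Link every pair of genes whose absolute correlation passes the threshold
    for a, b in combinations(genes, 2):
        if abs(pearson(expr[a], expr[b])) >= threshold:
            parent[find(a)] = find(b)

    modules = {}
    for g in genes:
        modules.setdefault(find(g), []).append(g)
    return sorted(sorted(m) for m in modules.values())


if __name__ == "__main__":
    # Toy expression values across five samples (hypothetical)
    expr = {
        "mRNA_A": [1, 2, 3, 4, 5],
        "mRNA_B": [2, 4, 6, 8, 10],
        "lnc_X": [1.1, 2.0, 2.9, 4.2, 5.0],
        "mRNA_C": [5, 1, 4, 2, 3],
        "lnc_Y": [4, 1, 5, 2, 3],
    }
    for module in coexpression_modules(expr):
        print(module)
```

Using the absolute correlation lets strongly anti-correlated genes join a module as well; a signed network would instead keep only positive correlations.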
