Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University, Informatics and Communications Technology Complex, 535 W Michigan St., IT 475H, Indianapolis, IN, 46202, USA.
Department of Surgery, Indiana Center for Regenerative Medicine and Engineering (ICRME), Indiana University School of Medicine, Indianapolis, IN, 46202, USA.
BMC Bioinformatics. 2021 May 26;22(1):279. doi: 10.1186/s12859-021-04207-3.
With advancements in omics technologies, the range of biological processes in which long non-coding RNAs (lncRNAs) are involved is expanding rapidly, creating a need for lncRNA annotation resources. Although there is a plethora of resources for annotating genes, and despite the extensive corpus of lncRNA literature, resources that provide lncRNA ontology annotations are rare.
We present a lncRNA annotation extractor and repository (Lantern), developed using PubMed's abstract retrieval engine and NCBO's recommender annotation system. Lantern's annotations were benchmarked against lncRNAdb's manually curated free text. The benchmarking analysis suggested that, for 182 lncRNAs, Lantern has a recall of 0.62 and a precision of 0.8 against lncRNAdb. We also annotated ~11,000 lncRNAs in the human genome with multiple omics annotations, including predicted cis-regulatory TFs, interactions with RBPs, tissue-specific expression profiles, protein co-expression networks, coding potential, sub-cellular localization, and SNPs, providing a one-stop dynamic visualization platform.
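As an illustration of the benchmarking metrics reported above, the following minimal sketch computes precision and recall for one lncRNA by comparing a set of automatically extracted annotation terms with a set of manually curated terms. The function name `precision_recall`, the exact-match comparison of term labels, and the example terms are assumptions made for illustration; the abstract does not describe how Lantern matched its annotations to lncRNAdb's free text.

```python
def precision_recall(predicted, reference):
    """Return (precision, recall) for two collections of annotation terms.

    Terms are compared by exact match, which is an assumption of this sketch.
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    # Precision: fraction of predicted terms that appear in the reference set.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of reference terms that were recovered by the prediction.
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical terms for a single lncRNA (not taken from Lantern or lncRNAdb).
predicted_terms = {"apoptosis", "cell proliferation", "breast cancer",
                   "chromatin", "splicing"}
curated_terms = {"apoptosis", "cell proliferation", "breast cancer",
                 "chromatin", "X inactivation", "imprinting"}

precision, recall = precision_recall(predicted_terms, curated_terms)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In a benchmark over many lncRNAs, per-lncRNA values such as these could be averaged or the counts pooled before dividing; which aggregation was used for the reported figures is not stated in the abstract.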
Lantern combines annotations derived from a novel, accurate, semi-automatic ontology annotation engine with a variety of multi-omics annotations for lncRNAs, providing a central web resource for dissecting the functional dynamics of long non-coding RNAs and facilitating future hypothesis-driven experiments. The annotation pipeline and a web resource with current annotations for human lncRNAs are freely available at sysbio.lab.iupui.edu/lantern.