Mammalian Annotation Database for improved annotation and functional classification of Omics datasets from less well-annotated organisms.

Author information

Animal Physiology, Institute of Agricultural Sciences, ETH Zurich, Zurich, Switzerland.

Genetics and Functional Genomics, Vetsuisse Faculty Zurich, University of Zurich, Zurich, Switzerland.

Publication information

Database (Oxford). 2019 Jan 1;2019. doi: 10.1093/database/baz086.

Abstract

Next-generation sequencing technologies and the availability of an increasing number of mammalian and other genomes allow gene expression studies, particularly RNA sequencing, in many non-model organisms. However, incomplete genome annotation and incomplete assignment of genes to functional annotation databases can lead to a substantial loss of information in downstream data analysis. To overcome this, we developed the Mammalian Annotation Database tool (MAdb, https://madb.ethz.ch) to conveniently provide homologous gene information for selected mammalian species. The assignment between species is performed in three steps: (i) matching official gene symbols, (ii) using ortholog information contained in Ensembl Compara and (iii) pairwise BLAST comparisons of all transcripts. In addition, we developed a new tool (AnnOverlappeR) for the reliable assignment of National Center for Biotechnology Information (NCBI) and Ensembl gene IDs to one another. Gene lists translated to the gene IDs of a well-annotated species such as human can then be used for improved functional annotation with tools based on Gene Ontology and molecular pathway information. We tested MAdb on a published RNA-seq data set for the pig and obtained clearly improved overrepresentation analysis results based on the assigned human homologous gene identifiers. MAdb yielded a similar list of human homologous genes and similar functional annotation results regardless of whether the starting gene IDs came from NCBI or Ensembl. MAdb is accessible via a web interface and a Galaxy application.
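The three-step assignment described in the abstract can be pictured as a lookup that tries each source of evidence in turn. The sketch below is illustrative only and is not MAdb's implementation. It assumes the steps act as a fallback cascade (the abstract lists three steps but does not say how they are combined). The function name `assign_homolog`, the three lookup tables and all gene identifiers are hypothetical placeholders.

```python
from typing import Dict, Optional, Tuple

# Hypothetical lookup tables with placeholder IDs; none of this is MAdb data.
# Official gene symbol -> human gene ID
HUMAN_SYMBOLS: Dict[str, str] = {"TP53": "HUMAN_GENE_1", "ESR1": "HUMAN_GENE_2"}
# Source-species gene ID -> human gene ID, as an ortholog table might provide
COMPARA_ORTHOLOGS: Dict[str, str] = {"PIG_GENE_3": "HUMAN_GENE_3"}
# Source-species gene ID -> human gene ID of the best pairwise BLAST hit
BLAST_BEST_HITS: Dict[str, str] = {"PIG_GENE_4": "HUMAN_GENE_4"}


def assign_homolog(gene_id: str, symbol: Optional[str]) -> Tuple[Optional[str], str]:
    """Return (human_gene_id, evidence) from the first step that succeeds.

    Assumption: steps are tried in order and the first match wins.
    """
    # Step (i): identical official gene symbol
    if symbol is not None and symbol in HUMAN_SYMBOLS:
        return HUMAN_SYMBOLS[symbol], "symbol"
    # Step (ii): ortholog information (for example from Ensembl Compara)
    if gene_id in COMPARA_ORTHOLOGS:
        return COMPARA_ORTHOLOGS[gene_id], "compara"
    # Step (iii): best hit from pairwise BLAST comparisons of transcripts
    if gene_id in BLAST_BEST_HITS:
        return BLAST_BEST_HITS[gene_id], "blast"
    # No evidence in any of the three sources
    return None, "unassigned"


if __name__ == "__main__":
    for gid, sym in [("PIG_GENE_1", "TP53"), ("PIG_GENE_3", None),
                     ("PIG_GENE_4", "LOC000"), ("PIG_GENE_9", None)]:
        print(gid, sym, assign_homolog(gid, sym))
```

Recording which step produced each assignment, as the second tuple element does here, makes it possible to filter a translated gene list by evidence type before running functional annotation.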


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2621/6661403/bf27c4b10d92/baz086f1.jpg
