AniProtDB: A Collection of Consistently Generated Metazoan Proteomes for Comparative Genomics Studies.

Affiliation

Computational and Statistical Genomics Branch, Division of Intramural Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.

Publication information

Mol Biol Evol. 2021 Sep 27;38(10):4628-4633. doi: 10.1093/molbev/msab165.

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8476134/

Abstract

To address the void in the availability of high-quality proteomic data traversing the animal tree, we have implemented a pipeline for generating de novo assemblies based on publicly available data from the NCBI Sequence Read Archive, yielding a comprehensive collection of proteomes from 100 species spanning 21 animal phyla. We have also created the Animal Proteome Database (AniProtDB), a resource providing open access to this collection of high-quality metazoan proteomes, along with information on predicted proteins and protein domains for each taxonomic classification and the ability to perform sequence similarity searches against all proteomes generated using this pipeline. This solution vastly increases the utility of these data by removing the barrier to access for research groups who do not have the expertise or resources to generate these data themselves and enables the use of data from nontraditional research organisms that have the potential to address key questions in biomedicine.

Similar articles

1. AniProtDB: A Collection of Consistently Generated Metazoan Proteomes for Comparative Genomics Studies. Mol Biol Evol. 2021 Sep 27;38(10):4628-4633. doi: 10.1093/molbev/msab165.

2. MATEdb2, a Collection of High-Quality Metazoan Proteomes across the Animal Tree of Life to Speed Up Phylogenomic Studies. Genome Biol Evol. 2024 Nov 1;16(11). doi: 10.1093/gbe/evae235.

3. Sketched reference databases for genome-based taxonomy and comparative genomics. Braz J Biol. 2022 Nov 14;84:e256673. doi: 10.1590/1519-6984.256673. eCollection 2022.

4. 3D-GENOMICS: a database to compare structural and functional annotations of proteins between sequenced genomes. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D245-50. doi: 10.1093/nar/gkh064.

5. EPIC-DB: a proteomics database for studying Apicomplexan organisms. BMC Genomics. 2009 Jan 21;10:38. doi: 10.1186/1471-2164-10-38.

6. MMPdb and MitoPredictor: Tools for facilitating comparative analysis of animal mitochondrial proteomes. Mitochondrion. 2020 Mar;51:118-125. doi: 10.1016/j.mito.2020.01.001. Epub 2020 Jan 20.

7. Elucidation of cross-species proteomic effects in human and hominin bone proteome identification through a bioinformatics experiment. BMC Evol Biol. 2018 Feb 20;18(1):23. doi: 10.1186/s12862-018-1141-1.

8. ProteomeWeb: a web-based interface for the display and interrogation of proteomes. Proteomics. 2003 May;3(5):584-600. doi: 10.1002/pmic.200300396.

9. MetaNovo: An open-source pipeline for probabilistic peptide discovery in complex metaproteomic datasets. PLoS Comput Biol. 2023 Jun 16;19(6):e1011163. doi: 10.1371/journal.pcbi.1011163. eCollection 2023 Jun.

10. The Utility of Genomic and Transcriptomic Data in the Construction of Proxy Protein Sequence Databases for Unsequenced Tree Nuts. Biology (Basel). 2020 May 19;9(5):104. doi: 10.3390/biology9050104.

Cited by

1. MATEdb2, a Collection of High-Quality Metazoan Proteomes across the Animal Tree of Life to Speed Up Phylogenomic Studies. Genome Biol Evol. 2024 Nov 1;16(11). doi: 10.1093/gbe/evae235.

2. LukProt: A Database of Eukaryotic Predicted Proteins Designed for Investigations of Animal Origins. Genome Biol Evol. 2024 Nov 1;16(11). doi: 10.1093/gbe/evae231.

3. Evolution is not Uniform Along Coding Sequences. Mol Biol Evol. 2023 Mar 4;40(3). doi: 10.1093/molbev/msad042.

4. The Elephant Evolved p53 Isoforms that Escape MDM2-Mediated Repression and Cancer. Mol Biol Evol. 2022 Jul 2;39(7). doi: 10.1093/molbev/msac149.
