Computational and Statistical Genomics Branch, Division of Intramural Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.
Mol Biol Evol. 2021 Sep 27;38(10):4628-4633. doi: 10.1093/molbev/msab165.
To address the lack of high-quality proteomic data spanning the animal tree, we have implemented a pipeline for generating de novo assemblies from publicly available data in the NCBI Sequence Read Archive, yielding a comprehensive collection of proteomes from 100 species spanning 21 animal phyla. We have also created the Animal Proteome Database (AniProtDB), a resource that provides open access to this collection of high-quality metazoan proteomes. AniProtDB also provides information on predicted proteins and protein domains for each taxonomic classification, and supports sequence similarity searches against all proteomes generated using this pipeline. This resource makes these data far more useful by removing the barrier to access for research groups that lack the expertise or resources to generate such data themselves, and it enables the use of data from nontraditional research organisms that have the potential to address key questions in biomedicine.