The P10K database: a data portal for the protist 10 000 genomes project.

Affiliations

Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China.

University of Chinese Academy of Sciences, Beijing 100049, China.

Publication information

Nucleic Acids Res. 2024 Jan 5;52(D1):D747-D755. doi: 10.1093/nar/gkad992.

DOI: 10.1093/nar/gkad992

PMID: 37930867

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10767852/

Abstract

Protists, a highly diverse group of microscopic eukaryotic organisms distinct from fungi, animals and plants, exert crucial roles within the earth's biosphere. However, the genomes of only a small fraction of known protist species have been published and made publicly accessible. To address this constraint, the Protist 10 000 Genomes Project (P10K) was initiated, implementing a specialized pipeline for single-cell genome/transcriptome assembly, decontamination and annotation of protists. The resultant P10K database (https://ngdc.cncb.ac.cn/p10k/) serves as a comprehensive platform, collating and disseminating genome sequences and annotations from diverse protist groups. Currently, the P10K database has incorporated 2959 genomes and transcriptomes, including 1101 newly sequenced datasets by P10K and 1858 publicly available datasets. Notably, it covers 45% of the protist orders, with a significant representation (53% coverage) of ciliates, featuring nearly a thousand genomes/transcriptomes. Intriguingly, analysis of the unique codon table usage among ciliates has revealed differences compared to the NCBI taxonomy system, suggesting a need to revise the codon tables used for these species. Collectively, the P10K database serves as a valuable repository of genetic resources for protist research and aims to expand its collection by incorporating more sequenced data and advanced analysis tools to benefit protist studies worldwide.
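The abstract's point about ciliate codon tables can be illustrated with a small translation sketch. This is not the P10K analysis, which the abstract does not describe; it only translates a hypothetical toy coding sequence under NCBI translation table 1 (Standard Code) and table 6 (Ciliate, Dasycladacean and Hexamita Nuclear Code), in which TAA and TAG encode glutamine instead of acting as stops. The sequence and the helper names `build_table`, `translate` and `internal_stops` are illustrative assumptions.

```python
# Minimal sketch (assumption: toy example, not the P10K pipeline):
# compare NCBI translation table 1 with table 6 on one coding sequence.
from itertools import product

BASES = "TCAG"  # NCBI codon ordering for the first, second and third positions

# Amino-acid strings in NCBI's compact 64-codon format; '*' marks a stop.
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CILIATE_AAS = "FFLLSSSSYYQQCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def build_table(aas: str) -> dict[str, str]:
    """Map each of the 64 codons (TTT, TTC, ..., GGG) to its amino acid."""
    codons = ("".join(c) for c in product(BASES, repeat=3))
    return dict(zip(codons, aas))


STANDARD = build_table(STANDARD_AAS)  # NCBI table 1
CILIATE = build_table(CILIATE_AAS)    # NCBI table 6: TAA/TAG -> Gln (Q), TGA stays a stop


def translate(seq: str, table: dict[str, str]) -> str:
    """Translate an in-frame sequence codon by codon, keeping '*' for stops.

    Assumes the sequence contains only A, C, G and T.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(table[seq[i:i + 3]] for i in range(0, usable, 3))


def internal_stops(protein: str) -> int:
    """Count stop symbols before the last residue (premature stops)."""
    return protein[:-1].count("*")


if __name__ == "__main__":
    cds = "ATGCAATAAGGCTAGTGA"  # hypothetical toy ORF ending in TGA
    for name, table in (("table 1", STANDARD), ("table 6", CILIATE)):
        protein = translate(cds, table)
        print(name, protein, internal_stops(protein))
```

Running it prints `table 1 MQ*G** 2` and `table 6 MQQGQ* 0`: under the standard code the toy gene looks truncated by two in-frame stops, while under the ciliate code it reads through to the terminal TGA. Counting such premature stops is only a crude signal that a species may be assigned the wrong table; it is shown here to make the consequence of a table mismatch concrete.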

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7a6c/10767852/3d15fa5d0591/gkad992figgra1.jpg

Similar articles

The P10K database: a data portal for the protist 10 000 genomes project.

Nucleic Acids Res. 2024 Jan 5;52(D1):D747-D755. doi: 10.1093/nar/gkad992.

Database Resources of the National Genomics Data Center, China National Center for Bioinformation in 2024.

Nucleic Acids Res. 2024 Jan 5;52(D1):D18-D32. doi: 10.1093/nar/gkad1078.

[The mitochondrial genome of protists].

Genetika. 2002 Jun;38(6):773-88.

Correction to 'The P10K database: a data portal for the protist 10 000 genomes project'.

Nucleic Acids Res. 2024 Jan 5;52(D1):D1699. doi: 10.1093/nar/gkad1179.

Genome Warehouse: A Public Repository Housing Genome-scale Data.

Genomics Proteomics Bioinformatics. 2021 Aug;19(4):584-589. doi: 10.1016/j.gpb.2021.04.001. Epub 2021 Jun 24.

Genome structure and gene content in protist mitochondrial DNAs.

Nucleic Acids Res. 1998 Feb 15;26(4):865-78. doi: 10.1093/nar/26.4.865.

Pathogenic Protist Transmembranome database (PPTdb): a web-based platform for searching and analysis of protist transmembrane proteins.

BMC Bioinformatics. 2019 Jul 24;20(Suppl 13):382. doi: 10.1186/s12859-019-2857-7.

CompoDynamics: a comprehensive database for characterizing sequence composition dynamics.

Nucleic Acids Res. 2022 Jan 7;50(D1):D962-D969. doi: 10.1093/nar/gkab979.

Evolution of the mitochondrial genome: protist connections to animals, fungi and plants.

Curr Opin Microbiol. 2004 Oct;7(5):528-34. doi: 10.1016/j.mib.2004.08.008.

Protist.guru: A Comparative Transcriptomics Database for Protists.

J Mol Biol. 2022 Jun 15;434(11):167502. doi: 10.1016/j.jmb.2022.167502. Epub 2022 Feb 18.

Cited by

The Biogeography of Apicomplexan Parasites in Tropical Soils.

Ecol Evol. 2025 Jun 2;15(6):e71478. doi: 10.1002/ece3.71478. eCollection 2025 Jun.

Foresight 2035: a perspective on the next decade of research on the management of Legionella spp. in engineered aquatic environments.

FEMS Microbiol Rev. 2025 Jan 14;49. doi: 10.1093/femsre/fuaf022.

The Updated Genome Warehouse: Enhancing Data Value, Security, and Usability to Address Data Expansion.

Genomics Proteomics Bioinformatics. 2025 May 10;23(1). doi: 10.1093/gpbjnl/qzaf010.

Database Resources of the National Genomics Data Center, China National Center for Bioinformation in 2025.

Nucleic Acids Res. 2025 Jan 6;53(D1):D30-D44. doi: 10.1093/nar/gkae978.

ImageGP 2 for enhanced data visualization and reproducible analysis in biomedical research.

Imeta. 2024 Sep 12;3(5):e239. doi: 10.1002/imt2.239. eCollection 2024 Oct.

The 2024 Nucleic Acids Research database issue and the online molecular biology database collection.

Nucleic Acids Res. 2024 Jan 5;52(D1):D1-D9. doi: 10.1093/nar/gkad1173.

