Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China.
University of Chinese Academy of Sciences, Beijing 100049, China.
Nucleic Acids Res. 2024 Jan 5;52(D1):D747-D755. doi: 10.1093/nar/gkad992.
Protists, a highly diverse group of microscopic eukaryotic organisms distinct from fungi, animals and plants, play crucial roles in Earth's biosphere. However, the genomes of only a small fraction of known protist species have been published and made publicly accessible. To address this limitation, the Protist 10 000 Genomes Project (P10K) was initiated, implementing a specialized pipeline for single-cell genome/transcriptome assembly, decontamination and annotation of protists. The resulting P10K database (https://ngdc.cncb.ac.cn/p10k/) serves as a comprehensive platform that collates and disseminates genome sequences and annotations from diverse protist groups. The database currently holds 2959 genomes and transcriptomes: 1101 datasets newly sequenced by P10K and 1858 publicly available datasets. Notably, it covers 45% of protist orders, with particularly strong representation of ciliates (53% coverage), which account for nearly a thousand genomes/transcriptomes. Intriguingly, analysis of codon table usage among ciliates revealed differences from the codon tables assigned in the NCBI taxonomy system, suggesting that the codon tables used for these species need to be revised. Collectively, the P10K database serves as a valuable repository of genetic resources for protist research and aims to expand its collection by incorporating more sequencing data and advanced analysis tools to benefit protist studies worldwide.
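To illustrate why the choice of codon table matters for ciliate annotation, the following minimal sketch translates the same reading frame under three NCBI translation tables. It is not part of the P10K pipeline. The function name `translate`, the example sequences and the reduced codon dictionaries are assumptions made for illustration; only the reassignments themselves (TAA/TAG to glutamine in table 6, TGA to cysteine in table 10) follow the published NCBI genetic codes.

```python
# Illustrative sketch only: a tiny codon subset, not a full translation table.
# Sense codons shared by all three codes in this example.
SENSE = {"ATG": "M", "AAA": "K", "GGC": "G"}

# How each code reads the three canonical stop codons ("*" = stop).
STANDARD_TABLE_1 = {"TAA": "*", "TAG": "*", "TGA": "*"}
# NCBI table 6 (ciliate nuclear): TAA and TAG encode glutamine (Q).
CILIATE_TABLE_6 = {"TAA": "Q", "TAG": "Q", "TGA": "*"}
# NCBI table 10 (euplotid nuclear): TGA encodes cysteine (C).
EUPLOTID_TABLE_10 = {"TAA": "*", "TAG": "*", "TGA": "C"}


def translate(seq: str, stop_codon_map: dict) -> str:
    """Translate seq codon by codon, halting at the first stop codon.

    Codons missing from the reduced dictionaries are reported as 'X'.
    """
    table = {**SENSE, **stop_codon_map}
    protein = []
    usable_length = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    for i in range(0, usable_length, 3):
        residue = table.get(seq[i:i + 3], "X")
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Hypothetical open reading frames chosen to expose the reassignments.
orf_a = "ATGAAATAAGGCTGA"  # ATG AAA TAA GGC TGA
orf_b = "ATGTGAAAATAA"     # ATG TGA AAA TAA

for name, code in [("table 1", STANDARD_TABLE_1),
                   ("table 6", CILIATE_TABLE_6),
                   ("table 10", EUPLOTID_TABLE_10)]:
    print(name, translate(orf_a, code), translate(orf_b, code))
```

Under the standard code the first frame is truncated after two residues, whereas table 6 reads through the internal TAA. Annotating a ciliate with the wrong table can therefore truncate or wrongly extend predicted proteins.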