The P10K database: a data portal for the protist 10 000 genomes project.

Author affiliations

Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China.

University of Chinese Academy of Sciences, Beijing 100049, China.

Publication information

Nucleic Acids Res. 2024 Jan 5;52(D1):D747-D755. doi: 10.1093/nar/gkad992.

Abstract

Protists, a highly diverse group of microscopic eukaryotic organisms distinct from fungi, animals and plants, play crucial roles in the Earth's biosphere. However, the genomes of only a small fraction of known protist species have been published and made publicly accessible. To address this gap, the Protist 10 000 Genomes Project (P10K) was initiated, implementing a specialized pipeline for single-cell genome/transcriptome assembly, decontamination and annotation of protists. The resulting P10K database (https://ngdc.cncb.ac.cn/p10k/) serves as a comprehensive platform that collates and disseminates genome sequences and annotations from diverse protist groups. Currently, the P10K database holds 2959 genomes and transcriptomes, comprising 1101 datasets newly sequenced by P10K and 1858 publicly available datasets. Notably, it covers 45% of protist orders, with particularly strong representation of ciliates (53% coverage), which account for nearly a thousand genomes/transcriptomes. Intriguingly, an analysis of the distinctive codon table usage among ciliates revealed differences from the NCBI taxonomy system, suggesting that the codon tables used for these species need to be revised. Collectively, the P10K database serves as a valuable repository of genetic resources for protist research, and it aims to expand its collection by incorporating more sequenced data and advanced analysis tools to benefit protist studies worldwide.
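
To illustrate why the choice of codon table matters for ciliates, the following minimal sketch translates the same open reading frame under the standard genetic code and under a ciliate nuclear code. It is not the P10K pipeline, and the abstract does not say which tables were compared. The sketch assumes NCBI translation table 6 (in which TAA and TAG encode glutamine rather than stop) as the example, and the sequence `orf` is invented for illustration.

```python
from itertools import product

# Codons in NCBI order (T, C, A, G at each position), paired with the
# amino acids of the standard genetic code (NCBI translation table 1).
BASES = "TCAG"
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
STANDARD = dict(zip(CODONS, STANDARD_AAS))

# Assumption for illustration: NCBI table 6 (ciliate nuclear code),
# where TAA and TAG are read as glutamine (Q) and only TGA is a stop.
CILIATE = {**STANDARD, "TAA": "Q", "TAG": "Q"}


def translate(dna, table):
    """Translate dna in frame 0, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = table.get(dna[i:i + 3], "X")  # X marks an unrecognized codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical ORF containing in-frame TAA and TAG codons.
orf = "ATGGCTTAACAATAGGGTTGA"
print(translate(orf, STANDARD))  # MA
print(translate(orf, CILIATE))   # MAQQQG
```

Under the standard code the protein is truncated at the first TAA, so annotating a ciliate genome with the wrong table would systematically shorten predicted proteins.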

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7a6c/10767852/3d15fa5d0591/gkad992figgra1.jpg
