Protein family annotation for the Unified Human Gastrointestinal Proteome by DPCfam clustering.

Affiliations

Area Science Park, Padriciano, 99, 34149, Trieste, Italy.

University of Trieste, Trieste, 34127, Italy.

Publication information

Sci Data. 2024 Jun 1;11(1):568. doi: 10.1038/s41597-024-03131-4.

DOI:10.1038/s41597-024-03131-4

PMID:38824125

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11144186/

Abstract

Technological advances in massively parallel sequencing have led to an exponential growth in the number of known protein sequences. Much of this growth originates from metagenomic projects producing new sequences from environmental and clinical samples. The Unified Human Gastrointestinal Proteome (UHGP) catalogue is one of the most relevant metagenomic datasets with applications ranging from medicine to biology. However, the low levels of sequence annotation may impair its usability. This work aims to produce a family classification of UHGP sequences to facilitate downstream structural and functional annotation. This is achieved through the release of the DPCfam-UHGP50 dataset containing 10,778 putative protein families generated using DPCfam clustering, an unsupervised pipeline grouping sequences into single or multi-domain architectures. DPCfam-UHGP50 considerably improves family coverage at protein and residue levels compared to the manually curated repository Pfam. In the hope that DPCfam-UHGP50 will foster future discoveries in the field of metagenomics of the human gut, we release a FAIR-compliant database of our results that is easily accessible via a searchable web server and Zenodo repository.
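The abstract compares family coverage "at protein and residue levels". As a rough illustration of what those two measures could mean, the sketch below computes them on toy data. It assumes protein-level coverage is the fraction of sequences with at least one family region, and residue-level coverage is the fraction of all residues falling inside at least one region; the paper's exact definitions may differ. The function names, the 1-based inclusive interval convention, and all data values are hypothetical and are not taken from DPCfam-UHGP50 or Pfam.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive; an assumed convention


def covered_residues(intervals: List[Interval], length: int) -> int:
    """Count residues covered by at least one interval, counting overlaps once."""
    total = 0
    current_end = 0  # last residue already counted
    for start, end in sorted(intervals):
        start = max(start, current_end + 1)  # skip residues already counted
        end = min(end, length)               # clip to the sequence length
        if end >= start:
            total += end - start + 1
            current_end = end
    return total


def family_coverage(
    lengths: Dict[str, int],
    annotations: Dict[str, List[Interval]],
) -> Tuple[float, float]:
    """Return (protein-level coverage, residue-level coverage) as fractions."""
    annotated = sum(1 for seq_id in lengths if annotations.get(seq_id))
    covered = sum(
        covered_residues(annotations.get(seq_id, []), length)
        for seq_id, length in lengths.items()
    )
    return annotated / len(lengths), covered / sum(lengths.values())


# Toy example (hypothetical values, for illustration only).
toy_lengths = {"p1": 100, "p2": 200, "p3": 100}
toy_annotations = {
    "p1": [(1, 50), (40, 80)],  # two overlapping regions covering residues 1-80
    "p2": [(101, 150)],         # one region covering 50 residues
}
protein_cov, residue_cov = family_coverage(toy_lengths, toy_annotations)
print(f"protein-level coverage: {protein_cov:.3f}")
print(f"residue-level coverage: {residue_cov:.3f}")
```

Overlapping regions are merged before counting so that a residue assigned to two regions is not counted twice, which keeps residue-level coverage at or below 1.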

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/029b/11144186/5d2748e03a3f/41597_2024_3131_Fig1_HTML.jpg
