Schmitz Matthias A, Dimonaco Nicholas J, Clavel Thomas, Hitch Thomas C A
Functional Microbiome Research Group, RWTH University Hospital, Aachen, Germany.
Institute for Global Food Security, School of Biological Sciences, Queen's University Belfast, Belfast, UK.
Nat Commun. 2025 Apr 3;16(1):3204. doi: 10.1038/s41467-025-58442-w.
Microbes use a range of genetic codes and gene structures, yet these are often ignored during metagenomic analysis. This causes spurious protein predictions and prevents functional assignment, which limits our understanding of ecosystems. To resolve this, we developed a lineage-specific gene prediction approach that uses the correct genetic code based on the taxonomic assignment of genetic fragments, removes incomplete protein predictions, and optimises prediction of small proteins. Applied to 9634 metagenomes and 3594 genomes from the human gut, this approach increased the landscape of captured expressed microbial proteins by 78.9%, including previously hidden functional groups. Optimised small protein prediction captured 3,772,658 small protein clusters, which form an improved microbial protein catalogue of the human gut (MiProGut). To enable the ecological study of a protein's prevalence and association with host parameters, we developed InvestiGUT, a tool that integrates both the protein sequences and sample metadata. Accurate prediction of proteins is critical to providing a functional understanding of microbiomes, enhancing our ability to study interactions between microbes and hosts.
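The effect of choosing the genetic code by lineage can be illustrated with a minimal Python sketch. This is not the authors' pipeline. It only shows why translating a fragment with the wrong code yields a truncated, spurious protein, and how incomplete predictions might be filtered. The facts taken from the NCBI translation tables are that TGA is a stop codon in table 11, encodes tryptophan in table 4, and encodes glycine in table 25. Everything else is an assumption made for illustration: the `LINEAGE_TO_TABLE` lookup, the function names, the reduced start-codon set, and the example sequence.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for the 64 codons in NCBI order (TTT, TTC, TTA, TTG, TCT, ...).
# Table 11 (bacterial/archaeal) uses the same codon-to-amino-acid mapping.
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def make_table(overrides=None):
    """Build a codon table from the standard code plus reassigned codons."""
    codons = ("".join(c) for c in product(BASES, repeat=3))
    table = dict(zip(codons, STANDARD_AAS))
    table.update(overrides or {})
    return table


CODON_TABLES = {
    11: make_table(),             # TGA is a stop codon
    4: make_table({"TGA": "W"}),  # TGA encodes tryptophan
    25: make_table({"TGA": "G"}), # TGA encodes glycine
}

# Hypothetical lookup for illustration; a real workflow would derive the
# table from the taxonomic assignment of each contig or genome.
LINEAGE_TO_TABLE = {"Mycoplasma": 4, "Gracilibacteria": 25}

# Simplifying assumption: only three common start codons are accepted.
START_CODONS = {"ATG", "GTG", "TTG"}


def pick_table(lineage):
    """Return the translation table for a lineage, defaulting to table 11."""
    return LINEAGE_TO_TABLE.get(lineage, 11)


def translate(dna, table_id):
    """Translate in frame 0 and stop at the first stop codon."""
    table = CODON_TABLES[table_id]
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = table[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def is_complete(dna, table_id):
    """True if the ORF has a start codon, a terminal stop, and no internal stop."""
    if len(dna) < 6 or len(dna) % 3:
        return False
    table = CODON_TABLES[table_id]
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return (
        codons[0] in START_CODONS
        and table[codons[-1]] == "*"
        and all(table[c] != "*" for c in codons[1:-1])
    )


# Made-up ORF with an in-frame TGA codon: ATG AAA TGA GGC TAA
orf = "ATGAAATGAGGCTAA"
for lineage in ("Escherichia", "Mycoplasma", "Gracilibacteria"):
    table_id = pick_table(lineage)
    print(lineage, table_id, translate(orf, table_id), is_complete(orf, table_id))
```

Under table 11 the in-frame TGA ends translation early, so the same fragment gives a shorter product and fails the completeness check, whereas tables 4 and 25 read through it. The sketch translates the start codon with the ordinary table entry and does not handle reverse strands, ambiguous bases, or small-protein-specific scoring.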