Highly accurate prophage island detection with PIDE.

Author information

Gao Hongyan, Li Bowen, Guo Zihan, Zheng Lei, Chen Junnan, Liang Guanxiang

Affiliations

Center for Infection Biology, School of Basic Medical Sciences, Tsinghua University, Beijing, 100084, China.

Tsinghua-Peking Joint Center for Life Sciences, Beijing, 100084, China.

Publication information

Genome Biol. 2025 Aug 20;26(1):254. doi: 10.1186/s13059-025-03733-0.

Abstract

As important mobile elements in prokaryotes, prophages shape the genomic context of their hosts and regulate the structure of bacterial populations. However, it is challenging to precisely identify prophages through computational methods. Here, we introduce PIDE for identifying prophages from bacterial genomes or metagenome-assembled genomes. PIDE integrates a pre-trained protein language model and a gene density clustering algorithm to distinguish prophages. Benchmarking with induced prophage sequencing datasets demonstrates that PIDE pinpoints prophages with precise boundaries. Applying PIDE to 4,744 representative human gut genomes reveals 24,467 prophages with widespread functional capacity. PIDE is available at https://github.com/chyghy/PIDE, with model training code at https://zenodo.org/records/16457629.
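As a rough illustration of the clustering stage the abstract names, the sketch below assumes that an upstream classifier (for example, one built on a protein language model) has already assigned each gene a phage-likeness score. It then groups high-scoring genes by genomic proximity into candidate regions. The `Gene` type, `call_prophage_regions`, and the `score_cutoff`, `max_gap`, and `min_genes` parameters are all hypothetical. The gap-based single-linkage clustering shown here is a simple stand-in for density clustering, not PIDE's published algorithm.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Gene:
    start: int          # start coordinate on the contig (bp)
    end: int            # end coordinate on the contig (bp)
    phage_score: float  # hypothetical per-gene phage-likeness probability


def call_prophage_regions(
    genes: List[Gene],
    score_cutoff: float = 0.5,  # hypothetical threshold for a phage-like gene
    max_gap: int = 3000,        # hypothetical max bp between neighbouring hits
    min_genes: int = 6,         # hypothetical minimum genes per region
) -> List[Tuple[int, int]]:
    """Chain phage-like genes closer than `max_gap` into clusters and
    keep only clusters with at least `min_genes` members.

    Illustrative stand-in for density clustering, not PIDE's method.
    """
    hits = sorted((g for g in genes if g.phage_score >= score_cutoff),
                  key=lambda g: g.start)
    regions: List[Tuple[int, int]] = []
    cluster: List[Gene] = []
    cluster_end = 0  # rightmost end coordinate in the current cluster

    for g in hits:
        # A large gap closes the current cluster; keep it only if dense enough.
        if cluster and g.start - cluster_end > max_gap:
            if len(cluster) >= min_genes:
                regions.append((cluster[0].start, cluster_end))
            cluster = []
        cluster_end = max(cluster_end, g.end) if cluster else g.end
        cluster.append(g)

    # Flush the final cluster.
    if len(cluster) >= min_genes:
        regions.append((cluster[0].start, cluster_end))
    return regions


# Toy contig: host background genes, 8 dense phage-like genes,
# and one isolated phage-like gene that should not form a region.
genes = [Gene(i * 5000, i * 5000 + 900, 0.1) for i in range(10)]
genes += [Gene(50_000 + i * 1000, 50_800 + i * 1000, 0.9) for i in range(8)]
genes.append(Gene(200_000, 200_900, 0.95))
print(call_prophage_regions(genes))  # [(50000, 57800)]
```

Requiring a minimum cluster size is what separates a dense run of phage-like genes from an isolated phage-like gene. In a real tool, the region edges would likely need further refinement to reach the precise boundaries the abstract reports.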
