Use of the CHM13-T2T genome improves metagenomic analysis by minimizing host DNA contamination.

Authors

Liu Donglai, Hu Jinjun, Zhang Dan, Ren Shanshan, Zhao Lanqing, Gao Hongyan, Hu Songnian, Xu Sihong, Liang Guanxiang

Affiliations

National Institutes for Food and Drug Control, Beijing, China.

Center for Infection Biology, School of Basic Medical Sciences, Tsinghua University, Beijing, China.

Publication

mSystems. 2025 Sep 10:e0084025. doi: 10.1128/msystems.00840-25.

Abstract

Human-associated metagenomic data often contain human nucleic acid information, which can affect the accuracy of microbial classification or raise ethical concerns. These reads are typically removed through alignment to the human genome using various metagenomic mapping tools or human reference genomes, followed by filtration before metagenomic analysis. In this study, we conducted a comprehensive analysis to identify the optimal combination of alignment software and human reference genomes using benchmarking data. Our findings show that the combination of bwa-mem and the telomere-to-telomere human genome (CHM13-T2T) is the most effective in removing human reads in simulated data. We also analyzed CHM13-T2T-derived sequences in RefSeq to understand how CHM13-T2T reduces false positive results. Finally, we assessed clinical samples and found that CHM13-T2T effectively reduces host-derived contamination, particularly in low microbial biomass samples. This study provides a thorough overview of the application of CHM13-T2T in metagenomic analysis and highlights its significance in improving microbial classification accuracy.

IMPORTANCE

Human gene sequences account for a large proportion of metagenomic sequences. To gain accurate and precise microbiome information, effective host-derived contamination removal methods are required. Both the alignment algorithm and the reference genome could influence the effectiveness of this process. The telomere-to-telomere human genome (CHM13-T2T) is a state-of-the-art human genome with 216 Mbp of additional new sequences compared with the commonly used GRCh38.p14. Our findings show the optimal dehosting effect of CHM13-T2T combined with the bwa-mem software in metagenomic analysis. We also investigate the reasons for the superiority of CHM13-T2T. Our study provides insights into optimal strategies for host sequence removal from metagenomic data. A standard reference is proposed for future metagenomic analysis, which can improve the accuracy of microbial identification.
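
The filtration step described in the abstract can be illustrated with a minimal sketch. It assumes the reads have already been aligned to a human reference (for example CHM13-T2T) by an aligner such as bwa-mem, and that the alignments are available as SAM text. The function name `non_host_read_names` and the rule "keep a pair only if neither mate aligned" are illustrative assumptions; the abstract does not state the exact filtering criterion the authors used.

```python
def non_host_read_names(sam_lines):
    """Return names of reads (or read pairs) with no alignment to the host reference.

    Assumption: a read pair is treated as host-derived if ANY of its records
    is mapped, i.e. lacks the SAM 0x4 "segment unmapped" flag bit.
    """
    host_hit = set()   # read names with at least one mapped record
    order = []         # read names in first-seen order
    seen = set()
    for line in sam_lines:
        # Skip blank lines and SAM header lines (which start with "@").
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if name not in seen:
            seen.add(name)
            order.append(name)
        if not flag & 0x4:  # record is mapped to the host reference
            host_hit.add(name)
    return [name for name in order if name not in host_hit]


# Toy SAM records (hypothetical read names, toy reference length).
toy_sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "microbe1\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",          # both mates unmapped
    "microbe1\t141\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII",
    "human1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",  # properly paired host hit
    "human1\t147\tchr1\t200\t60\t4M\t=\t100\t-104\tCCGT\tIIII",
    "mixed1\t73\tchr1\t500\t30\t4M\t=\t500\t0\tACGT\tIIII",    # one mate mapped
    "mixed1\t133\tchr1\t500\t0\t*\t=\t500\t0\tGGGT\tIIII",
]

kept = non_host_read_names(toy_sam)
print(kept)
```

Dropping a pair when only one mate aligns is the more conservative choice for host removal. A looser rule that discards only pairs with both mates mapped would retain more reads, at the cost of more residual host sequence.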
