A bioinformatics pipeline for Mycobacterium tuberculosis sequencing that cleans contaminant reads from sputum samples.

Affiliations

Laboratorio de Farmacogenómica, Instituto Nacional de Medicina Genómica (INMEGEN), Ciudad de México, México.

Instituto de Investigaciones Biológicas, Universidad Veracruzana, Xalapa, Veracruz, México.

Publication information

PLoS One. 2021 Oct 26;16(10):e0258774. doi: 10.1371/journal.pone.0258774. eCollection 2021.

DOI:10.1371/journal.pone.0258774
PMID:34699523
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8547644/
Abstract

Next-generation sequencing (NGS) is widely used to investigate genomic variation. Several studies have analyzed the genetic variation of Mycobacterium tuberculosis in sputum samples without prior culture, using target-enrichment methods for NGS. Alignment programs are generally run with default parameters, and it is assumed that only Mycobacterium reads will be mapped. In clinical samples, however, variants of the microorganism of interest can be confused with a vast collection of reads from other bacteria, viruses, and human DNA. There are currently no standardized pipelines, and cleaning success is never verified, because rigorous controls to identify and remove reads from other sputum microorganisms genetically similar to M. tuberculosis are lacking. We therefore designed a bioinformatic pipeline to process NGS data from sputum samples, with several filters and quality-control points that identify and eliminate non-M. tuberculosis reads to produce a reliable genetic variant report. Our proposal uses the SURPI software as a taxonomic classifier to filter input sequences and then performs a mapping that yields the highest percentage of Mycobacterium reads while minimizing reads from other microorganisms. We then use the filtered sequences for variant calling with the GATK software, applying mapping-quality checks, realignment, recalibration, hard-filtering, and a post-filter to increase the reliability of the reported variants. Using default mapping parameters, we identified reads from contaminant bacteria such as Streptococcus, Rothia, Actinomyces, and Veillonella. With a genomic edit distance of three, our final mapping strategy achieved a sequence identity of 97.8% between the input reads and the whole M. tuberculosis H37Rv reference genome, removing 98.8% of the off-target sequences at the cost of losing 1.7% of Mycobacterium reads. Finally, more than 200 unreliable genetic variants were removed during variant calling, increasing the report's reliability.
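
The edit-distance cutoff mentioned in the abstract can be illustrated with a minimal sketch. It assumes the alignments are available as SAM text records carrying the standard NM:i tag (edit distance to the reference). The function names, the example reads, and the use of NC_000962.3 as the reference name are illustrative assumptions, not the authors' implementation, and the SURPI classification and GATK steps are not reproduced here.

```python
from typing import Iterable, Iterator, Optional

MAX_EDIT_DISTANCE = 3  # cutoff quoted in the abstract


def edit_distance(sam_line: str) -> Optional[int]:
    """Return the NM:i value of a SAM alignment record, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    for tag in fields[11:]:  # optional fields start at column 12
        if tag.startswith("NM:i:"):
            return int(tag[5:])
    return None


def is_mapped(sam_line: str) -> bool:
    """True when FLAG bit 0x4 (segment unmapped) is not set."""
    flag = int(sam_line.split("\t")[1])
    return not (flag & 0x4)


def filter_by_edit_distance(
    lines: Iterable[str], max_nm: int = MAX_EDIT_DISTANCE
) -> Iterator[str]:
    """Yield header lines plus mapped records whose NM tag is <= max_nm."""
    for line in lines:
        if line.startswith("@"):  # header lines pass through unchanged
            yield line
            continue
        if not is_mapped(line):
            continue
        nm = edit_distance(line)
        if nm is not None and nm <= max_nm:
            yield line


# Hypothetical records, invented for illustration only.
REF = "NC_000962.3"
records = [
    f"@SQ\tSN:{REF}\tLN:4411532",
    f"read1\t0\t{REF}\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNM:i:0",
    f"read2\t0\t{REF}\t200\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNM:i:3",
    f"read3\t0\t{REF}\t300\t20\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNM:i:7",
    "read4\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]

kept = list(filter_by_edit_distance(records))
kept_names = [line.split("\t")[0] for line in kept]
print(kept_names)
```

In this sketch, read3 is dropped because its edit distance exceeds the cutoff and read4 because it is unmapped. A stricter cutoff removes more reads from related organisms but also discards more genuine Mycobacterium reads, which is the trade-off the abstract quantifies.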

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/135d/8547644/ad6a5ed12cbc/pone.0258774.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/135d/8547644/44a4dc404758/pone.0258774.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/135d/8547644/bb0101466a5b/pone.0258774.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/135d/8547644/5bbeae35c89f/pone.0258774.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/135d/8547644/b0d69bf2bbd5/pone.0258774.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/135d/8547644/4f27b9cb64c5/pone.0258774.g006.jpg
