Comprehensive and accurate genetic variant identification from contaminated and low-coverage whole genome sequencing data.

Affiliations

Family Medicine and Population Health (FAMPOP), Faculty of Medicine and Health Sciences, University of Antwerp, Antwerp, Belgium.

South African Medical Research Council Centre for Tuberculosis Research and DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, Division of Molecular Biology and Human Genetics, Stellenbosch University, Stellenbosch, South Africa.

Publication information

Microb Genom. 2021 Nov;7(11). doi: 10.1099/mgen.0.000689.

Abstract

Improved understanding of the genomic variants that allow Mycobacterium tuberculosis to acquire drug resistance, or tolerance, and increase its virulence is an important factor in controlling the current tuberculosis epidemic. Current approaches to sequencing, however, cannot reveal M. tuberculosis's full genomic diversity due to the strict requirements of low contamination levels, high sequence coverage and elimination of complex regions. We have developed the XBS (compleX Bacterial Samples) bioinformatics pipeline, which implements joint calling and machine-learning-based variant filtering tools to specifically improve variant detection in the important samples that do not meet these criteria, such as those from unbiased sputum samples. Using novel simulated datasets, which permit exact accuracy verification, XBS was compared to the UVP and MTBseq pipelines. Accuracy statistics showed that all three pipelines performed equally well for sequence data that resemble those obtained from culture isolates of high depth of coverage and low-level contamination. In the complex genomic regions, however, XBS accurately identified 9.0 % more SNPs and 8.1 % more single nucleotide insertions and deletions than the WHO-endorsed unified analysis variant pipeline (UVP). XBS also had superior accuracy for sequence data that resemble those obtained directly from sputum samples, where depth of coverage is typically very low and contamination levels are high. XBS was the only pipeline not affected by low depth of coverage (5–10×), type of contamination and excessive contamination levels (>50 %). Simulation results were confirmed using whole genome sequencing (WGS) data from clinical samples, which showed the superior performance of XBS with a higher sensitivity (98.8 %) when analysing culture isolates and identification of 13.9 % more variable sites in WGS data from sputum samples as compared to MTBseq, without evidence for false positive variants when rRNA regions were excluded.
The XBS pipeline facilitates sequencing of less-than-perfect samples. These advances will benefit future clinical applications of sequencing, especially WGS directly from clinical specimens, thereby avoiding biases and making many more samples available for drug resistance and other genomic analyses. The additional genetic resolution and increased sample success rate will improve genome-wide association studies and sequence-based transmission studies.
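The abstract reports accuracy statistics measured against simulated datasets in which the true variants are known exactly. As a minimal sketch of how such a comparison can be scored, the Python below computes sensitivity and precision by comparing a set of called variants with a truth set. The `Variant` type, the `score_calls` function and the toy positions are all hypothetical illustrations; they are not taken from XBS, UVP or MTBseq, and the numbers are not results from the paper.

```python
from typing import NamedTuple, Set


class Variant(NamedTuple):
    """A variant site identified by position, reference and alternate allele (hypothetical type)."""
    pos: int
    ref: str
    alt: str


def score_calls(truth: Set[Variant], called: Set[Variant]) -> dict:
    """Compare called variants with a known truth set, as a simulation allows."""
    tp = len(truth & called)   # true positives: called and truly present
    fp = len(called - truth)   # false positives: called but not simulated
    fn = len(truth - called)   # false negatives: simulated but missed
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "precision": precision}


# Toy data, invented for illustration only.
truth = {
    Variant(1200, "A", "G"),
    Variant(5310, "C", "T"),
    Variant(8872, "G", "GA"),   # single nucleotide insertion
    Variant(9940, "TC", "T"),   # single nucleotide deletion
}
called = {
    Variant(1200, "A", "G"),
    Variant(5310, "C", "T"),
    Variant(8872, "G", "GA"),
    Variant(7001, "T", "C"),    # not in the truth set
}

result = score_calls(truth, called)
print(result)
```

Matching on the exact (position, ref, alt) tuple is a simplifying assumption; real comparisons usually normalise indel representation first, since the same insertion or deletion can be written at different positions.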

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3f43/8743552/c2e02387c255/mgen-7-0689-g001.jpg