Delhomme Tiffany M, Avogbe Patrice H, Gabriel Aurélie A G, Alcala Nicolas, Leblay Noemie, Voegele Catherine, Vallée Maxime, Chopard Priscilia, Chabrier Amélie, Abedi-Ardekani Behnoush, Gaborieau Valérie, Holcatova Ivana, Janout Vladimir, Foretová Lenka, Milosavljevic Sasa, Zaridze David, Mukeriya Anush, Brambilla Elisabeth, Brennan Paul, Scelo Ghislaine, Fernandez-Cuesta Lynnette, Byrnes Graham, Calvez-Kelm Florence L, McKay James D, Foll Matthieu
Genetic Cancer Susceptibility Group, Section of Genetics, International Agency for Research on Cancer (IARC-WHO), 150 cours Albert Thomas, 69008 Lyon, France.
Genetic Epidemiology Group, Section of Genetics, International Agency for Research on Cancer (IARC-WHO), 150 cours Albert Thomas, 69008 Lyon, France.
NAR Genom Bioinform. 2020 Jun;2(2):lqaa021. doi: 10.1093/nargab/lqaa021. Epub 2020 Apr 20.
The emergence of next-generation sequencing (NGS) has revolutionized how genome sequences are obtained, with the promise of a comprehensive characterization of DNA variation. Nevertheless, detecting somatic mutations remains difficult, in particular when trying to identify low-abundance mutations, such as subclonal mutations, tumour-derived alterations in body fluids or somatic mutations in histologically normal tissue. The main challenge is to distinguish precisely between sequencing artefacts and true mutations, particularly when the latter are so rare that they reach abundance levels similar to those of artefacts. Here, we present needlestack, a highly sensitive variant caller that learns the level of systematic sequencing errors directly from the data in order to call mutations accurately. Needlestack is based on the idea that the sequencing error rate can be estimated dynamically by analysing multiple samples together. We show that the sequencing error rate varies across alterations, illustrating the need to estimate it precisely. We evaluate the performance of needlestack for various types of variation, and we show that needlestack is robust across positions and outperforms existing state-of-the-art methods for low-abundance mutations. Needlestack, along with its source code, is freely available on GitHub: https://github.com/IARCbioinfo/needlestack.
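The idea of estimating the error rate from multiple samples analysed together can be illustrated with a minimal sketch. This is not needlestack's implementation: it assumes a plain binomial error model and a trimmed pooled estimate of the error rate at one position, chosen here for brevity. The function names (`binom_sf`, `call_outliers`), the `trim` fraction, the pseudocount and the `alpha` threshold are all hypothetical choices made for illustration.

```python
import math


def binom_sf(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), summing terms in log space."""
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_term)
    return min(total, 1.0)


def call_outliers(coverage, alt_counts, trim=0.1, alpha=1e-6):
    """Flag samples whose alternate-read count exceeds the shared error rate.

    coverage[i] and alt_counts[i] are the total and alternate read counts of
    sample i at a single position for a single alteration.
    Returns a list of (sample_index, p_value) for the flagged samples.
    """
    n = len(coverage)
    # Rank samples by observed allele fraction (assumption: zero coverage -> 0).
    order = sorted(
        range(n),
        key=lambda i: alt_counts[i] / coverage[i] if coverage[i] else 0.0,
    )
    # Drop the highest fractions so that true variants do not inflate the
    # error estimate (assumption: a simple trimmed estimator).
    n_trim = max(1, int(trim * n))
    kept = order[: n - n_trim]
    # Pooled error rate with a small pseudocount to avoid an estimate of zero.
    error_rate = (sum(alt_counts[i] for i in kept) + 0.5) / (
        sum(coverage[i] for i in kept) + 1.0
    )
    calls = []
    for i in range(n):
        p_value = binom_sf(alt_counts[i], coverage[i], error_rate)
        if p_value < alpha:
            calls.append((i, p_value))
    return calls


# Ten samples at one position: nine show only error-level alternate reads.
example_coverage = [1000] * 10
example_alt = [1, 0, 2, 1, 0, 1, 2, 0, 1, 30]
example_calls = call_outliers(example_coverage, example_alt)
```

In this sketch a sample is called only when its alternate-read count is very unlikely under the error rate shared by the other samples, so a position with a high background error rate requires more supporting reads than a clean one.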