Understanding the limitations of next generation sequencing informatics, an approach to clinical pipeline validation using artificial data sets.

Authors

Daber Robert, Sukhadia Shrey, Morrissette Jennifer J D

Affiliation

Center for Personalized Diagnostics, University of Pennsylvania School of Medicine, Philadelphia, PA.

Publication information

Cancer Genet. 2013 Dec;206(12):441-8. doi: 10.1016/j.cancergen.2013.11.005. Epub 2013 Nov 28.

Abstract

The advantages of massively parallel sequencing are quickly being realized through the adoption of comprehensive genomic panels across the spectrum of genetic testing. Despite such widespread use of next generation sequencing (NGS), a major bottleneck in implementing and capitalizing on this technology remains in the data processing steps, or bioinformatics. Here we describe our approach to defining the limitations of each step in the data processing pipeline by using artificial amplicon data sets to simulate a wide spectrum of genomic alterations. Through this process, we identified limitations of insertion and deletion (indel) and single nucleotide variant (SNV) detection using standard approaches, and we describe novel strategies to improve overall somatic mutation detection. Using these artificial data sets, we were able to demonstrate that NGS assays can have robust mutation detection if the data are processed in a way that does not lead to large genomic alterations landing in the unmapped data (i.e., trash). By using these pipeline modifications and a new variant caller, AbsoluteVar, we have been able to validate SNV detection to 100% sensitivity and specificity at allele frequencies as low as 4%, and to detect indels as large as 90 bp. Clinical validation of NGS relies on the ability to detect mutations across a wide array of genetic anomalies, and the utility of artificial data sets demonstrates a mechanism to intelligently test a vast array of mutation types.
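The idea of an artificial amplicon data set can be illustrated with a minimal Python sketch. The abstract does not say how the authors built their data sets, so this is not their method: the function names, the random 150 bp reference, the read depth of 1000, and the error-free full-length reads are all assumptions made for illustration. The sketch spikes an SNV into 4% of reads and builds a 90 bp deletion amplicon, the two limits quoted in the abstract, and writes the reads as FASTQ text that could be fed to a pipeline under test.

```python
import random


def apply_snv(seq, pos, alt):
    """Return seq with the base at 0-based pos replaced by alt."""
    return seq[:pos] + alt + seq[pos + 1:]


def apply_deletion(seq, pos, length):
    """Return seq with `length` bases removed, starting at 0-based pos."""
    return seq[:pos] + seq[pos + length:]


def simulate_amplicon_reads(reference, mutant, depth, allele_fraction):
    """Build `depth` error-free reads; round(depth * allele_fraction) are mutant."""
    n_mutant = round(depth * allele_fraction)
    return [mutant] * n_mutant + [reference] * (depth - n_mutant)


def to_fastq(reads, prefix="read", quality_char="I"):
    """Format reads as FASTQ text with a constant (hypothetical) quality string."""
    records = [
        f"@{prefix}_{i}\n{read}\n+\n{quality_char * len(read)}"
        for i, read in enumerate(reads)
    ]
    return "\n".join(records) + "\n"


# Hypothetical 150 bp reference amplicon (random sequence, fixed seed).
rng = random.Random(0)
reference = "".join(rng.choice("ACGT") for _ in range(150))

# SNV at position 75, present in 4% of 1000 reads.
alt_base = "A" if reference[75] != "A" else "C"
snv_amplicon = apply_snv(reference, 75, alt_base)
snv_reads = simulate_amplicon_reads(reference, snv_amplicon, 1000, 0.04)

# 90 bp deletion starting at position 30, present in half of the reads.
del_amplicon = apply_deletion(reference, 30, 90)
del_reads = simulate_amplicon_reads(reference, del_amplicon, 1000, 0.5)

snv_fastq = to_fastq(snv_reads, prefix="snv")
del_fastq = to_fastq(del_reads, prefix="del90")
```

Because the mutation and its allele fraction are known by construction, every call a pipeline makes on such a file can be scored as a true or false positive, and any mutant read that ends up unmapped is immediately visible as a loss.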
