Sun Shulei, Murray Sarah S
Center for Advanced Laboratory Medicine, University of California San Diego Health, La Jolla, CA, USA.
Department of Pathology, University of California San Diego, La Jolla, CA, USA.
Methods Mol Biol. 2019;1908:37-48. doi: 10.1007/978-1-4939-9004-7_3.
The use of next-generation sequencing (NGS) and hybridization-based capture for target enrichment has enabled the interrogation of coding regions of clinically significant cancer genes in tumor specimens, using panels that range from targeted sets of a few to hundreds of genes to whole-exome panels encompassing the coding regions of all genes in the genome. NGS technologies produce millions of relatively short sequence segments, or reads. Bioinformatics tools are required to map these reads back to a reference genome with read alignment software and then to call variants by identifying differences between the aligned reads and the reference genome, whether in single bases (single nucleotide variants, or SNVs) or in multiple bases (insertions and deletions, or indels). In addition to single nucleotide changes and small insertions and deletions, high copy number gains and losses can be inferred from NGS data to call gene amplifications and deletions. Quality control metrics can be assessed at each step of this process to ensure that the called variants are accurate and of high quality. In this chapter we review common tools used to generate reads from Illumina-derived sequence data, align reads, and call variants from hybridization-based targeted NGS panel data generated from DNA derived from formalin-fixed, paraffin-embedded (FFPE) tumor specimens, as well as the basic quality metrics to assess for each assayed specimen.