Nodehi Hannane Mohammadi, Tabatabaiefar Mohammad Amin, Sehhati Mohammadreza
Department of Bioelectric and Biomedical Engineering, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran.
Department of Medical Genetics, School of Medicine, Isfahan University of Medical Sciences, Isfahan, Iran.
J Med Signals Sens. 2021 Jan 30;11(1):37-44. doi: 10.4103/jmss.JMSS_7_20. eCollection 2021 Jan-Mar.
Careful design in the primary steps of a next-generation sequencing study is critical for obtaining successful results in downstream analysis.
In this study, a framework is proposed to evaluate and improve sequence mapping in targeted regions of the reference genome. Simulated short reads were produced from the coding regions of the human genome and mapped to a Customized Target-Based Reference (CTBR) using recently introduced alignment tools. Short reads simulating different sequencing technologies, with and without well-defined mutation types, were aligned to both the standard genome and the CTBR, and the numbers of unmapped and misaligned reads and the runtime were measured for comparison.
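Because the reads are simulated, each read's true origin is known, so unmapped and misaligned reads can be counted directly. The following is a minimal sketch of such a tally, not the authors' implementation: the `ReadResult` record, the `mapping_error` function, and the 5 bp position `tolerance` are hypothetical names and values chosen for illustration, and it assumes that positions reported against a CTBR have already been translated back to genome coordinates.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ReadResult:
    """One simulated read: its known origin and where the aligner placed it."""
    true_chrom: str
    true_pos: int
    mapped_chrom: Optional[str]  # None when the aligner left the read unmapped
    mapped_pos: Optional[int]    # None when the aligner left the read unmapped


def mapping_error(results: Iterable[ReadResult], tolerance: int = 5) -> dict:
    """Return unmapped, misaligned, and combined error percentages.

    A read counts as misaligned when it is mapped to a different chromosome
    or more than `tolerance` bases from its true position (an assumed
    criterion, not one stated in the abstract).
    """
    reads = list(results)
    if not reads:
        raise ValueError("no reads to evaluate")

    unmapped = 0
    misaligned = 0
    for r in reads:
        if r.mapped_chrom is None or r.mapped_pos is None:
            unmapped += 1
        elif (r.mapped_chrom != r.true_chrom
              or abs(r.mapped_pos - r.true_pos) > tolerance):
            misaligned += 1

    total = len(reads)
    return {
        "unmapped_pct": 100.0 * unmapped / total,
        "misaligned_pct": 100.0 * misaligned / total,
        "error_pct": 100.0 * (unmapped + misaligned) / total,
    }
```

Running the same tally for each combination of aligner, reference, and read set would give percentages comparable across pipelines, provided the same tolerance is used throughout.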
The results showed that mapping accuracy was significantly better for reads generated from the Illumina HiSeq 2500 and aligned with Stampy against the CTBR than for the other evaluated pipelines. Using the CTBR for alignment significantly decreased the mapping error in comparison with larger or more limited references. When intentional mutations were introduced into the reads, Stampy showed the minimum error of 1.67% using the CTBR. By contrast, the lowest errors obtained by Stampy using the whole genome and a single chromosome as references were 3.78% and 20%, respectively. The maximum and minimum misalignment errors were observed on chromosomes Y and 20, respectively.
Therefore, using the proposed framework in a clinical targeted sequencing study may help predict the mapping error and improve the performance of variant calling in the genomic regions targeted by that study.