Mohiyuddin Marghoob, Mu John C, Li Jian, Bani Asadi Narges, Gerstein Mark B, Abyzov Alexej, Wong Wing H, Lam Hugo Y K
Bina Technologies, Roche Sequencing, Redwood City, CA 94065, USA.
Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA.
Bioinformatics. 2015 Aug 15;31(16):2741-4. doi: 10.1093/bioinformatics/btv204. Epub 2015 Apr 10.
Structural variations (SVs) are large genomic rearrangements that vary widely in size, which makes them challenging to detect with the relatively short reads produced by next-generation sequencing (NGS). Many SV detection methods have been developed; however, each is limited to specific kinds of SVs and differs in accuracy and resolution. Previous works have attempted to combine different methods, but they still suffer from poor accuracy, particularly for insertions. We propose MetaSV, an integrated SV caller that leverages multiple orthogonal SV signals for high accuracy and resolution. MetaSV merges SV calls from multiple tools across all SV types. Because existing tools underestimate insertion SVs, it also analyzes soft-clipped reads from the alignment to detect insertions accurately. Local assembly combined with dynamic programming is used to improve breakpoint resolution, and paired-end and coverage information is used to predict SV genotypes. Using simulated and experimental data, we demonstrate the effectiveness of MetaSV across various SV types and sizes.
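The idea of merging SV calls from multiple tools can be illustrated with a minimal Python sketch. The abstract does not state MetaSV's merging criteria, so this assumes a common heuristic: calls of the same SV type are combined when their reciprocal overlap is at least 50%. `SVCall`, `reciprocal_overlap`, `merge_calls`, and the tool names are hypothetical and are not MetaSV's API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SVCall:
    """Hypothetical minimal SV record: one interval reported by one or more tools."""
    chrom: str
    start: int    # 0-based, inclusive
    end: int      # exclusive
    sv_type: str  # e.g. "DEL", "INV", "DUP" (assumed labels)
    tools: List[str] = field(default_factory=list)


def reciprocal_overlap(a: SVCall, b: SVCall) -> float:
    """Return the smaller of the two overlap fractions, or 0.0 if the calls are disjoint."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def merge_calls(calls: List[SVCall], min_ro: float = 0.5) -> List[SVCall]:
    """Greedily merge same-type calls whose reciprocal overlap is >= min_ro (assumed threshold).

    A merged call spans the union of its members' coordinates and lists every supporting tool.
    Input calls are not modified.
    """
    merged: List[SVCall] = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.sv_type, c.start)):
        for m in merged:
            if m.sv_type == call.sv_type and reciprocal_overlap(m, call) >= min_ro:
                m.start = min(m.start, call.start)
                m.end = max(m.end, call.end)
                for tool in call.tools:
                    if tool not in m.tools:
                        m.tools.append(tool)
                break
        else:
            # No compatible merged call found: start a new one (copying the tool list).
            merged.append(SVCall(call.chrom, call.start, call.end, call.sv_type, list(call.tools)))
    return merged


# Usage with made-up calls from hypothetical tools "ToolA", "ToolB", "ToolC".
calls = [
    SVCall("chr1", 1000, 2000, "DEL", ["ToolA"]),
    SVCall("chr1", 1100, 2050, "DEL", ["ToolB"]),
    SVCall("chr1", 5000, 5200, "DEL", ["ToolA"]),
    SVCall("chr1", 1000, 2000, "INV", ["ToolC"]),
]
merged = merge_calls(calls)
for m in merged:
    print(m.chrom, m.start, m.end, m.sv_type, m.tools)
```

The two overlapping deletions collapse into one call supported by both tools, while the distant deletion and the inversion stay separate. Because each merged interval grows to the union of its members, this greedy pass is order-dependent. Point-like insertions (`start == end`) never reach the overlap threshold under this rule, which illustrates why insertions need a separate signal such as soft-clipped reads.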
MetaSV is implemented in Python; the code is available at http://bioinform.github.io/metasv/.
Supplementary data are available at Bioinformatics online.