School of Computing Science, Simon Fraser University, Burnaby, BC, Canada.
Bioinformatics. 2010 Jun 15;26(12):i350-7. doi: 10.1093/bioinformatics/btq216.
Recent years have witnessed an increase in research on the detection of structural variants (SVs) and their association with human disease. The advent of next-generation sequencing technologies makes it possible to extend the scope of structural variation studies to a point previously unimaginable, as exemplified by the 1000 Genomes Project. Although various computational methods have been described for the detection of SVs, no such algorithm is yet fully capable of discovering transposon insertions, a class of SVs that is very important to the study of human evolution and disease. In this article, we provide a complete and novel formulation for discovering both the loci and the classes of transposons inserted into genomes sequenced with high-throughput sequencing technologies. In addition, we present 'conflict resolution' improvements to our earlier combinatorial SV detection algorithm (VariationHunter) that take the diploid nature of the human genome into consideration. On simulated data derived from the Venter genome (HuRef), our algorithms discover >85% of transposon insertion events with a precision of >90%. We also demonstrate that our conflict resolution algorithm (denoted VariationHunter-CR) outperforms current state-of-the-art algorithms (such as the original VariationHunter, BreakDancer and MoDIL) when tested on the genome of a Yoruba African individual (NA18507).
An implementation of the algorithm is available at http://compbio.cs.sfu.ca/strvar.htm.
Supplementary data are available at Bioinformatics online.
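The detection rate (>85%) and precision (>90%) quoted above are the usual recall and precision measures for a set of predicted insertion loci compared against a known truth set. The following is a minimal sketch of how such figures could be computed. It is not the authors' evaluation code: the function name `evaluate_calls`, the 100 bp matching window, and the greedy one-to-one matching rule are all assumptions made for illustration, and the transposon class of each call is not checked.

```python
def evaluate_calls(truth, calls, window=100):
    """Return (recall, precision) for predicted insertion loci.

    truth, calls: lists of (chromosome, position) tuples.
    window: hypothetical tolerance in base pairs; a call counts as correct
            if it lies within this distance of a not-yet-matched true locus.
    """
    # Group the true positions by chromosome; entries are removed once matched
    # so that each true insertion can be credited to at most one call.
    unmatched = {}
    for chrom, pos in truth:
        unmatched.setdefault(chrom, []).append(pos)

    true_positives = 0
    for chrom, pos in calls:
        candidates = unmatched.get(chrom, [])
        # Pick the nearest unmatched true locus on the same chromosome.
        best = min(candidates, key=lambda t: abs(t - pos), default=None)
        if best is not None and abs(best - pos) <= window:
            candidates.remove(best)
            true_positives += 1

    recall = true_positives / len(truth) if truth else 0.0
    precision = true_positives / len(calls) if calls else 0.0
    return recall, precision


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    truth = [("chr1", 1000), ("chr1", 5000), ("chr2", 300)]
    calls = [("chr1", 1040), ("chr2", 310), ("chr2", 9000)]
    recall, precision = evaluate_calls(truth, calls)
    print(f"recall={recall:.2f} precision={precision:.2f}")
```

In this toy case two of the three true insertions are recovered and two of the three calls are correct, so both recall and precision are 2/3. A stricter evaluation might also require the predicted transposon class to match the true one.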