Zhang Zhen, Wang Jianxin, Luo Junwei, Ding Xiaojun, Zhong Jiancheng, Wang Jun, Wu Fang-Xiang, Pan Yi
School of Information Science and Engineering, Central South University, Changsha, 410083, China, College of Information and Communication Engineering, Hunan Institute of Science and Technology, Yueyang, 414006, China.
School of Information Science and Engineering, Central South University, Changsha, 410083, China.
Bioinformatics. 2016 Jun 15;32(12):1788-96. doi: 10.1093/bioinformatics/btw053. Epub 2016 Feb 1.
Advances in next-generation sequencing technologies and the availability of short-read data enable the detection of structural variations (SVs). Deletions, an important type of SV, have been associated with genetic diseases. There are three types of deletions: blunt deletions, deletions with microhomologies and deletions with microinsertions. The last two types are very common in the human genome, but they are difficult to detect. Furthermore, finding deletions from sequencing data remains challenging. It is therefore highly desirable to develop sensitive and accurate methods for detecting deletions from sequencing data, especially deletions with microhomologies and deletions with microinsertions.
We present a novel method called Sprites (SPlit Read re-alIgnment To dEtect Structural variants), which finds deletions from sequencing data. Rather than aligning only the clipped part of a soft-clipped read, it aligns the whole read to the target sequence, a segment of the reference determined by spanning reads, in order to find the longest prefix or suffix of the read that has a match in the target sequence. This alignment is designed to handle deletions with microhomologies and deletions with microinsertions. Using both simulated and real data, we show that Sprites outperforms other current methods at detecting deletions in terms of F-score.
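The idea of searching for the longest prefix or suffix of the whole read can be illustrated with a minimal Python sketch. This is not the Sprites implementation: the function names `longest_prefix_match` and `longest_suffix_match` are hypothetical, the toy sequences are invented for illustration, and exact string matching is assumed, whereas a real re-alignment step would presumably tolerate mismatches.

```python
from typing import Tuple


def longest_prefix_match(read: str, target: str) -> Tuple[int, int]:
    """Return (length, target_start) of the longest prefix of `read`
    that occurs exactly in `target`, or (0, -1) if no base matches.

    Assumption: exact matching only; this is an illustrative sketch.
    """
    for length in range(len(read), 0, -1):
        pos = target.find(read[:length])
        if pos != -1:
            return length, pos
    return 0, -1


def longest_suffix_match(read: str, target: str) -> Tuple[int, int]:
    """Return (length, target_start) of the longest suffix of `read`
    that occurs exactly in `target`, or (0, -1) if no base matches.
    """
    for length in range(len(read), 0, -1):
        pos = target.find(read[-length:])
        if pos != -1:
            return length, pos
    return 0, -1


# Toy example (invented sequences). The reference is
#   left flank "ACGTTGCA" + deleted segment "GGATCCTA" + right flank "GGTTAACC",
# so the deleted segment and the right flank share the microhomology "GG".
# The read comes from the sample carrying the deletion and spans the junction.
read = "GTTGCAGGTTAA"

# Hypothetical target: the upstream reference segment where the clipped
# part of the read is expected to map.
upstream_target = "ACGTTGCAGGAT"

prefix_len, prefix_pos = longest_prefix_match(read, upstream_target)
print(prefix_len, prefix_pos)
```

In this toy case an aligner could place only the last 6 bases of the read on the right flank and clip the first 6. Searching with the whole read finds a prefix of 8 bases in the upstream segment, 2 more than the clipped part, and those extra bases correspond to the shared "GG". Aligning only the clipped part would not reveal this overlap.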
Sprites is open source software and freely available at https://github.com/zhangzhen/sprites
Contact: jxwang@mail.csu.edu.cn
Supplementary data: Supplementary data are available at Bioinformatics online.