Iakovishina Daria, Janoueix-Lerosey Isabelle, Barillot Emmanuel, Regnier Mireille, Boeva Valentina
INRIA Projet AMIB, Ecole Polytechnique, Palaiseau, France.
Institut Curie, Centre De Recherche, Paris Inserm, U830, Department Genetics and Biology of Cancers, Paris, France.
Bioinformatics. 2016 Apr 1;32(7):984-92. doi: 10.1093/bioinformatics/btv751. Epub 2016 Jan 6.
Whole-genome sequencing with paired-end reads can be used to characterize the landscape of large somatic rearrangements in cancer genomes. Several methods have been developed for detecting structural variants from whole-genome sequencing data. So far, none of them has combined information about abnormally mapped read pairs connecting rearranged regions with the associated global copy number changes inferred automatically from the same sequencing data file. Our aim was to create a computational method that uses both types of information, i.e. normal and abnormal reads, and to demonstrate that doing so substantially improves both the sensitivity and the specificity of structural variant prediction.
We developed a computational method, SV-Bay, to detect structural variants from whole-genome mate-pair or paired-end sequencing data using a probabilistic Bayesian approach. The approach takes into account both the depth of coverage by normal reads and abnormalities in read pair mappings. To estimate the model likelihood, SV-Bay considers the GC-content and read mappability of the genome, thereby correcting the expected read count. For the detection of somatic variants, SV-Bay uses a matched normal sample when one is available. We validated SV-Bay on simulated datasets and on an experimental mate-pair dataset for the CLB-GA neuroblastoma cell line. In a comparison with several other structural variant detection methods, SV-Bay showed better prediction accuracy in terms of both sensitivity and false-positive detection rate.
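The general idea of correcting an expected read count for GC-content and mappability, and then weighing a candidate rearrangement in a Bayesian way, can be illustrated with a minimal sketch. This is not SV-Bay's actual model or code: the function names, the Poisson likelihood, the multiplicative correction factors, and the prior value are all assumptions chosen for illustration.

```python
import math


def expected_count(base_rate, gc_factor, mappability, copy_number, ploidy=2):
    """Expected read-pair count for a region (hypothetical formula).

    base_rate   -- expected count at normal ploidy with no bias
    gc_factor   -- multiplicative GC-content correction (1.0 = no bias)
    mappability -- fraction of the region that is uniquely mappable
    copy_number -- copy number of the region in the sample
    """
    return base_rate * gc_factor * mappability * copy_number / ploidy


def poisson_log_pmf(k, lam):
    """Log-probability of observing k events under a Poisson(lam) model."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def posterior_sv(n_abnormal, expected_if_sv, expected_if_noise, prior_sv=0.01):
    """Posterior probability that a link between two regions is a real
    rearrangement, given the number of abnormal read pairs supporting it.

    Two hypotheses are compared (an assumption of this sketch):
    a true variant, or mapping noise with a much lower expected count.
    """
    log_sv = math.log(prior_sv) + poisson_log_pmf(n_abnormal, expected_if_sv)
    log_noise = math.log(1.0 - prior_sv) + poisson_log_pmf(n_abnormal, expected_if_noise)
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_sv, log_noise)
    w_sv = math.exp(log_sv - m)
    w_noise = math.exp(log_noise - m)
    return w_sv / (w_sv + w_noise)


if __name__ == "__main__":
    # Expected supporting pairs for a junction present on one of two copies.
    lam_sv = expected_count(base_rate=50.0, gc_factor=0.9, mappability=0.8, copy_number=1)
    print(posterior_sv(n_abnormal=15, expected_if_sv=lam_sv, expected_if_noise=0.5))
```

In this sketch, a candidate supported by a number of abnormal pairs close to the corrected expectation receives a posterior near 1, while a candidate with almost no support stays near 0, even though the prior for a variant is low.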
https://github.com/InstitutCurie/SV-Bay
Supplementary data are available at Bioinformatics online.