Sinha Rituparna, Samaddar Sandip, De Rajat K
Department of Information Technology, Heritage Institute Of Technology, Kolkata, West Bengal, India.
Department of Computer Science and Engineering, Heritage Institute Of Technology, Kolkata, West Bengal, India.
PLoS One. 2015 Aug 20;10(8):e0135895. doi: 10.1371/journal.pone.0135895. eCollection 2015.
Copy number variation (CNV) is a form of structural alteration in the mammalian DNA sequence that is associated with many complex neurological diseases as well as cancer. The development of next-generation sequencing (NGS) technology opens a new avenue for detecting genomic locations with copy number variations. Here we develop an algorithm for detecting CNVs based on depth-of-coverage data generated by NGS technology. In this work, we use a novel representation of read count data as points in a two-dimensional geometric space. A key aspect of detecting regions with CNVs is devising a segmentation algorithm that distinguishes genomic locations with significantly different read counts. We have designed a new segmentation approach in this context, applying a convex hull algorithm to the geometric representation of the read count data. To our knowledge, most algorithms assume a single distribution model for read count data; in our approach, we consider the read count data to follow two different distribution models independently, which adds to the robustness of CNV detection. In addition, our algorithm calls CNVs using a multiple-sample analysis approach, resulting in a low false discovery rate with high precision.
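As a rough illustration of the geometric idea in the abstract, the sketch below maps per-window read counts to two-dimensional points and computes their convex hull with Andrew's monotone chain algorithm. The abstract does not specify how read counts are mapped to points or how the hull drives segmentation, so the (window index, read count) mapping, the function names, and the toy counts are all assumptions made for this example and should not be read as the authors' method.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        # Drop points that would make a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def read_counts_to_points(counts: Sequence[int]) -> List[Point]:
    """Hypothetical mapping (an assumption): x = window index, y = read count."""
    return [(float(i), float(c)) for i, c in enumerate(counts)]


# Toy depth-of-coverage profile with an elevated stretch at windows 4 to 6.
counts = [10, 11, 9, 10, 30, 31, 29, 10, 11]
hull = convex_hull(read_counts_to_points(counts))
print(hull)
```

In this toy profile the windows with elevated counts end up on the hull while some baseline windows fall inside it, which hints at why a hull over read count points can help separate regions with markedly different coverage. How hull vertices would translate into segment boundaries is left open here, since the abstract does not describe that step.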