Duan Junbo, Zhang Ji-Gang, Deng Hong-Wen, Wang Yu-Ping
Department of Biomedical Engineering, Tulane University, New Orleans, USA.
Annu Int Conf IEEE Eng Med Biol Soc. 2012;2012:1246-9. doi: 10.1109/EMBC.2012.6346163.
Copy number variation (CNV) is a structural variation in the human genome that has been associated with many complex diseases. In this paper we present a method to detect common copy number variations from next-generation sequencing data. First, copy number variations are detected in each individual sample, a step formulated as a total-variation-penalized least-squares problem. Second, CNVs common to multiple samples are extracted using source separation techniques such as non-negative matrix factorization (NMF). Finally, the method is applied to population clustering. Results on real data show that two family trios with different ancestries can be clustered into two ethnic groups based on their common CNVs, demonstrating the potential of the proposed method for applications in population genetics.
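The two computational steps named in the abstract can be illustrated with a minimal sketch. Everything below is an assumption for illustration and not the authors' implementation: the abstract does not say which solvers were used, so the total variation step is solved here by projected gradient on the dual problem, and NMF uses the standard multiplicative updates for the Frobenius norm. The function names `tv_denoise` and `nmf` and the parameters `lam`, `k`, and `n_iter` are hypothetical.

```python
import numpy as np


def tv_denoise(y, lam, n_iter=500):
    """Approximately solve min_x 0.5*||y - x||^2 + lam * sum_i |x[i+1] - x[i]|.

    Assumed solver (not from the paper): projected gradient on the dual,
    where x = y - D^T z and every dual variable satisfies |z[i]| <= lam.
    """
    y = np.asarray(y, dtype=float)
    z = np.zeros(y.size - 1)

    def dt(z):
        # Transpose of the forward-difference operator D.
        return -np.diff(np.concatenate(([0.0], z, [0.0])))

    for _ in range(n_iter):
        x = y - dt(z)
        # Step size 0.25 is safe because the largest eigenvalue of D D^T is below 4.
        z = np.clip(z + 0.25 * np.diff(x), -lam, lam)
    return y - dt(z)


def nmf(V, k, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (samples x bins) as W @ H.

    Assumed solver (not from the paper): multiplicative updates that
    keep W and H non-negative while reducing ||V - W H||_F.
    """
    V = np.asarray(V, dtype=float)
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + eps  # per-sample loadings
    H = rng.random((k, n)) + eps  # shared patterns along the genome
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Under these assumptions, each sample's read-depth profile would be smoothed with `tv_denoise` into a piecewise-constant signal, the smoothed profiles stacked as rows of `V`, and `nmf(V, k=2)` applied. Assigning each sample to its dominant component, for example with `np.argmax(W, axis=1)`, is one simple way to obtain the kind of two-group clustering the abstract reports.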