Wang WeiBo, Wang Wei, Sun Wei, Crowley James J, Szatkiewicz Jin P
Department of Computer Science, University of North Carolina at Chapel Hill, NC 27599-3175, USA.
Department of Computer Science, University of California, Los Angeles, CA 90095, USA.
Nucleic Acids Res. 2015 Aug 18;43(14):e90. doi: 10.1093/nar/gkv319. Epub 2015 Apr 16.
Copy-number variants (CNVs) are a major form of genetic variation and a risk factor for various human diseases, so it is crucial to accurately detect and characterize them. It is conceivable that allele-specific reads from high-throughput sequencing data could be leveraged to both enhance CNV detection and produce allele-specific copy number (ASCN) calls. Although statistical methods have been developed to detect CNVs using whole-genome sequence (WGS) and/or whole-exome sequence (WES) data, information from allele-specific read counts has not yet been adequately exploited. In this paper, we develop an integrated method, called AS-GENSENG, which incorporates allele-specific read counts in CNV detection and estimates ASCN using either WGS or WES data. To evaluate the performance of AS-GENSENG, we conducted extensive simulations, generated empirical data using existing WGS and WES data sets and validated predicted CNVs using an independent methodology. We conclude that AS-GENSENG not only predicts accurate ASCN calls but also improves the accuracy of total copy number calls, owing to its unique ability to exploit information from both total and allele-specific read counts while accounting for various experimental biases in sequence data. Our novel, user-friendly and computationally efficient method and a complete analytic protocol are freely available at https://sourceforge.net/projects/asgenseng/.
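To make concrete why allele-specific read counts carry information about ASCN, the following is a minimal illustrative sketch and not the AS-GENSENG model, whose statistical details are not given in this abstract. It assumes a simple binomial model at heterozygous sites, assumes the read counts are already phased into an "A" and a "B" haplotype, and uses a fixed error rate; the function names, the error value and the example counts are all hypothetical.

```python
import math


def binom_logpmf(k, n, p):
    """Log binomial probability of k successes in n trials with success rate p."""
    if p <= 0.0:
        return 0.0 if k == 0 else float("-inf")
    if p >= 1.0:
        return 0.0 if k == n else float("-inf")
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1.0 - p)


def expected_b_fraction(copies_a, copies_b, error=0.01):
    """Expected fraction of reads carrying the B allele for state (copies_a, copies_b).

    The error term (an assumed value) keeps the fraction away from exactly
    0 or 1, so a few stray reads do not give a state zero probability.
    """
    total = copies_a + copies_b
    if total == 0:
        return 0.5  # no copies: any observed reads are treated as noise
    return error + (1.0 - 2.0 * error) * (copies_b / total)


def score_states(snp_counts, states, error=0.01):
    """Sum binomial log-likelihoods over sites for each candidate state.

    snp_counts: list of (a_reads, b_reads) pairs, one per heterozygous site.
    states: list of (copies_a, copies_b) candidate allele-specific copy numbers.
    """
    scores = {}
    for copies_a, copies_b in states:
        p = expected_b_fraction(copies_a, copies_b, error)
        scores[(copies_a, copies_b)] = sum(
            binom_logpmf(b_reads, a_reads + b_reads, p)
            for a_reads, b_reads in snp_counts
        )
    return scores


def best_state(snp_counts, states, error=0.01):
    """Return the candidate state with the highest total log-likelihood."""
    scores = score_states(snp_counts, states, error)
    return max(scores, key=scores.get)


# Hypothetical phased read counts at three heterozygous sites in one segment.
example_counts = [(20, 41), (18, 35), (22, 44)]
example_states = [(1, 0), (0, 1), (1, 1), (2, 1), (1, 2), (2, 2)]
example_call = best_state(example_counts, example_states)
```

In this toy model, states with the same allelic ratio, such as (1, 1) and (2, 2), receive identical scores, so allele-specific counts alone cannot fix the total copy number. This is consistent with the abstract's point that total and allele-specific read counts are most informative when used together.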