Department of Epidemiology and Biostatistics, University of South Carolina, Columbia, SC, USA.
Department of Mathematics, University of Arizona, Tucson, AZ, USA.
Bioinformatics. 2019 Sep 1;35(17):2891-2898. doi: 10.1093/bioinformatics/bty1041.
Integrating multiple genetic sources for copy number variation (CNV) detection is a powerful approach to improving the identification of variants associated with complex traits. Widely used change-point-based methods have been shown to increase statistical power for identifying variants. However, effectively detecting CNVs with weak signals remains challenging because genotyping intensity data are noisy. We previously developed modSaRa, a normal mean-based model built on a screening and ranking algorithm for CNV identification, which showed desirable sensitivity with high computational efficiency. To further boost statistical power for variant identification, we here present modSaRa2, a novel improvement that incorporates relative allelic intensity and external information from empirical statistics into the model.
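To illustrate what "screening and ranking" means for change-point detection, the sketch below shows the generic idea on a single intensity sequence (for example, log R ratios along a chromosome). It is not modSaRa or modSaRa2's implementation and does not use allelic intensity or empirical statistics. The window half-width `h`, the threshold `lam`, and the function names are hypothetical choices for illustration. At each position, a local diagnostic compares the mean of the `h` probes to the right with the mean of the `h` probes to the left. Positions where the absolute diagnostic is a local maximum within `h` and exceeds `lam` are screened in as candidate breakpoints. The candidates are then ranked by signal strength.

```python
import numpy as np


def local_diagnostic(y, h):
    """Hypothetical helper: D[x] = mean(y[x:x+h]) - mean(y[x-h:x]).

    D[x] measures a mean shift between probes x-1 and x; positions
    too close to either end (where a full window is unavailable) stay 0.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    c = np.concatenate(([0.0], np.cumsum(y)))  # prefix sums for O(1) window means
    d = np.zeros(n)
    for x in range(h, n - h + 1):
        right = (c[x + h] - c[x]) / h
        left = (c[x] - c[x - h]) / h
        d[x] = right - left
    return d


def screen_and_rank(y, h, lam):
    """Hypothetical sketch of generic screening and ranking.

    Screening: keep positions whose |D| is the maximum within +/- h
    and is at least lam. Ranking: sort candidates by |D|, strongest first.
    """
    d = np.abs(local_diagnostic(y, h))
    n = len(d)
    candidates = []
    for x in range(h, n - h + 1):
        lo, hi = max(h, x - h), min(n - h, x + h)
        if d[x] >= lam and d[x] == d[lo:hi + 1].max():
            candidates.append(x)
    return sorted(candidates, key=lambda x: -d[x])


# Noise-free toy signal: a duplication-like step from probe 50 to probe 79.
y = [0.0] * 50 + [1.0] * 30 + [0.0] * 50
print(screen_and_rank(y, h=10, lam=0.5))
```

On this noise-free toy signal, both step edges (positions 50 and 80) are screened in with equal strength. Real intensity data are noisy, which is why weak CNV signals are hard to separate from local fluctuations. This is the difficulty that integrating additional information sources is meant to address.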
Simulation studies showed that modSaRa2 markedly improved both sensitivity and specificity over existing methods for analyzing array-based data. The improvement is most substantial for detecting weak CNV signals, and the method also improves stability as CNV size varies. Applied to a whole-genome melanoma dataset, the new method identified novel candidate melanoma risk-associated deletions on chromosome band 1p22.2 and duplications in the 6p22, 6q25 and 19p13 regions. These findings may help clarify the possible roles of germline copy number variants in the etiology of melanoma.
http://c2s2.yale.edu/software/modSaRa2 or https://github.com/FeifeiXiaoUSC/modSaRa2.
Supplementary data are available at Bioinformatics online.