Yu Xuanxuan, Luo Xizhi, Cai Guoshuai, Xiao Feifei
Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina, Columbia, South Carolina, USA.
Data and Statistical Sciences, AbbVie Inc., North Chicago, Illinois, USA.
Genet Epidemiol. 2024 Mar 27. doi: 10.1002/gepi.22558.
Copy number variants (CNVs) are prevalent in the human genome and have a profound effect on genomic organization and human disease. Discovering disease-associated CNVs is critical for understanding disease pathogenesis and for aiding diagnosis and treatment. However, traditional methods for assessing the association between CNVs and disease risk adopt a two-stage strategy, first making quantitative CNV measurements and then testing for association. This can lead to biased association estimates and low statistical power, and it is a major barrier to routine genome-wide assessment of such variation. In this article, we developed One-Stage CNV-disease Association Analysis (OSCAA), a flexible algorithm for discovering disease-associated CNVs for both quantitative and qualitative traits. OSCAA employs a two-dimensional Gaussian mixture model built on the principal components (PCs) of copy number intensities, accounting for technical biases in CNV detection while simultaneously testing for the effect of CNVs on outcome traits. CNVs are identified and their associations with disease risk are evaluated in a single step, so that the uncertainty of CNV identification is taken into account in the statistical model. Our simulations demonstrated that OSCAA outperformed the existing one-stage method and traditional two-stage methods by yielding more accurate estimates of the CNV-disease association, especially for short CNVs or CNVs with weak signals. In conclusion, OSCAA is a powerful and flexible approach for CNV association testing with high sensitivity and specificity, and it can be easily applied to different traits and clinical risk prediction.
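The contrast between one-stage and two-stage analysis can be illustrated with a toy simulation. The sketch below is not the OSCAA implementation, whose likelihood, PC construction, and bias handling are not specified in the abstract. It assumes a single binary carrier state, two simulated PC scores with a shared diagonal covariance, and a continuous trait. All function names and parameter values are hypothetical. The one-stage fit runs an EM algorithm in which the trait enters the mixture likelihood alongside the intensities. The two-stage fit clusters the intensities alone, makes hard calls, and then compares trait means.

```python
import numpy as np

def simulate(n=5000, carrier_freq=0.2, beta=1.0, shift=1.5, seed=1):
    """Toy data (assumed, not from the paper): z = latent carrier status,
    x = two PC scores of copy number intensity, y = quantitative trait."""
    rng = np.random.default_rng(seed)
    z = (rng.random(n) < carrier_freq).astype(float)
    x = rng.normal(size=(n, 2)) - shift * z[:, None]  # carriers have lower intensity
    y = beta * z + rng.normal(size=n)
    return x, y, z

def log_normal(v, mean, var):
    """Elementwise log density of a normal distribution N(mean, var)."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (v - mean) ** 2 / var)

def fit_em(x, y=None, n_iter=300):
    """EM for a two-state Gaussian mixture on x with shared diagonal variance.
    If y is given, the trait model y ~ N(alpha + beta * z, s2) joins the
    likelihood, so calling and association are estimated in one step."""
    score = x.sum(axis=1)
    r = (score < np.quantile(score, 0.25)).astype(float)  # crude initial carrier guess
    alpha = beta = s2 = None
    for _ in range(n_iter):
        # M-step: weighted updates using current carrier probabilities r
        w1, w0 = r, 1.0 - r
        pi = w1.mean()
        mu1 = (w1[:, None] * x).sum(axis=0) / w1.sum()
        mu0 = (w0[:, None] * x).sum(axis=0) / w0.sum()
        var = (w1[:, None] * (x - mu1) ** 2
               + w0[:, None] * (x - mu0) ** 2).sum(axis=0) / len(x)
        if y is not None:
            alpha = (w0 * y).sum() / w0.sum()
            beta = (w1 * y).sum() / w1.sum() - alpha
            s2 = (w0 * (y - alpha) ** 2
                  + w1 * (y - alpha - beta) ** 2).sum() / len(y)
        # E-step: posterior carrier probability for each sample
        l1 = np.log(pi) + log_normal(x, mu1, var).sum(axis=1)
        l0 = np.log(1.0 - pi) + log_normal(x, mu0, var).sum(axis=1)
        if y is not None:
            l1 = l1 + log_normal(y, alpha + beta, s2)
            l0 = l0 + log_normal(y, alpha, s2)
        r = 1.0 / (1.0 + np.exp(np.clip(l0 - l1, -50.0, 50.0)))
    return r, beta

def two_stage_beta(x, y):
    """Stage 1: cluster intensities only and make hard calls.
    Stage 2: difference in trait means between called groups."""
    r, _ = fit_em(x)
    called = r > 0.5
    return y[called].mean() - y[~called].mean()

x, y, z = simulate()
posterior, beta_one_stage = fit_em(x, y)
beta_two_stage = two_stage_beta(x, y)
print(f"true effect 1.0 | one-stage {beta_one_stage:.2f} | two-stage {beta_two_stage:.2f}")
```

In this setup the hard calls in the two-stage fit misclassify some samples, which pulls the estimated effect toward zero. The joint fit weights each sample by its posterior carrier probability instead. The gap between the two estimates should widen as `shift` shrinks, which mirrors the weak-signal case the abstract highlights.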