Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America.
Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan.
PLoS Comput Biol. 2020 May 4;16(5):e1007797. doi: 10.1371/journal.pcbi.1007797. eCollection 2020 May.
Copy number variants (CNVs) are gains or losses of DNA segments in the genome that can vary in dosage and length. CNVs comprise a large proportion of variation in human genomes and impact health conditions. To detect rare CNV associations, kernel-based methods have been shown to be a powerful tool due to their flexibility in modeling aggregate CNV effects, their ability to capture effects from different CNV features, and their accommodation of effect heterogeneity. To perform a kernel association test, a CNV locus needs to be defined so that locus-specific effects can be retained during aggregation. However, CNV loci are arbitrarily defined, and different locus definitions can lead to different performance depending on the underlying effect patterns. In this work, we develop a new kernel-based test called CONCUR (i.e., copy number profile curve-based association test) that does not require a locus definition and evaluates CNV-phenotype associations by comparing individuals' copy number profiles across genomic regions. CONCUR builds on two proposed concepts: "copy number profile curves," which describe the CNV profile of an individual, and the "common area under the curve (cAUC) kernel," which models the multi-feature CNV effects. The proposed method captures the effects of CNV dosage and length, accounts for the numerical nature of copy numbers, and accommodates between- and within-locus etiological heterogeneity without the need to define artificial CNV loci as required in current kernel methods. In a variety of simulation settings, CONCUR shows comparable or improved power over existing approaches. Real data analyses suggest that CONCUR is well powered to detect CNV effects in the Swedish Schizophrenia Study and the Taiwan Biobank.
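To make the idea of a common area under two copy number profile curves concrete, the following is a minimal Python sketch of one plausible reading of the abstract. It is not the authors' implementation, and the abstract does not give the exact cAUC definition. The assumptions here are ours: each profile is a list of non-overlapping CNV segments `(start, end, copy_number)` on a single chromosome, the curve height is the deviation from a normal copy number of 2, deletions and duplications are never counted as shared, and the shared area over an overlapping stretch is its length times the smaller of the two absolute deviations. The names `common_area` and `kernel_matrix` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: (start, end, copy_number), half-open [start, end).
# Segments within one individual's profile are assumed not to overlap.
Segment = Tuple[int, int, int]

NORMAL_COPY_NUMBER = 2  # assumed diploid baseline


def common_area(profile_a: List[Segment], profile_b: List[Segment]) -> float:
    """Area shared by two copy number profile curves (illustrative definition)."""
    total = 0.0
    for start_a, end_a, cn_a in profile_a:
        dev_a = cn_a - NORMAL_COPY_NUMBER
        for start_b, end_b, cn_b in profile_b:
            dev_b = cn_b - NORMAL_COPY_NUMBER
            # Skip pairs that are not both deletions or both duplications.
            if dev_a * dev_b <= 0:
                continue
            overlap = min(end_a, end_b) - max(start_a, start_b)
            if overlap > 0:
                # Length of the overlap times the smaller dosage deviation.
                total += overlap * min(abs(dev_a), abs(dev_b))
    return total


def kernel_matrix(profiles: List[List[Segment]]) -> List[List[float]]:
    """Pairwise similarity matrix built from the shared areas."""
    n = len(profiles)
    return [[common_area(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    deletion_short = [(100, 200, 1)]   # single-copy deletion
    deletion_deep = [(150, 300, 0)]    # two-copy deletion, partly overlapping
    duplication = [(150, 250, 3)]      # single-copy duplication
    for row in kernel_matrix([deletion_short, deletion_deep, duplication]):
        print(row)
```

Under these assumptions, similarity grows with both the length of the overlap and the shared dosage, and no locus boundaries need to be fixed in advance, which is the property the abstract emphasizes. A matrix of this kind could then be supplied to a standard kernel association test.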