Maus Esfahani Nastaran, Catchpoole Daniel, Kennedy Paul J
Australian Artificial Intelligence Institute, University of Technology Sydney, Sydney 2007, Australia.
The Tumour Bank, The Children's Hospital at Westmead, Sydney 2145, Australia.
Life (Basel). 2021 Nov 26;11(12):1302. doi: 10.3390/life11121302.
Copy number variants (CNVs) are the most common form of structural genetic variation, reflecting the gain or loss of DNA segments compared with a reference genome. Studies have identified associations between CNVs and various diseases. However, to our knowledge, the association between the sequential order of CNVs and disease-related traits has not been studied, and it remains unclear whether CNVs function individually or in coordination with other CNVs to manifest a disease or trait. Consequently, we propose the first method to test the association between the sequential order of CNVs and diseases. Our sequential multi-dimensional CNV kernel-based association test (SMCKAT) consists of three parts: (1) a single CNV group kernel measuring the similarity between two groups of CNVs; (2) a whole genome group kernel that aggregates several single group kernels to summarize the similarity between CNV groups in a single chromosome or the whole genome; and (3) an association test between the CNV sequential order and disease-related traits using a random effect model. We evaluate SMCKAT on CNV data sets exhibiting rare or common CNVs, demonstrating that it can detect specific biologically relevant chromosomal regions supported by the biomedical literature. We compare the performance of SMCKAT with MCKAT, a multi-dimensional kernel association test. Based on the results, SMCKAT detects more specific chromosomal regions than MCKAT: regions that not only have CNV characteristics but on which the CNV order is also significantly associated with the disease-related trait.
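The three-part structure described above (a group kernel, an aggregated kernel, and an association test) can be illustrated with a minimal Python sketch. This is not the SMCKAT implementation: the abstract does not specify the kernel definitions or the test procedure, so everything here is an assumption made for illustration. In particular, `cnv_similarity`, `group_kernel`, `genome_kernel`, `score_statistic`, and `permutation_pvalue` are hypothetical names, the overlap-based similarity is a toy choice, and a permutation p-value on a generic variance-component score statistic stands in for the random effect model test.

```python
import numpy as np

def cnv_similarity(a, b):
    """Toy similarity between two CNVs given as (type, start, end).
    Assumption: same type is required, then overlap length / total span."""
    kind_a, start_a, end_a = a
    kind_b, start_b, end_b = b
    if kind_a != kind_b:
        return 0.0
    overlap = max(0, min(end_a, end_b) - max(start_a, start_b))
    span = max(end_a, end_b) - min(start_a, start_b)
    return overlap / span if span > 0 else 0.0

def group_kernel(g1, g2):
    """Order-aware similarity between two CNV groups (hypothetical form):
    CNVs are compared position by position in their sequential order."""
    longest = max(len(g1), len(g2))
    if longest == 0:
        return 0.0
    return sum(cnv_similarity(a, b) for a, b in zip(g1, g2)) / longest

def genome_kernel(samples):
    """Aggregate group kernels into one sample-by-sample matrix.
    `samples[i]` is a list of CNV groups for sample i, in a shared order."""
    n = len(samples)
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            K[i, j] = K[j, i] = sum(
                group_kernel(g1, g2) for g1, g2 in zip(samples[i], samples[j])
            )
    return K

def score_statistic(K, y):
    """Generic kernel score statistic Q = r^T K r with r = y - mean(y)."""
    r = np.asarray(y, dtype=float) - np.mean(y)
    return float(r @ K @ r)

def permutation_pvalue(K, y, n_perm=999, seed=0):
    """Permutation p-value for Q (a stand-in for an analytic null distribution)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed = score_statistic(K, y)
    hits = sum(
        score_statistic(K, rng.permutation(y)) >= observed for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)
```

In this toy version, two samples carrying the same CNVs in a different order receive a lower group similarity than two samples carrying them in the same order, which is the property an order-sensitive test relies on.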