Australian Artificial Intelligence Institute, University of Technology Sydney, Sydney, Australia.
The Tumour Bank, The Children's Hospital at Westmead, Sydney, Australia.
BMC Bioinformatics. 2021 Dec 11;22(1):588. doi: 10.1186/s12859-021-04494-w.
Copy number variants (CNVs) are gains or losses of DNA segments in the genome. Studies have shown that CNVs are linked to various disorders, including autism, intellectual disability, and schizophrenia. Consequently, interest in studying possible associations between CNVs and specific disease traits is growing. However, because CNVs have specific multi-dimensional characteristics, methods for testing the association between CNVs and disease-related traits remain underdeveloped. In this paper, we propose a novel multi-dimensional CNV kernel association test (MCKAT), which uses kernel-based methods to find significant associations between CNVs and disease-related traits.
We address the multi-dimensionality of CNV characteristics. We first design a single-pair CNV kernel, which contains three sub-kernels that together summarize the similarity between two CNVs across all CNV characteristics. We then aggregate the single-pair CNV kernel into a whole-chromosome CNV kernel, which summarizes the similarity between CNVs in two or more chromosomes. Finally, we evaluate the association between the CNVs and disease-related traits by comparing the similarity in the trait with the kernel-based similarity, using a score test in a random effect model. We apply MCKAT to genome-wide CNV datasets to examine the association between CNVs and disease-related traits, which demonstrates the potential usefulness of the proposed method for CNV association tests. We compare the performance of MCKAT with that of CKAT, a uni-dimensional kernel method. Based on the results, MCKAT yields stronger evidence (smaller p-values) when detecting significant associations between CNVs and disease-related traits in both rare and common CNV datasets.
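The three steps described above (single-pair kernel, aggregation, score test) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything below is an assumption made for illustration: the tuple layout `(start, end, cnv_type, dosage)`, the choice of position, type, and dosage as the three characteristics, combining the sub-kernels by a product, aggregating by a sum over all CNV pairs, and using a permutation p-value as a simple stand-in for the analytic null distribution of a score test. All function names are hypothetical.

```python
import numpy as np

# Assumed CNV representation: (start, end, cnv_type, dosage).

def pair_kernel(a, b):
    """Hypothetical single-pair CNV kernel: product of three sub-kernels."""
    (s1, e1, t1, d1), (s2, e2, t2, d2) = a, b
    overlap = max(0, min(e1, e2) - max(s1, s2))
    span = max(e1, e2) - min(s1, s2)
    k_pos = overlap / span if span > 0 else 0.0   # positional overlap
    k_type = 1.0 if t1 == t2 else 0.0             # same type (e.g. "del" vs "dup")
    k_dose = 1.0 / (1.0 + abs(d1 - d2))           # closeness in dosage
    return k_pos * k_type * k_dose

def subject_kernel(cnvs_i, cnvs_j):
    """Aggregate pair similarities over all CNV pairs of two subjects."""
    return sum(pair_kernel(a, b) for a in cnvs_i for b in cnvs_j)

def kernel_matrix(subjects):
    """Symmetric n x n matrix of subject-level similarities."""
    n = len(subjects)
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            K[i, j] = K[j, i] = subject_kernel(subjects[i], subjects[j])
    return K

def score_test(K, y, n_perm=2000, seed=0):
    """Statistic Q = r' K r with r = y - mean(y); permutation p-value."""
    rng = np.random.default_rng(seed)
    r = np.asarray(y, dtype=float) - np.mean(y)
    q_obs = r @ K @ r
    exceed = 0
    for _ in range(n_perm):
        rp = rng.permutation(r)          # shuffle trait values across subjects
        if rp @ K @ rp >= q_obs:
            exceed += 1
    return q_obs, (1 + exceed) / (1 + n_perm)

# Toy data (invented): four cases share one deletion; four controls carry
# unrelated duplications at distinct positions.
cases = [[(100, 200, "del", 1)] for _ in range(4)]
controls = [[(1000 + 500 * i, 1100 + 500 * i, "dup", 3)] for i in range(4)]
y = [1, 1, 1, 1, 0, 0, 0, 0]
K = kernel_matrix(cases + controls)
q, p = score_test(K, y)
```

In this toy setup the cases are similar to one another under the kernel and the controls are not, so the trait similarity lines up with the kernel similarity and the permutation p-value comes out small. A production analysis would also adjust for covariates in the null model, which this sketch omits.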
A multi-dimensional copy number variant kernel association test can detect CNV regions that are statistically significantly associated with any disease-related trait. MCKAT can provide biologists with CNV hot spots at the cytogenetic band level, that is, bands whose CNVs may be significantly associated with disease-related traits. Using MCKAT, biologists can narrow their investigation from the whole genome, with its many genes and CNVs, to the more specific cytogenetic bands that MCKAT identifies. Furthermore, MCKAT can help biologists detect CNVs that are significantly associated with disease-related traits across a patient group, instead of examining each subject's CNVs case by case.