Cheng Yichen, Dai James Y, Wang Xiaoyu, Kooperberg Charles
Institute for Insight, Georgia State University, Atlanta, Georgia, U.S.A.
Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, U.S.A.
Biometrics. 2018 Dec;74(4):1341-1350. doi: 10.1111/biom.12920. Epub 2018 Jun 12.
Copy number variation (CNV) of DNA plays an important role in the development of many diseases. However, because CNVs are irregular and sparse, studying the association between CNVs and a disease outcome or trait can be challenging. To date, few methods have been proposed in the literature for this problem. Most current researchers rely on an ad hoc two-stage procedure: first identifying CNVs in each individual genome, then performing an association test using the identified CNVs. This potentially leads to information loss and, as a result, lower power to identify disease-associated CNVs. In this article, we describe a new method that combines the two steps into a single coherent model to identify common CNVs across patients that are associated with certain diseases. We use a double penalty model to capture the CNVs' association with both the intensities and the disease trait. We validate its performance on simulated datasets and on a data example concerning platinum resistance and CNVs in the ovarian cancer genome.