Tounkara Fodé, Lefebvre Geneviève, Greenwood Celia, Oualkacha Karim
Lunenfeld-Tenenbaum Research Institute, Toronto, Canada.
Department of Mathematics, Université du Québec à Montréal, Montreal, Canada.
Stat Med. 2020 Feb 28;39(5):517-543. doi: 10.1002/sim.8416. Epub 2019 Dec 23.
Data collected for a genome-wide association study of a primary phenotype are often reused for genome-wide association analyses of secondary phenotypes. However, when the primary and secondary traits are dependent, naïve analyses of secondary phenotypes may produce spurious associations in non-randomly ascertained samples. Retrospective likelihood-based methods have previously been proposed to correct for the sampling bias that arises in secondary trait association analyses. Most of these methods, however, were introduced for case-control designs based on a binary primary phenotype. As such, they are not directly applicable to more complicated study designs such as multiple-trait studies, where the sampling mechanism also depends on the secondary phenotype, or extreme-trait studies, where individuals with extreme primary phenotype values are selected. Only a few prospective likelihood approaches have been proposed to accommodate these more complicated sampling mechanisms. These approaches assume a normal distribution for the secondary phenotype (or the latent secondary phenotype) and a bivariate normal distribution for the dependence between the primary and secondary phenotypes. In this paper, we propose a unified copula-based approach to appropriately detect associations between genetic variants and secondary phenotypes in selected samples. The primary phenotype may be either binary or continuous, and the secondary phenotype is continuous although not necessarily normal. We use both prospective and retrospective likelihoods to account for the sampling mechanism, and we use a copula model to allow for potentially different dependence structures between the primary and secondary phenotypes. We demonstrate the effectiveness of our approach through simulation studies and by analyzing data from the Avon Longitudinal Study of Parents and Children cohort.
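The sampling bias described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' method and implements no correction; it only shows, under assumed settings, how a naïve regression of a secondary phenotype on genotype can pick up a spurious signal in an extreme-trait sample. All function names and parameter values (minor allele frequency 0.3, primary effect 0.5, error correlation 0.6, upper 20% selection) are hypothetical choices made for illustration, and the bivariate normal errors are the simplest possible dependence structure rather than a general copula.

```python
import random


def simulate(n=20000, maf=0.3, beta_primary=0.5, rho=0.6, seed=1):
    """Simulate genotype, primary and secondary phenotypes.

    Assumption: the variant affects the primary trait only, so its true
    effect on the secondary trait is exactly zero.
    """
    rng = random.Random(seed)
    g, y1, y2 = [], [], []
    for _ in range(n):
        # Additive genotype coded 0/1/2 under Hardy-Weinberg proportions.
        gi = int(rng.random() < maf) + int(rng.random() < maf)
        # Correlated normal errors link the two phenotypes.
        e1 = rng.gauss(0.0, 1.0)
        e2 = rho * e1 + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        g.append(gi)
        y1.append(beta_primary * gi + e1)
        y2.append(e2)
    return g, y1, y2


def ols_slope(x, y):
    """Least-squares slope of y on x (the naive association estimate)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def upper_tail_sample(g, y1, y2, fraction=0.2):
    """Keep only individuals whose primary phenotype is in the upper tail."""
    cutoff = sorted(y1)[int(len(y1) * (1.0 - fraction))]
    keep = [i for i, value in enumerate(y1) if value >= cutoff]
    return [g[i] for i in keep], [y2[i] for i in keep]


g, y1, y2 = simulate()
slope_full = ols_slope(g, y2)                # random sample: near zero
g_sel, y2_sel = upper_tail_sample(g, y1, y2)
slope_selected = ols_slope(g_sel, y2_sel)    # selected sample: biased

print(f"naive slope, full sample:     {slope_full:.3f}")
print(f"naive slope, selected sample: {slope_selected:.3f}")
```

In this setup the selected-sample slope is expected to be clearly negative even though the true secondary effect is zero: among individuals with a high primary phenotype, carrying fewer risk alleles implies a larger error term, and that error is correlated with the secondary trait. Likelihood-based corrections of the kind discussed in the abstract aim to remove this artifact by modeling the sampling mechanism explicitly.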