Canouil Mickaël, Balkau Beverley, Roussel Ronan, Froguel Philippe, Rocheleau Ghislain
Université de Lille, UMR 8199-EGID, Lille, France.
Centre National de la Recherche Scientifique, UMR 8199, Lille, France.
Front Genet. 2018 Jun 14;9:210. doi: 10.3389/fgene.2018.00210. eCollection 2018.
In observational cohorts, longitudinal data are collected as repeated measurements of many biomarkers at predetermined time points, along with other variables measured at baseline. In these cohorts, the time until a certain event of interest occurs is also recorded, and a relationship is very often observed between a biomarker measured repeatedly over time and that event. Joint models were designed to efficiently estimate the statistical parameters describing this relationship by combining a mixed model for the longitudinal biomarker trajectory with a survival model for the time until the event occurs, using a set of random effects to account for the relationship between the two types of data. In this paper, we discuss the implementation of joint models in genetic association studies. First, we check model consistency under different simulation scenarios, varying the sample size, the minor allele frequency and the number of repeated measurements. Second, using genotypes assayed with Metabochip DNA arrays (Illumina) from about 4,500 individuals recruited in the French cohort D.E.S.I.R., we assess the feasibility of applying the joint modelling approach to a real high-throughput genomic dataset. An alternative model approximating the joint model, called the Two-Step approach (TS), is also presented. Although the joint model yields more precise and less biased estimators than the TS approach, the TS approach requires much less computation time and could thus be used for testing millions of SNPs at the genome-wide scale.
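For readers unfamiliar with joint models, the structure described above can be written out as a minimal sketch. This is a common shared-random-effects formulation with a linear trajectory, given here as an assumption for illustration and not necessarily the exact specification used in the paper. All symbols are introduced here: $y_i(t_{ij})$ is the biomarker value of individual $i$ at visit $j$, $g_i$ is the SNP genotype, $b_{0i}$ and $b_{1i}$ are the random intercept and slope, $h_0(t)$ is the baseline hazard, and $\eta$ links the biomarker trajectory to the event.

```latex
\begin{align}
  % Longitudinal sub-model: observed biomarker = true trajectory + noise
  y_i(t_{ij}) &= m_i(t_{ij}) + \varepsilon_{ij},
    \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2), \\
  % Linear mixed model for the trajectory, with an SNP effect \gamma
  m_i(t) &= (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\, t + \gamma\, g_i,
    \qquad (b_{0i}, b_{1i})^\top \sim \mathcal{N}(0, D), \\
  % Survival sub-model: hazard depends on the SNP and on the current trajectory value
  h_i(t) &= h_0(t)\, \exp\{\alpha\, g_i + \eta\, m_i(t)\}.
\end{align}
```

Under this sketch, the joint model estimates all parameters from a single likelihood that integrates over the random effects. A two-step approximation would typically fit the mixed model alone first, then plug the predicted trajectory $\hat{m}_i(t)$ into the survival model as a covariate. This avoids the costly integration, which is consistent with the shorter computation times reported above.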