Xing Chao, Schumacher Fredrick R, Conti David V, Witte John S
Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio, USA.
BMC Genet. 2003 Dec 31;4 Suppl 1(Suppl 1):S44. doi: 10.1186/1471-2156-4-S1-S44.
Observational cohort studies have seen little use in linkage analyses because they generally lack large, disease-specific pedigrees. Nevertheless, the longitudinal nature of such studies makes them potentially valuable for assessing the linkage between genotypes and temporal trends in phenotypes. The repeated phenotype measures in cohort studies (i.e., measures taken across time), however, can have extensive missing information. Existing methods for handling missing data in observational studies may decrease efficiency, introduce biases, and give spurious results. The impact of such methods on linkage analysis of cohort studies is unclear. Therefore, we compare here the effects of six methods of handling missing repeated phenotypes on results from genome-wide linkage analyses of four quantitative traits from the Framingham Heart Study cohort.
We found that simply deleting observations with missing values gave many more nominally statistically significant linkages than the other five approaches. Among the latter, those with similar underlying methodology (i.e., imputation- versus model-based) gave the most consistent results, although some discrepancies remained.
Different methods for addressing missing values in linkage analyses of cohort studies can give substantially different results, and the choice of method must be considered carefully to protect against biases and spurious findings.
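To make the contrast between deleting incomplete observations and filling them in concrete, the following is a minimal sketch, not the procedure used in the study. The abstract does not name the six methods, so per-subject mean imputation is used here only as an assumed stand-in for an imputation-based approach, and the subject identifiers, values, and function names are all hypothetical.

```python
from statistics import mean

# Hypothetical repeated phenotype measures (one trait at four exams).
# None marks a missing visit; all values are made up for illustration.
phenotypes = {
    "subj1": [120.0, 124.0, None, 130.0],
    "subj2": [110.0, None, None, 118.0],
    "subj3": [135.0, 138.0, 140.0, 141.0],
}

def complete_case(data):
    """Deletion: keep only subjects with no missing visits."""
    return {sid: vals for sid, vals in data.items() if None not in vals}

def subject_mean_impute(data):
    """Assumed stand-in for imputation: fill each missing visit
    with that subject's mean over the visits that were observed."""
    out = {}
    for sid, vals in data.items():
        observed = [v for v in vals if v is not None]
        if not observed:
            continue  # no observed visits to impute from
        fill = mean(observed)
        out[sid] = [fill if v is None else v for v in vals]
    return out

deleted = complete_case(phenotypes)
imputed = subject_mean_impute(phenotypes)

print(len(deleted), "subject(s) retained after deletion")
print(len(imputed), "subject(s) retained after imputation")
```

In this toy data set, deletion discards two of the three subjects, while imputation keeps all three at the cost of filling in values that were never observed. The two approaches therefore pass different phenotype data to any downstream analysis, which is the kind of divergence the abstract describes.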