Comparison of missing data approaches in linkage analysis.

Authors

Xing Chao, Schumacher Fredrick R, Conti David V, Witte John S

Affiliation

Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio, USA.

Publication

BMC Genet. 2003 Dec 31;4 Suppl 1(Suppl 1):S44. doi: 10.1186/1471-2156-4-S1-S44.

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1866480/

Abstract

BACKGROUND

Observational cohort studies have rarely been used for linkage analysis because they generally lack large, disease-specific pedigrees. Nevertheless, their longitudinal design makes them potentially valuable for assessing linkage between genotypes and temporal trends in phenotypes. However, the repeated phenotype measurements in cohort studies (i.e., measurements taken over time) can have extensive missing data. Existing methods for handling missing data in observational studies may reduce efficiency, introduce bias, and produce spurious results, and their impact on linkage analyses of cohort studies is unclear. We therefore compare how six methods for handling missing repeated phenotype measurements affect the results of genome-wide linkage analyses of four quantitative traits from the Framingham Heart Study cohort.

RESULTS

We found that simply deleting observations with missing values yielded many more nominally statistically significant linkages than the other five approaches. Among those five, methods sharing the same underlying strategy (imputation-based or model-based) gave the most consistent results with one another, although some discrepancies remained.
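To make the contrast between deleting incomplete records and imputing them concrete, here is a minimal sketch. The abstract does not list the six methods compared, so this is not the authors' procedure. It uses invented blood-pressure readings and two generic strategies as hypothetical stand-ins: complete-case (listwise) deletion and exam-wise mean imputation. The names `readings`, `complete_case`, and `mean_impute` are illustrative only.

```python
# Hypothetical repeated systolic blood pressure readings (mmHg) for four
# subjects at three exams; None marks a missing measurement.
# Values are invented for illustration and are not Framingham data.
readings = {
    "s1": [120.0, 125.0, 130.0],
    "s2": [118.0, None, 128.0],
    "s3": [140.0, 142.0, None],
    "s4": [None, None, 150.0],
}


def complete_case(data):
    """Listwise deletion: keep only subjects observed at every exam."""
    return {k: v for k, v in data.items() if all(x is not None for x in v)}


def mean_impute(data):
    """Fill each missing value with the mean of the observed values at that exam.

    Assumes every exam has at least one observed value. Returns new lists,
    leaving the input unchanged.
    """
    n_exams = len(next(iter(data.values())))
    exam_means = []
    for j in range(n_exams):
        observed = [v[j] for v in data.values() if v[j] is not None]
        exam_means.append(sum(observed) / len(observed))
    return {
        k: [x if x is not None else exam_means[j] for j, x in enumerate(v)]
        for k, v in data.items()
    }


kept = complete_case(readings)
imputed = mean_impute(readings)
```

In this toy data, complete-case deletion retains only `s1`, whereas mean imputation keeps all four subjects but fills `s4`'s first two exams with the other subjects' exam averages (126.0 and 133.5), which understates that subject's true variability. Either choice changes the phenotype values that enter a downstream linkage analysis.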

CONCLUSION

Different methods for handling missing values in linkage analyses of cohort studies can give substantially different results and must be chosen carefully to guard against bias and spurious findings.
