Proper analysis of secondary phenotype data in case-control association studies.

Authors

Lin D Y, Zeng D

Affiliation

Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina 27599-7420, USA.

Publication information

Genet Epidemiol. 2009 Apr;33(3):256-65. doi: 10.1002/gepi.20377.

DOI:10.1002/gepi.20377

PMID:19051285

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2684820/

Abstract

Case-control association studies often collect extensive information on secondary phenotypes, which are quantitative or qualitative traits other than the case-control status. Exploring secondary phenotypes can yield valuable insights into biological pathways and identify genetic variants influencing phenotypes of direct interest. All publications on secondary phenotypes have used standard statistical methods, such as least-squares regression for quantitative traits. Because of unequal selection probabilities between cases and controls, the case-control sample is not a random sample from the general population. As a result, standard statistical analysis of secondary phenotype data can be extremely misleading. Although one may avoid the sampling bias by analyzing cases and controls separately or by including the case-control status as a covariate in the model, the associations between a secondary phenotype and a genetic variant in the case and control groups can be quite different from the association in the general population. In this article, we present novel statistical methods that properly reflect the case-control sampling in the analysis of secondary phenotype data. The new methods provide unbiased estimation of genetic effects and accurate control of false-positive rates while maximizing statistical power. We demonstrate the pitfalls of the standard methods and the advantages of the new methods both analytically and numerically. The relevant software is available at our website.
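
The sampling bias described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' method or software. It assumes a hypothetical population in which a genotype G has no effect on a quantitative secondary trait Y, while both raise disease risk through a logistic model. All parameter values, and the helper names `ols_slope` and `simulate`, are made up for illustration. The inverse-probability-weighted fit at the end is a generic correction swapped in for comparison only, and it presumes the population counts of cases and controls are known.

```python
import numpy as np


def ols_slope(x, y, w=None):
    """Slope of the (optionally weighted) least-squares line of y on x."""
    w = np.ones_like(x, dtype=float) if w is None else np.asarray(w, dtype=float)
    x_mean = np.average(x, weights=w)
    y_mean = np.average(y, weights=w)
    return np.sum(w * (x - x_mean) * (y - y_mean)) / np.sum(w * (x - x_mean) ** 2)


def simulate(seed=0, n_pop=200_000, n_cases=5_000, n_controls=5_000):
    """Compare secondary-trait slopes in the population and in a case-control sample."""
    rng = np.random.default_rng(seed)

    # Hypothetical population: G is an additive genotype (0, 1, 2 copies),
    # Y is a standard normal trait generated independently of G,
    # so the true effect of G on Y is exactly zero.
    g = rng.binomial(2, 0.3, size=n_pop).astype(float)
    y = rng.normal(size=n_pop)

    # Assumed disease model: both G and Y increase disease risk.
    p_disease = 1.0 / (1.0 + np.exp(-(-4.0 + 1.0 * g + 1.0 * y)))
    d = rng.random(n_pop) < p_disease

    # Case-control sampling: cases are heavily over-represented.
    case_idx = rng.choice(np.flatnonzero(d), size=n_cases, replace=False)
    ctrl_idx = rng.choice(np.flatnonzero(~d), size=n_controls, replace=False)
    idx = np.concatenate([case_idx, ctrl_idx])

    # Inverse sampling probabilities, assuming the population counts are known.
    weights = np.concatenate([
        np.full(n_cases, d.sum() / n_cases),
        np.full(n_controls, (~d).sum() / n_controls),
    ])

    return {
        "population": ols_slope(g, y),               # near the true value, zero
        "naive_pooled": ols_slope(g[idx], y[idx]),   # ignores the sampling scheme
        "ipw": ols_slope(g[idx], y[idx], weights),   # reweighted to the population
    }


if __name__ == "__main__":
    for name, slope in simulate().items():
        print(f"{name:>12}: {slope:+.3f}")
```

Under these assumptions the pooled least-squares slope is pulled away from zero because cases, who tend to carry more risk alleles and higher trait values, make up half of the sample but only a small fraction of the population. Reweighting removes most of that distortion here, at some cost in precision.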
