Two-stage designs in case-control association analysis.

Author information

Zuo Yijun, Zou Guohua, Zhao Hongyu

Affiliation

Department of Statistics and Probability, Michigan State University, Michigan 48824, USA.

Publication information

Genetics. 2006 Jul;173(3):1747-60. doi: 10.1534/genetics.105.042648. Epub 2006 Apr 19.

DOI:10.1534/genetics.105.042648

PMID:16624925

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1526674/

Abstract

DNA pooling is a cost-effective approach for collecting information on marker allele frequency in genetic studies. It is often suggested as a screening tool to identify a subset of candidate markers from a very large number of markers, which are then followed up by more accurate and informative individual genotyping. In this article, we investigate several statistical properties and design issues related to this two-stage design, including the selection of the candidate markers for second-stage analysis, the statistical power of this design, and the probability that truly disease-associated markers are ranked among the top after second-stage analysis. We have derived analytical results on the proportion of markers to be selected for second-stage analysis. For example, to detect disease-associated markers with an allele frequency difference of 0.05 between the cases and controls through an initial sample of 1000 cases and 1000 controls, our results suggest that when the measurement errors are small (0.005), approximately 3% of the markers should be selected. For the statistical power to identify disease-associated markers, we find that the measurement errors associated with DNA pooling have little effect on the power of the two-stage design. This is in contrast to the one-stage pooling scheme, where measurement errors may have a large effect on statistical power. As for the probability that the disease-associated markers are ranked among the top in the second stage, we show that, for reasonably large sample sizes, there is a high probability that at least one disease-associated marker is ranked among the top when the allele frequency differences between the cases and controls are at least 0.05, even when the errors associated with DNA pooling in the first stage are not small. Therefore, the two-stage design with DNA pooling as a screening tool offers an efficient strategy in genomewide association studies, even when the measurement errors associated with DNA pooling are nonnegligible.
For any disease model, we find that all the statistical results essentially depend on the population allele frequency and the allele frequency differences between the cases and controls at the disease-associated markers. The general conclusions hold whether the second stage uses an entirely independent sample or includes both the samples used in the first stage and an independent set of samples.
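
The two-stage procedure described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the authors' analytical derivation. It assumes a normal approximation to allele-count sampling, treats the quoted measurement error (0.005) as the standard deviation of independent Gaussian noise on each pooled frequency estimate, gives every marker the same control allele frequency, and uses an independent second-stage sample. The function names (`freq_diff`, `one_study`, `prob_top_ranked`) and the defaults for marker count, number of associated markers, and `top_k` are hypothetical choices made for illustration.

```python
import math
import random


def freq_diff(rng, p_case, p_ctrl, n, error_sd):
    """Estimated case-minus-control allele frequency difference.

    Sampling noise uses a normal approximation to the binomial over 2n alleles.
    error_sd > 0 adds independent pooling measurement error to each estimate
    (an assumed Gaussian error model); error_sd = 0 mimics individual genotyping.
    """
    def estimate(p):
        sampling_sd = math.sqrt(p * (1 - p) / (2 * n))
        return p + rng.gauss(0, sampling_sd) + rng.gauss(0, error_sd)

    return estimate(p_case) - estimate(p_ctrl)


def one_study(rng, n_markers=2000, n_assoc=5, n1=1000, n2=1000, p=0.3,
              delta=0.05, pool_sd=0.005, top_fraction=0.03, top_k=5):
    """Simulate one two-stage study.

    Returns (number of markers kept after stage 1,
             True if any associated marker is in the stage-2 top_k).
    """
    # Markers 0..n_assoc-1 are truly associated: case frequency is p + delta.
    case_freq = [p + delta if m < n_assoc else p for m in range(n_markers)]

    # Stage 1: pooled DNA on n1 cases and n1 controls; rank by |difference|.
    # All markers share the same p, so this ordering approximates a z-score ranking.
    stage1 = [abs(freq_diff(rng, case_freq[m], p, n1, pool_sd))
              for m in range(n_markers)]
    n_keep = max(1, round(top_fraction * n_markers))
    kept = sorted(range(n_markers), key=lambda m: stage1[m], reverse=True)[:n_keep]

    # Stage 2: individual genotyping of an independent sample (no pooling error).
    stage2 = {m: abs(freq_diff(rng, case_freq[m], p, n2, 0.0)) for m in kept}
    top = sorted(kept, key=lambda m: stage2[m], reverse=True)[:top_k]
    return n_keep, any(m < n_assoc for m in top)


def prob_top_ranked(reps=50, seed=1, **kwargs):
    """Monte Carlo estimate of P(at least one associated marker in the top_k)."""
    rng = random.Random(seed)
    hits = sum(one_study(rng, **kwargs)[1] for _ in range(reps))
    return hits / reps


if __name__ == "__main__":
    for pool_sd in (0.005, 0.02):
        estimate = prob_top_ranked(reps=50, seed=1, pool_sd=pool_sd)
        print(f"pooling error sd={pool_sd}: estimated probability {estimate:.2f}")
```

Varying `pool_sd`, `delta`, and `top_fraction` gives a rough feel for how pooling error and the first-stage selection proportion interact. The resulting numbers depend on the simplifying assumptions above and should not be read as reproducing the paper's results.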

