Two-sample comparison based on prediction error, with applications to candidate gene association studies.

Author information

Yu K, Martin R, Rothman N, Zheng T, Lan Q

Affiliations

Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA.

Publication information

Ann Hum Genet. 2007 Jan;71(Pt 1):107-18. doi: 10.1111/j.1469-1809.2006.00306.x.

DOI:10.1111/j.1469-1809.2006.00306.x

PMID:17227481

Abstract

To take advantage of the increasingly available high-density SNP maps across the genome, various tests that compare multilocus genotypes or estimated haplotypes between cases and controls have been developed for candidate gene association studies. Here we view this two-sample testing problem from the perspective of supervised machine learning and propose a new association test. The approach adopts the flexible and easy-to-understand classification tree model as the learning machine, and uses the estimated prediction error of the resulting prediction rule as the test statistic. This procedure not only provides an association test but also generates a prediction rule that can be useful in understanding the mechanisms underlying complex disease. Under the set-up of a haplotype-based transmission/disequilibrium test (TDT) type of analysis, we find through simulation studies that the proposed procedure has the correct type I error rates and is robust to population stratification. The power of the proposed procedure is sensitive to the chosen prediction error estimator. Among commonly used prediction error estimators, the .632+ estimator results in a test that has the best overall performance. We also find that the test using the .632+ estimator is more powerful than the standard single-point TDT analysis, the Pearson's goodness-of-fit test based on estimated haplotype frequencies, and two haplotype-based global tests implemented in the genetic analysis package FBAT. To illustrate the application of the proposed method in population-based association studies, we use the procedure to study the association between non-Hodgkin lymphoma and the IL10 gene.
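The idea of using an estimated prediction error as a two-sample test statistic can be sketched in a few lines. The block below is only an illustration under stated assumptions, not the authors' implementation: `fit_stump` is a hypothetical one-split stand-in for a full classification tree, `combine_632plus` uses the commonly quoted weighted form of the .632+ estimator, and the permutation step in `permutation_pvalue` is an assumed way of obtaining a null distribution (the abstract does not say how significance is assessed). All function names and parameters are invented for this sketch.

```python
import random


def fit_stump(X, y):
    """Fit a one-split tree on categorical features; return a predict function."""
    n = len(y)
    majority = int(2 * sum(y) >= n)
    best_err = sum(label != majority for label in y)
    best_rule = None
    for j in range(len(X[0])):
        for value in sorted(set(row[j] for row in X)):
            left = [y[i] for i in range(n) if X[i][j] == value]
            right = [y[i] for i in range(n) if X[i][j] != value]
            if not right:
                continue  # this value does not split the sample
            l_lab = int(2 * sum(left) >= len(left))
            r_lab = int(2 * sum(right) >= len(right))
            err = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
            if err < best_err:
                best_err, best_rule = err, (j, value, l_lab, r_lab)
    if best_rule is None:
        return lambda row: majority  # no split beats the majority vote
    j, value, l_lab, r_lab = best_rule
    return lambda row: l_lab if row[j] == value else r_lab


def combine_632plus(err_app, err1, gamma):
    """Blend apparent error and out-of-bag error with the .632+ weight."""
    err1 = min(err1, gamma)  # cap at the no-information error rate
    if err1 > err_app and gamma > err_app:
        r = (err1 - err_app) / (gamma - err_app)  # relative overfitting rate
    else:
        r = 0.0
    w = 0.632 / (1.0 - 0.368 * r)
    return (1.0 - w) * err_app + w * err1


def err632plus(X, y, fit, n_boot, rng):
    """Estimate the prediction error of `fit` on binary labels y (0/1)."""
    n = len(y)
    preds = [fit(X, y)(row) for row in X]
    err_app = sum(p != t for p, t in zip(preds, y)) / n
    p1, q1 = sum(y) / n, sum(preds) / n
    gamma = p1 * (1 - q1) + (1 - p1) * q1  # no-information error rate
    wrong, count = [0] * n, [0] * n
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        in_bag = set(idx)
        model = fit([X[i] for i in idx], [y[i] for i in idx])
        for i in range(n):
            if i not in in_bag:  # score only samples left out of this resample
                count[i] += 1
                wrong[i] += model(X[i]) != y[i]
    rates = [wrong[i] / count[i] for i in range(n) if count[i] > 0]
    err1 = sum(rates) / len(rates)
    return combine_632plus(err_app, err1, gamma)


def permutation_pvalue(X, y, n_perm=99, n_boot=25, seed=0):
    """Return (observed error, p-value); a small error is evidence of association."""
    rng = random.Random(seed)
    observed = err632plus(X, y, fit_stump, n_boot, rng)
    labels = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any genotype/status link
        if err632plus(X, labels, fit_stump, n_boot, rng) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Here `X` would be a list of multilocus genotype vectors (for example, minor-allele counts 0, 1, 2 per SNP) and `y` the case (1) or control (0) status. Under no association the fitted rule should predict no better than chance, so the observed error sits inside the permutation distribution; under association it falls in the lower tail.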
