
Analyses of single marker and pairwise effects of candidate loci for rheumatoid arthritis using logistic regression and random forests.

Author information

Glaser Beate, Nikolov Ivan, Chubb Daniel, Hamshere Marian L, Segurado Ricardo, Moskvina Valentina, Holmans Peter

Affiliation

Biostatistics and Bioinformatics Unit, and Department of Psychological Medicine, Cardiff University, School of Medicine, Heath Park, Cardiff, Wales, CF14 4XN, UK.

Publication information

BMC Proc. 2007;1 Suppl 1(Suppl 1):S54. doi: 10.1186/1753-6561-1-s1-s54. Epub 2007 Dec 18.

DOI:10.1186/1753-6561-1-s1-s54
PMID:18466554
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2367457/
Abstract

Using parametric and nonparametric techniques, our study investigated the presence of single-locus and pairwise effects among 20 markers of the Genetic Analysis Workshop 15 (GAW15) North American Rheumatoid Arthritis Consortium (NARAC) candidate gene data set (Problem 2), analyzing 463 independent patients and 855 controls. Specifically, our work examined the correspondence between logistic regression (LR) analysis of single-locus and pairwise interaction effects and random forest (RF) single and joint importance measures. For this comparison, we selected small but stable RFs (500 trees), whose importance measures correlated strongly (r ≈ 0.98) with those of RFs grown with 5000 trees. Both RF importance measures captured most of the LR single-locus and pairwise interaction effects, while joint importance measures also corresponded to full LR models containing main and interaction effects. We furthermore showed that RF measures were particularly sensitive to data imputation. The most consistent pairwise effect on rheumatoid arthritis was found between two markers within MAP3K7IP2/SUMO4 on 6q25.1, although LR and RFs assigned different significance levels. Within a hypothetical two-stage design, pairwise LR analysis of all markers with significant RF single importance would have reduced the number of possible combinations in our small data set by 61%, whereas joint importance measures would have been less efficient for marker pair reduction. This suggests that RF single importance measures, which are able to detect a wide range of interaction effects and are computationally very efficient, might be exploited as a pre-screening tool for larger association studies. Follow-up analysis, such as by LR, is required because RFs do not indicate high-risk genotype combinations.
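The abstract justifies using 500-tree forests by the strong correlation (r ≈ 0.98) between their importance measures and those of 5000-tree forests. A minimal sketch of that kind of stability check is shown below. It assumes a Pearson correlation (the abstract reports r but does not name the coefficient), takes the two importance vectors as plain lists aligned by marker, and uses a hypothetical stability cut-off of 0.95 that is not from the paper.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two importance vectors aligned by marker."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("constant vector: correlation is undefined")
    return s_xy / math.sqrt(s_xx * s_yy)


def is_stable(small_forest_imp: Sequence[float],
              large_forest_imp: Sequence[float],
              threshold: float = 0.95) -> bool:
    """Hypothetical rule: accept the smaller forest if its importances track the larger one's."""
    return pearson_r(small_forest_imp, large_forest_imp) >= threshold
```

In practice the two vectors would hold the importance score of each of the 20 markers from a 500-tree and a 5000-tree forest; how those scores are produced (which RF implementation, which importance definition) is left open here.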

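The two-stage design described in the abstract saves work combinatorially: with n markers there are C(n, 2) possible pairs, but if stage 2 only tests pairs among the k markers kept in stage 1, just C(k, 2) pairwise LR models are needed. The sketch below assumes that reading (both markers of a pair must pass stage 1); the marker names, the importance scores, and the simple importance threshold used in place of a significance test are all hypothetical.

```python
from itertools import combinations
from math import comb
from typing import Dict, Iterable, List, Tuple


def select_markers(importance: Dict[str, float], threshold: float) -> List[str]:
    """Stage 1 (hypothetical rule): keep markers whose RF single importance exceeds a threshold."""
    return sorted(m for m, score in importance.items() if score > threshold)


def candidate_pairs(selected: Iterable[str]) -> List[Tuple[str, str]]:
    """Stage 2 input: every unordered pair among the markers kept in stage 1."""
    return list(combinations(sorted(selected), 2))


def pair_reduction(n_markers: int, n_selected: int) -> float:
    """Fraction of the C(n_markers, 2) pairwise tests avoided by testing only selected pairs."""
    if n_markers < 2 or not 0 <= n_selected <= n_markers:
        raise ValueError("need n_markers >= 2 and 0 <= n_selected <= n_markers")
    return 1 - comb(n_selected, 2) / comb(n_markers, 2)
```

For the 20 markers analyzed here, `comb(20, 2)` gives 190 possible pairs; keeping 10 markers, for example, would leave 45 pairs, a reduction of about 76%. Each pair returned by `candidate_pairs` would then be fitted with an LR model containing both main effects and their interaction.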

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/83bd/2367457/9dca5d649d59/1753-6561-1-S1-S54-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/83bd/2367457/c011f9fdbbe1/1753-6561-1-S1-S54-1.jpg
