EM-random forest and new measures of variable importance for multi-locus quantitative trait linkage analysis.

Author information

Lee Sophia S F, Sun Lei, Kustra Rafal, Bull Shelley B

Affiliation

Department of Public Health Sciences, University of Toronto, Toronto M5T3M7, Canada.

Publication information

Bioinformatics. 2008 Jul 15;24(14):1603-10. doi: 10.1093/bioinformatics/btn239. Epub 2008 May 21.

DOI:10.1093/bioinformatics/btn239

PMID:18499695

Original article link: https://pmc.ncbi.nlm.nih.gov/articles/PMC2638262/

Abstract

MOTIVATION

We developed an EM-random forest (EMRF) for Haseman-Elston quantitative trait linkage analysis that accounts for marker ambiguity and weights each sib-pair according to its posterior identical-by-descent (IBD) distribution. The usual random forest (RF) variable importance (VI) index, used to rank markers for variable selection, is not optimal for linkage data because markers are correlated. We define new VI indices that borrow information from linked markers using the correlation structure inherent in IBD linkage data.
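
As a rough illustration of the weighting idea only (this is not the authors' EMRF code, which is written in C, and it omits both the EM iterations and the forest), the sketch below fits a single Haseman-Elston style regression: the squared sib-pair trait difference is regressed on the proportion of alleles shared IBD at one marker, and a negative slope is the usual signal of linkage. To reflect marker ambiguity, each sib-pair is expanded into three pseudo-observations (IBD = 0, 1, 2) weighted by its posterior IBD probabilities. The function name `weighted_he_fit` and its input layout are hypothetical and chosen for this example.

```python
from typing import Sequence, Tuple


def weighted_he_fit(
    sq_diffs: Sequence[float],
    ibd_posteriors: Sequence[Sequence[float]],
) -> Tuple[float, float]:
    """Weighted least-squares fit of squared trait difference on IBD sharing.

    sq_diffs[i]       : squared trait difference for sib-pair i
    ibd_posteriors[i] : (P(IBD=0), P(IBD=1), P(IBD=2)) for sib-pair i
    Returns (intercept, slope). Hypothetical helper, not part of EMRF.
    """
    sw = swx = swy = swxx = swxy = 0.0
    for y, probs in zip(sq_diffs, ibd_posteriors):
        # One pseudo-observation per IBD state, weighted by its posterior.
        for state, w in enumerate(probs):
            x = state / 2.0  # proportion of alleles shared IBD: 0, 0.5 or 1
            sw += w
            swx += w * x
            swy += w * y
            swxx += w * x * x
            swxy += w * x * y
    denom = sw * swxx - swx * swx
    if denom == 0.0:
        raise ValueError("IBD sharing does not vary; slope is undefined")
    slope = (sw * swxy - swx * swy) / denom
    intercept = (swy - slope * swx) / sw
    return intercept, slope
```

When every posterior puts all its mass on one IBD state, this reduces to ordinary least squares on the observed IBD proportions; with ambiguous markers, a pair contributes to several IBD states in proportion to its posterior probabilities.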

RESULTS

In simulations, the new VI indices in EMRF performed better than the original RF VI index, and performed similarly to or better than the EM-Haseman-Elston regression LOD score, across various genetic models. Moreover, tree size and the size of the marker subset evaluated at each node are important considerations in RFs.

AVAILABILITY

The source code for EMRF written in C is available at www.infornomics.utoronto.ca/downloads/EMRF.
