Pathway analysis using random forests classification and regression.

Author information

Pang Herbert, Lin Aiping, Holford Matthew, Enerson Bradley E, Lu Bin, Lawton Michael P, Floyd Eugenia, Zhao Hongyu

Affiliation

Division of Biostatistics, Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT 06520, USA.

Publication information

Bioinformatics. 2006 Aug 15;22(16):2028-36. doi: 10.1093/bioinformatics/btl344. Epub 2006 Jun 29.

Abstract

MOTIVATION

Although numerous methods have been developed to better capture biological information from microarray data, commonly used single gene-based methods neglect interactions among genes and leave room for other novel approaches. For example, most classification and regression methods for microarray data are based on the whole set of genes and have not made use of pathway information. Pathway-based analysis in microarray studies may lead to more informative and relevant knowledge for biological researchers.

RESULTS

In this paper, we describe a pathway-based classification and regression method using Random Forests to analyze gene expression data. The proposed methods allow researchers to rank important pathways from externally available databases, discover important genes, find pathway-based outlying cases and make full use of a continuous outcome variable in the regression setting. We also compared Random Forests with other machine learning methods using several datasets and found that Random Forests classification error rates were either the lowest or the second-lowest. By combining pathway information and novel statistical methods, this procedure represents a promising computational strategy in dissecting pathways and can provide biological insight into the study of microarray data.
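The pathway-ranking idea described above can be illustrated with a small sketch. It is not the authors' R implementation. It assumes that each pathway is scored by the out-of-bag (OOB) error of a forest trained only on that pathway's genes, and it swaps full decision trees for one-split trees (stumps) so that the example fits in the standard library. The function names, the toy data, and the pathway names "informative" and "noise" are all hypothetical.

```python
import random
from collections import Counter

def fit_stump(X, y, features):
    """Pick the one-split rule (feature, threshold, left label, right label)
    with the fewest training errors among the candidate features."""
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # every candidate feature is constant: predict the majority class
        majority = Counter(y).most_common(1)[0][0]
        return features[0], float("inf"), majority, majority
    return best[1:]

def oob_error(X, y, n_trees=200, seed=0):
    """OOB misclassification rate of a forest of stumps (a simplification
    of Random Forests, which normally grows full trees)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    mtry = max(1, int(p ** 0.5))           # features tried per tree
    votes = [Counter() for _ in range(n)]  # OOB votes per sample
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(idx)
        feats = rng.sample(range(p), mtry)
        f, t, l_lab, r_lab = fit_stump([X[i] for i in idx], [y[i] for i in idx], feats)
        for i in range(n):
            if i not in in_bag:  # only trees that never saw sample i may vote on it
                votes[i][l_lab if X[i][f] <= t else r_lab] += 1
    scored = [i for i in range(n) if votes[i]]
    wrong = sum(votes[i].most_common(1)[0][0] != y[i] for i in scored)
    return wrong / len(scored)

def rank_pathways(expr, labels, pathways, **kwargs):
    """expr maps gene -> expression values per sample; pathways maps
    pathway name -> gene list. Returns (name, OOB error), best first."""
    results = []
    for name, genes in pathways.items():
        genes = [g for g in genes if g in expr]
        if not genes:
            continue  # no measured genes for this pathway
        X = [[expr[g][i] for g in genes] for i in range(len(labels))]
        results.append((name, oob_error(X, labels, **kwargs)))
    return sorted(results, key=lambda r: r[1])

# Toy data (made up): genes g0-g2 differ between the two classes, g3-g5 are noise.
data_rng = random.Random(42)
labels = [0] * 20 + [1] * 20
expr = {
    f"g{j}": [data_rng.gauss(4.0 * lab if j < 3 else 0.0, 1.0) for lab in labels]
    for j in range(6)
}
pathways = {"informative": ["g0", "g1", "g2"], "noise": ["g3", "g4", "g5"]}

ranking = rank_pathways(expr, labels, pathways, n_trees=200, seed=1)
for name, err in ranking:
    print(f"{name}: OOB error = {err:.3f}")
```

Pathways with lower OOB error sort first, so a pathway whose genes separate the classes should rank above one made of unrelated genes. In the regression setting mentioned above, the same loop would presumably use an error measure suited to a continuous outcome in place of the misclassification rate.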

AVAILABILITY

Source code written in R is available from http://bioinformatics.med.yale.edu/pathway-analysis/rf.htm.
