GeneSrF and varSelRF: a web-based tool and R package for gene selection and classification using random forest.

Author information

Diaz-Uriarte Ramón

Affiliation

Statistical Computing Team, Structural Biology and Biocomputing Programme, Spanish National Cancer Center (CNIO), Melchor Fernández Almagro 3, Madrid, 28029, Spain.

Publication information

BMC Bioinformatics. 2007 Sep 3;8:328. doi: 10.1186/1471-2105-8-328.

DOI:10.1186/1471-2105-8-328

PMID:17767709

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2034606/

Abstract

BACKGROUND

Microarray data are often used for patient classification and gene selection. An appropriate tool for end users and biomedical researchers should combine user friendliness with statistical rigor, including carefully avoiding selection biases and allowing analysis of multiple solutions, together with access to additional functional information about the selected genes. Methodologically, such a tool is of greater use if it incorporates state-of-the-art computational approaches and makes its source code available.

RESULTS

We have developed GeneSrF, a web-based tool, and varSelRF, an R package, that implement, in the context of patient classification, a validated method for selecting very small sets of genes while preserving classification accuracy. Computation is parallelized, so it can take advantage of multicore CPUs and clusters of workstations. Output includes bootstrapped estimates of the prediction error rate and assessments of the stability of the solutions. Clickable tables link to additional information for each gene (GO terms, PubMed citations, KEGG pathways), and output can be sent to PaLS for examination of the PubMed references, GO terms, and KEGG and Reactome pathways characteristic of the sets of genes selected for class prediction. The full source code is available, so the software can be extended. The web-based application is available from http://genesrf2.bioinfo.cnio.es. All source code is available from Bioinformatics.org or The Launchpad. The R package is also available from CRAN.
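The idea of a bootstrapped error estimate that avoids selection bias can be illustrated with a minimal sketch. It is not the varSelRF algorithm: to stay dependency-free it swaps in a nearest-centroid classifier for the random forest and a simple mean-difference ranking for the gene selection step. All function names (`top_genes`, `fit_centroids`, `predict`, `bootstrap_error`, `make_toy`), the 0/1 class labels, and the toy data are assumptions made for illustration. The point it shows is that gene selection is repeated inside every bootstrap resample, and the error is measured only on the samples left out of that resample.

```python
import random
from statistics import mean


def top_genes(X, y, k):
    """Rank genes by absolute difference in class means (labels 0/1); return the top k indices."""
    n_genes = len(X[0])
    scores = []
    for g in range(n_genes):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        scores.append(abs(mean(a) - mean(b)))
    return sorted(range(n_genes), key=lambda g: -scores[g])[:k]


def fit_centroids(X, y, genes):
    """Compute one centroid per class, restricted to the selected genes."""
    cents = {}
    for lab in set(y):
        rows = [row for row, l in zip(X, y) if l == lab]
        cents[lab] = [mean(r[g] for r in rows) for g in genes]
    return cents


def predict(cents, row, genes):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((row[g] - cj) ** 2 for g, cj in zip(genes, c))
    return min(cents, key=lambda lab: dist(cents[lab]))


def bootstrap_error(X, y, k=2, n_boot=50, seed=0):
    """Average out-of-bag error, redoing gene selection in each resample."""
    rng = random.Random(seed)
    n = len(X)
    errors = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]      # sample with replacement
        in_bag = set(idx)
        oob = [i for i in range(n) if i not in in_bag]  # left-out samples
        yb = [y[i] for i in idx]
        if not oob or len(set(yb)) < 2:
            continue  # skip degenerate resamples
        Xb = [X[i] for i in idx]
        genes = top_genes(Xb, yb, k)       # selection happens inside the loop
        cents = fit_centroids(Xb, yb, genes)
        wrong = sum(predict(cents, X[i], genes) != y[i] for i in oob)
        errors.append(wrong / len(oob))
    return mean(errors)


def make_toy(n=30, n_genes=20, seed=1):
    """Toy expression matrix (hypothetical data): genes 0 and 1 separate the classes."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        lab = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_genes)]
        row[0] += 3 * lab
        row[1] -= 3 * lab
        X.append(row)
        y.append(lab)
    return X, y


X, y = make_toy()
err = bootstrap_error(X, y)
print(f"bootstrap out-of-bag error: {err:.3f}")
```

Selecting genes once on the full data set and only then resampling would let the left-out samples influence the selection, which is the selection bias the abstract warns against.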

CONCLUSION

varSelRF and GeneSrF implement a validated method for gene selection, including bootstrap estimates of the classification error rate. They are valuable tools for applied biomedical researchers, especially for exploratory work with microarray data. Because of the underlying technology used (a combination of parallelization with a web-based application), they are also of methodological interest to bioinformaticians and biostatisticians.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9dfb/2034606/5e4ce6850c22/1471-2105-8-328-1.jpg
