Building gene expression profile classifiers with a simple and efficient rejection option in R.

Affiliation

Control and Computer Engineering Department, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129 Torino, Italy.

Publication information

BMC Bioinformatics. 2011;12 Suppl 13(Suppl 13):S3. doi: 10.1186/1471-2105-12-S13-S3. Epub 2011 Nov 30.

DOI: 10.1186/1471-2105-12-S13-S3

PMID: 22373214

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3278843/

Abstract

BACKGROUND

Collecting gene expression profiles from DNA microarrays and analyzing them with pattern recognition algorithms is a powerful technology applied to several biological problems. Common pattern recognition systems classify samples by assigning them to one of a set of known classes. In a clinical diagnostics setup, however, novel and unknown classes (new pathologies) may appear, and one must be able to reject samples that do not fit the trained model. The problem of implementing a rejection option in a multi-class classifier has not been widely addressed in the statistical literature. Gene expression profiles are a critical case study because they suffer from the curse of dimensionality, which reduces the reliability of both traditional rejection models and more recent approaches such as one-class classifiers.

RESULTS

This paper presents a set of empirical decision rules that can be used to implement a rejection option in a set of multi-class classifiers widely used for the analysis of gene expression profiles. In particular, we focus on the classifiers implemented in the R Language and Environment for Statistical Computing (R for short in the remainder of this paper). The main contribution of the proposed rules is their simplicity, which enables easy integration with available data analysis environments. Because tuning the parameters of a rejection model is often a complex and delicate task, we exploit an evolutionary strategy to automate this process. This allows the final user to maximize the rejection accuracy with minimal manual intervention.
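The abstract does not spell out the decision rules or the evolutionary strategy, so the following is only a minimal sketch of the general idea, not the authors' method. It is written in Python purely for illustration (the paper itself targets R). It assumes a classifier that outputs per-class posterior probabilities, and it uses two hypothetical thresholds, `min_prob` and `min_margin`, tuned with a basic (1+1) evolution strategy on toy data in which `None` marks samples from unknown classes.

```python
import random

REJECT = None  # label returned for rejected samples (assumption of this sketch)

def classify_with_reject(posteriors, min_prob, min_margin):
    """Return the index of the top class, or REJECT when confidence is too low.

    posteriors: per-class scores for one sample, assumed to sum to 1.
    """
    ranked = sorted(range(len(posteriors)), key=lambda k: posteriors[k], reverse=True)
    best = ranked[0]
    runner_up = posteriors[ranked[1]] if len(ranked) > 1 else 0.0
    # Reject if the top score is weak or too close to the second-best score.
    if posteriors[best] < min_prob or posteriors[best] - runner_up < min_margin:
        return REJECT
    return best

def rejection_accuracy(samples, min_prob, min_margin):
    """Fraction of samples handled correctly (truth is a class index or REJECT)."""
    hits = sum(classify_with_reject(p, min_prob, min_margin) == truth
               for p, truth in samples)
    return hits / len(samples)

def tune_thresholds(samples, generations=200, sigma=0.1, seed=0):
    """(1+1) evolution strategy: mutate both thresholds, keep the child if no worse."""
    rng = random.Random(seed)
    parent = (0.5, 0.0)  # arbitrary starting thresholds
    parent_fit = rejection_accuracy(samples, *parent)
    for _ in range(generations):
        # Gaussian mutation, clipped to the valid range [0, 1].
        child = tuple(min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in parent)
        child_fit = rejection_accuracy(samples, *child)
        if child_fit >= parent_fit:
            parent, parent_fit = child, child_fit
    return parent, parent_fit

# Toy validation set: (posterior vector, true class or REJECT for an unknown class).
samples = [
    ([0.90, 0.05, 0.05], 0),
    ([0.10, 0.85, 0.05], 1),
    ([0.05, 0.15, 0.80], 2),
    ([0.40, 0.35, 0.25], REJECT),
    ([0.55, 0.40, 0.05], REJECT),
    ([0.34, 0.33, 0.33], REJECT),
]

thresholds, accuracy = tune_thresholds(samples)
print("tuned thresholds:", thresholds, "rejection accuracy:", accuracy)
```

Because the strategy only accepts a child that scores at least as well as its parent, the tuned accuracy can never fall below that of the starting thresholds. In practice the thresholds would presumably be tuned on held-out data that includes samples from classes absent from training.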

CONCLUSIONS

This paper shows how simple decision rules can support the use of complex machine learning algorithms in real experimental setups. The proposed approach is almost completely automated and is therefore a good candidate for integration into data analysis flows in labs where the machine learning expertise required to tune traditional classifiers might not be available.
