A feature selection method for classification within functional genomics experiments based on the proportional overlapping score.

Affiliation

Department of Mathematical Sciences, University of Essex, Wivenhoe Park, CO4 3SQ Colchester, UK.

Publication information

BMC Bioinformatics. 2014 Aug 11;15(1):274. doi: 10.1186/1471-2105-15-274.

DOI:10.1186/1471-2105-15-274

PMID:25113817

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4141116/

Abstract

BACKGROUND

Microarray technology, like other functional genomics experiments, allows simultaneous measurement of thousands of genes within each sample. Both the prediction accuracy and the interpretability of a classifier can be enhanced by performing classification based only on selected discriminative genes. We propose a statistical method for selecting genes based on an analysis of how expression data overlap across classes. This method yields a novel measure of a feature's relevance to a classification task, called the proportional overlapping score (POS).

RESULTS

We apply POS, along with four widely used gene selection methods, to several benchmark gene expression datasets. Classification error rates computed using Random Forest, k-Nearest Neighbor, and Support Vector Machine classifiers show that POS achieves better performance than the compared methods.

CONCLUSIONS

A novel gene selection method, POS, is proposed. POS analyzes the overlap of expression values across classes, taking into account the proportions of overlapping samples. It robustly defines a mask for each gene, which allows it to minimize the effect of expression outliers. The constructed masks, along with a novel gene score, are used to produce the selected subset of genes.
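The overlap idea summarized above can be illustrated with a small sketch. This is not the POS formula from the paper, which the abstract does not spell out. It is a simplified stand-in that assumes a two-class problem, an IQR-based "core interval" per class (the 1.5 multiplier is an assumption), and a score equal to the fraction of samples falling where the two core intervals intersect. The names `core_interval` and `overlap_fraction` are hypothetical.

```python
from statistics import quantiles


def core_interval(values, k=1.5):
    """Robust interval for one class: [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: the IQR rule and k=1.5 are illustrative choices,
    not taken from the paper's abstract.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def overlap_fraction(class_a, class_b):
    """Fraction of all samples lying where the two core intervals intersect.

    0.0 means the classes are cleanly separated for this gene;
    values near 1.0 mean the gene barely discriminates between them.
    """
    lo_a, hi_a = core_interval(class_a)
    lo_b, hi_b = core_interval(class_b)
    lo, hi = max(lo_a, lo_b), min(hi_a, hi_b)
    if lo > hi:  # the core intervals do not intersect
        return 0.0
    samples = list(class_a) + list(class_b)
    inside = sum(lo <= v <= hi for v in samples)
    return inside / len(samples)


# One gene measured in two classes; 50 is an expression outlier in class A.
gene_class_a = [1, 2, 3, 4, 50]
gene_class_b = [10, 11, 12, 13, 14]
score = overlap_fraction(gene_class_a, gene_class_b)
```

In this toy example the raw ranges [1, 50] and [10, 14] overlap, but the robust core intervals do not, so the outlier does not make the gene look uninformative. Under this sketch, genes would be ranked by ascending score, with lower values indicating better class separation.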

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2976/4141116/74fcc36cb3e2/12859_2014_6543_Fig1_HTML.jpg
