
An efficient statistical feature selection approach for classification of gene expression data.

Affiliation

Department of Mathematics, Indian Institute of Technology Delhi, Hauz Khas, New Delhi 110016, India.

Publication information

J Biomed Inform. 2011 Aug;44(4):529-35. doi: 10.1016/j.jbi.2011.01.001. Epub 2011 Jan 15.

DOI: 10.1016/j.jbi.2011.01.001
PMID: 21241823
Abstract

Classification of gene expression data plays a significant role in prediction and diagnosis of diseases. Gene expression data has a special characteristic that there is a mismatch in gene dimension as opposed to sample dimension. All genes do not contribute for efficient classification of samples. A robust feature selection algorithm is required to identify the important genes which help in classifying the samples efficiently. In order to select informative genes (features) based on relevance and redundancy characteristics, many feature selection algorithms have been introduced in the past. Most of the earlier algorithms require computationally expensive search strategy to find an optimal feature subset. Existing feature selection methods are also sensitive to the evaluation measures. The paper introduces a novel and efficient feature selection approach based on statistically defined effective range of features for every class termed as ERGS (Effective Range based Gene Selection). The basic principle behind ERGS is that higher weight is given to the feature that discriminates the classes clearly. Experimental results on well-known gene expression datasets illustrate the effectiveness of the proposed approach. Two popular classifiers viz. Naïve Bayes Classifier (NBC) and Support Vector Machine (SVM) have been used for classification. The proposed feature selection algorithm can be helpful in ranking the genes and also is capable of identifying the most relevant genes responsible for diseases like leukemia, colon tumor, lung cancer, diffuse large B-cell lymphoma (DLBCL), prostate cancer.
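
The principle stated in the abstract (features whose per-class ranges overlap less receive higher weight) can be illustrated with a minimal Python sketch. The abstract does not define the effective range or the weighting formula, so everything specific here is an assumption made for illustration: the range is taken as the class mean plus or minus a scaled standard deviation, the scale factor `k = 1.732` and the `(1 - p)` class-prior factor are assumed, and the overlap-to-span normalization is one plausible choice. The function names `effective_ranges`, `overlap_score`, and `rank_features` are hypothetical and do not come from the paper.

```python
from statistics import mean, pstdev


def effective_ranges(values, labels, k=1.732):
    """Per-class interval for ONE feature (assumed form):
    [mean - (1 - p) * k * std, mean + (1 - p) * k * std], p = class prior."""
    n = len(values)
    ranges = []
    for c in sorted(set(labels)):
        xs = [v for v, y in zip(values, labels) if y == c]
        p = len(xs) / n                      # empirical class prior
        half = (1 - p) * k * pstdev(xs)      # assumed half-width of the range
        m = mean(xs)
        ranges.append((m - half, m + half))
    return ranges


def overlap_score(ranges):
    """Summed pairwise overlap of the class ranges divided by their total span."""
    ranges = sorted(ranges)                  # order by lower bound
    overlap = 0.0
    for i in range(len(ranges)):
        for j in range(i + 1, len(ranges)):
            # ranges[j] starts at or after ranges[i], so the intersection
            # is [lower_j, min(upper_i, upper_j)] when it is non-empty.
            overlap += max(0.0, min(ranges[i][1], ranges[j][1]) - ranges[j][0])
    span = max(hi for _, hi in ranges) - min(lo for lo, _ in ranges)
    return overlap / span if span > 0 else 1.0   # constant feature: treat as fully overlapping


def rank_features(X, labels):
    """Return (feature indices sorted by descending weight, weights)."""
    scores = [
        overlap_score(effective_ranges([row[j] for row in X], labels))
        for j in range(len(X[0]))
    ]
    top = max(scores)
    # Less overlap means higher weight; weights fall in [0, 1].
    weights = [1 - s / top if top > 0 else 1.0 for s in scores]
    order = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    return order, weights


# Toy data (made up): feature 0 separates the two classes, feature 1 does not.
X = [[1.0, 2.0], [1.1, 3.0], [0.9, 4.0],
     [5.0, 2.5], [5.1, 3.5], [4.9, 4.5]]
y = [0, 0, 0, 1, 1, 1]
order, weights = rank_features(X, y)
print(order, weights)
```

On this toy data the well-separated feature is ranked first. Because each feature is scored independently from class means and standard deviations, a ranking of this kind needs no search over feature subsets, which is the efficiency argument the abstract makes against search-based selectors.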


Similar articles

1. An efficient statistical feature selection approach for classification of gene expression data.
   J Biomed Inform. 2011 Aug;44(4):529-35. doi: 10.1016/j.jbi.2011.01.001. Epub 2011 Jan 15.
2. A multiple kernel support vector machine scheme for feature selection and rule extraction from gene expression data of cancer tissue.
   Artif Intell Med. 2007 Oct;41(2):161-75. doi: 10.1016/j.artmed.2007.07.008. Epub 2007 Sep 11.
3. Tumor classification ranking from microarray data.
   BMC Genomics. 2008 Sep 16;9 Suppl 2(Suppl 2):S21. doi: 10.1186/1471-2164-9-S2-S21.
4. SVM-RFE with MRMR filter for gene selection.
   IEEE Trans Nanobioscience. 2010 Mar;9(1):31-7. doi: 10.1109/TNB.2009.2035284. Epub 2009 Oct 30.
5. A novel feature selection approach for biomedical data classification.
   J Biomed Inform. 2010 Feb;43(1):15-23. doi: 10.1016/j.jbi.2009.07.008. Epub 2009 Jul 30.
6. The ant colony algorithm for feature selection in high-dimension gene expression data for disease classification.
   Math Med Biol. 2007 Dec;24(4):413-26. doi: 10.1093/imammb/dqn001. Epub 2008 Feb 22.
7. Mixture classification model based on clinical markers for breast cancer prognosis.
   Artif Intell Med. 2010 Feb-Mar;48(2-3):129-37. doi: 10.1016/j.artmed.2009.07.008. Epub 2009 Dec 14.
8. Compact cancer biomarkers discovery using a swarm intelligence feature selection algorithm.
   Comput Biol Chem. 2010 Aug;34(4):244-50. doi: 10.1016/j.compbiolchem.2010.08.003. Epub 2010 Sep 9.
9. Tumor classification based on non-negative matrix factorization using gene expression data.
   IEEE Trans Nanobioscience. 2011 Jun;10(2):86-93. doi: 10.1109/TNB.2011.2144998. Epub 2011 Jul 7.
10. Cancer classification and prediction using logistic regression with Bayesian gene selection.
    J Biomed Inform. 2004 Aug;37(4):249-59. doi: 10.1016/j.jbi.2004.07.009.

Cited by

1. GNNs and ensemble models enhance the prediction of new sRNA-mRNA interactions in unseen conditions.
   BMC Bioinformatics. 2025 May 21;26(1):131. doi: 10.1186/s12859-025-06153-w.
2. A new feature selection approach with binary exponential henry gas solubility optimization and hybrid data transformation methods.
   MethodsX. 2024 May 20;12:102770. doi: 10.1016/j.mex.2024.102770. eCollection 2024 Jun.
3. Novel candidate genes for environmental stresses response in Synechocystis sp. PCC 6803 revealed by machine learning algorithms.
   Braz J Microbiol. 2024 Jun;55(2):1219-1229. doi: 10.1007/s42770-024-01338-6. Epub 2024 May 6.
4. The ability to classify patients based on gene-expression data varies by algorithm and performance metric.
   PLoS Comput Biol. 2022 Mar 11;18(3):e1009926. doi: 10.1371/journal.pcbi.1009926. eCollection 2022 Mar.
5. Data analysis methods for defining biomarkers from omics data.
   Anal Bioanal Chem. 2022 Jan;414(1):235-250. doi: 10.1007/s00216-021-03813-7. Epub 2021 Dec 24.
6. Enhanced Directed Random Walk for the Identification of Breast Cancer Prognostic Markers from Multiclass Expression Data.
   Entropy (Basel). 2021 Sep 20;23(9):1232. doi: 10.3390/e23091232.
7. Gene expression feature selection for prostate cancer diagnosis using a two-phase heuristic-deterministic search strategy.
   IET Syst Biol. 2018 Aug;12(4):162-169. doi: 10.1049/iet-syb.2017.0044.
8. A consensus multi-view multi-objective gene selection approach for improved sample classification.
   BMC Bioinformatics. 2020 Sep 17;21(Suppl 13):386. doi: 10.1186/s12859-020-03681-5.
9. Abnormal Emotional Processing and Emotional Experience in Patients with Peripheral Facial Nerve Paralysis: An MEG Study.
   Brain Sci. 2020 Mar 4;10(3):147. doi: 10.3390/brainsci10030147.
10. Automated Detection of Alzheimer's Disease Using Brain MRI Images- A Study with Various Feature Extraction Techniques.
    J Med Syst. 2019 Aug 9;43(9):302. doi: 10.1007/s10916-019-1428-9.