GMDH-based feature ranking and selection for improved classification of medical data.

Author information

Abdel-Aal R E

Affiliation

Physics Department, King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia.

Publication information

J Biomed Inform. 2005 Dec;38(6):456-68. doi: 10.1016/j.jbi.2005.03.003. Epub 2005 Apr 16.

PMID:16337569

Abstract

Medical applications are often characterized by a large number of disease markers and a relatively small number of data records. We demonstrate that complete feature ranking followed by selection can lead to appreciable reductions in data dimensionality, with significant improvements in the implementation and performance of classifiers for medical diagnosis. We describe a novel approach for ranking all features according to their predictive quality using properties unique to learning algorithms based on the group method of data handling (GMDH). An abductive network training algorithm is repeatedly used to select groups of optimum predictors from the feature set at gradually increasing levels of model complexity specified by the user. Groups selected earlier are better predictors. The process is then repeated to rank features within individual groups. The resulting full feature ranking can be used to determine the optimum feature subset by starting at the top of the list and progressively including more features until the classification error rate on an out-of-sample evaluation set starts to increase due to overfitting. The approach is demonstrated on two medical diagnosis datasets (breast cancer and heart disease) and comparisons are made with other feature ranking and selection methods. Receiver operating characteristic (ROC) analysis is used to compare classifier performance. At default model complexity, dimensionality reductions of 22% and 54% could be achieved for the breast cancer and heart disease data, respectively, leading to improvements in the overall classification performance. For both datasets, considerable dimensionality reduction introduced no significant reduction in the area under the ROC curve. GMDH-based feature selection results have also proved effective with neural network classifiers.
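The selection step described above (walking down a complete ranking and stopping when the out-of-sample error starts to rise) can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's implementation: the full ranking is taken as given (the GMDH/abductive-network ranking procedure itself is not reproduced), and `select_by_ranking` and `nearest_centroid_error` are hypothetical helpers. The nearest-centroid classifier is only a stand-in for the abductive network classifiers used in the paper, chosen so the example runs without external libraries.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Record = Dict[str, float]  # one data record: feature name -> value


def select_by_ranking(
    ranking: Sequence[str],
    eval_error: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Add features from the top of `ranking` one at a time and stop at the
    first increase in out-of-sample error; return the subset just before it."""
    if not ranking:
        raise ValueError("ranking must contain at least one feature")
    chosen = [ranking[0]]
    chosen_error = eval_error(chosen)
    for k in range(2, len(ranking) + 1):
        candidate = list(ranking[:k])
        err = eval_error(candidate)
        if err > chosen_error:  # error starts to rise: treat as overfitting
            break
        chosen, chosen_error = candidate, err
    return chosen, chosen_error


def nearest_centroid_error(
    train_X: Sequence[Record],
    train_y: Sequence[int],
    test_X: Sequence[Record],
    test_y: Sequence[int],
    features: List[str],
) -> float:
    """Hypothetical stand-in classifier (NOT the paper's abductive network):
    classify each test record by the nearest class centroid over `features`
    and return the error rate on the held-out evaluation set."""
    centroids: Dict[int, Record] = {}
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = {f: sum(r[f] for r in rows) / len(rows) for f in features}
    wrong = 0
    for x, y in zip(test_X, test_y):
        # Squared Euclidean distance to each class centroid.
        pred = min(
            centroids,
            key=lambda c: sum((x[f] - centroids[c][f]) ** 2 for f in features),
        )
        wrong += pred != y
    return wrong / len(test_y)
```

With a ranking `["f1", ..., "f5"]` whose evaluation errors are 0.30, 0.20, 0.15, 0.18 and 0.10 for the top 1 to 5 features, the sketch keeps `["f1", "f2", "f3"]`: the rise at the fourth feature stops the search even though a longer prefix happens to score lower. Stopping at the first increase follows the rule stated in the abstract; scanning every prefix and keeping the minimum would be a different design choice.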


Similar articles

GMDH-based feature ranking and selection for improved classification of medical data.

J Biomed Inform. 2005 Dec;38(6):456-68. doi: 10.1016/j.jbi.2005.03.003. Epub 2005 Apr 16.

Improved classification of medical data using abductive network committees trained on different feature subsets.

Comput Methods Programs Biomed. 2005 Nov;80(2):141-53. doi: 10.1016/j.cmpb.2005.08.001. Epub 2005 Sep 19.

A novel feature selection approach for biomedical data classification.

J Biomed Inform. 2010 Feb;43(1):15-23. doi: 10.1016/j.jbi.2009.07.008. Epub 2009 Jul 30.

A parsimonious threshold-independent protein feature selection method through the area under receiver operating characteristic curve.

Bioinformatics. 2007 Oct 15;23(20):2788-94. doi: 10.1093/bioinformatics/btm442. Epub 2007 Sep 18.

An efficient statistical feature selection approach for classification of gene expression data.

J Biomed Inform. 2011 Aug;44(4):529-35. doi: 10.1016/j.jbi.2011.01.001. Epub 2011 Jan 15.

Data mining techniques for cancer detection using serum proteomic profiling.

Artif Intell Med. 2004 Oct;32(2):71-83. doi: 10.1016/j.artmed.2004.03.006.

A combination of rough-based feature selection and RBF neural network for classification using gene expression data.

IEEE Trans Nanobioscience. 2008 Mar;7(1):91-9. doi: 10.1109/TNB.2008.2000142.

Feature selection and classification in supporting report-based self-management for people with chronic pain.

IEEE Trans Inf Technol Biomed. 2011 Jan;15(1):54-61. doi: 10.1109/TITB.2010.2091510. Epub 2010 Nov 11.

Tumor classification ranking from microarray data.

BMC Genomics. 2008 Sep 16;9 Suppl 2(Suppl 2):S21. doi: 10.1186/1471-2164-9-S2-S21.

Application of irregular and unbalanced data to predict diabetic nephropathy using visualization and feature selection methods.

Artif Intell Med. 2008 Jan;42(1):37-53. doi: 10.1016/j.artmed.2007.09.005. Epub 2007 Nov 7.

Cited by

Enhancing Influenza Detection through Integrative Machine Learning and Nasopharyngeal Metabolomic Profiling: A Comprehensive Study.

Diagnostics (Basel). 2024 Oct 4;14(19):2214. doi: 10.3390/diagnostics14192214.

Systems biology and machine learning approaches identify drug targets in diabetic nephropathy.

Sci Rep. 2021 Dec 6;11(1):23452. doi: 10.1038/s41598-021-02282-3.

Deep learning for early detection of pathological changes in X-ray bone microstructures: case of osteoarthritis.

Sci Rep. 2021 Jan 27;11(1):2294. doi: 10.1038/s41598-021-81786-4.

Time series prediction of under-five mortality rates for Nigeria: comparative analysis of artificial neural networks, Holt-Winters exponential smoothing and autoregressive integrated moving average models.

BMC Med Res Methodol. 2020 Dec 3;20(1):292. doi: 10.1186/s12874-020-01159-9.

Multiparametric quantitative and texture 18F-FDG PET/CT analysis for primary malignant tumour grade differentiation.

Eur Radiol Exp. 2019 Dec 18;3(1):48. doi: 10.1186/s41747-019-0124-3.

Enhancement of early cervical cancer diagnosis with epithelial layer analysis of fluorescence lifetime images.

PLoS One. 2015 May 12;10(5):e0125706. doi: 10.1371/journal.pone.0125706. eCollection 2015.

Classification and Progression Based on CFS-GA and C5.0 Boost Decision Tree of TCM Zheng in Chronic Hepatitis B.

Evid Based Complement Alternat Med. 2013;2013:695937. doi: 10.1155/2013/695937. Epub 2013 Jan 27.