

An Efficient Binary Sand Cat Swarm Optimization for Feature Selection in High-Dimensional Biomedical Data.

Author information

Pashaei Elnaz

Affiliations

Department of Computer Engineering, Istanbul Aydin University, Istanbul 34295, Turkey.

Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA.

Publication information

Bioengineering (Basel). 2023 Sep 25;10(10):1123. doi: 10.3390/bioengineering10101123.

Abstract

Recent breakthroughs are contributing significantly to big data in biomedicine, which is expected to assist in disease diagnosis and patient care management. Extracting relevant information from these data requires effective management and analysis. One of the major challenges in biomedical data analysis is the so-called "curse of dimensionality". To address it, a new version of Binary Sand Cat Swarm Optimization (called PILC-BSCSO), incorporating a pinhole-imaging-based learning strategy and a crossover operator, is presented for selecting the most informative features. First, the crossover operator is used to strengthen the search capability of BSCSO. Second, the pinhole-imaging learning strategy is used to increase exploration capacity while avoiding premature convergence. A Support Vector Machine (SVM) classifier with a linear kernel is used to assess classification accuracy. On three public medical datasets, PILC-BSCSO outperforms 11 state-of-the-art techniques in terms of classification accuracy and the number of selected features. Moreover, PILC-BSCSO achieves a classification accuracy of 100% on colon cancer data, which is difficult to classify accurately, using just 10 genes. A real Liver Hepatocellular Carcinoma (TCGA-HCC) dataset was also used to further evaluate the effectiveness of the approach. On the TCGA data, PILC-BSCSO identifies a subset of five marker genes, including the prognostic biomarkers HMMR, CHST4, and COL15A1, that have excellent predictive potential for liver cancer.
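
The abstract names the building blocks of PILC-BSCSO but gives no formulas, so the following is only a rough sketch of what they might look like, not the paper's implementation. It assumes a commonly used form of pinhole-imaging opposition, x* = (a+b)/2 + (a+b)/(2k) - x/k, a plain uniform crossover, and a weighted-sum wrapper fitness. The function names, the scale factor `k`, and the weight `alpha` are all hypothetical. The classification error is passed in as a number to keep the sketch self-contained; in the paper it would come from the linear-kernel SVM.

```python
import random

def pinhole_image(x, lower, upper, k=1.0):
    """Assumed pinhole-imaging opposite of a continuous position.

    Uses x* = (a+b)/2 + (a+b)/(2k) - x/k; with k = 1 this reduces to
    ordinary opposition-based learning, x* = a + b - x.
    """
    mid = (lower + upper) / 2.0
    return [mid + mid / k - xi / k for xi in x]

def uniform_crossover(parent_a, parent_b, rng):
    """Build a child mask by taking each bit from either parent with equal probability."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]

def fitness(mask, error_rate, alpha=0.99):
    """Assumed wrapper fitness (lower is better).

    Combines classification error with the fraction of selected features.
    An empty feature subset is treated as infeasible.
    """
    if not any(mask):
        return float("inf")
    return alpha * error_rate + (1 - alpha) * sum(mask) / len(mask)

# Illustrative usage with made-up 6-feature masks (1 = feature selected).
rng = random.Random(0)
best = [1, 0, 1, 0, 0, 1]
current = [0, 0, 1, 1, 0, 0]
child = uniform_crossover(current, best, rng)
opposite = pinhole_image([0.2, 0.9], lower=0.0, upper=1.0, k=1.0)
```

With the error rate held equal, the feature-ratio term makes a mask with fewer selected features score better, which matches the abstract's two reported criteria (accuracy and number of selected features).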


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ec46/10604175/8c4ae3c69961/bioengineering-10-01123-g001.jpg
