A hybrid feature selection algorithm and its application in bioinformatics.

Author information

Wang Yangyang, Gao Xiaoguang, Ru Xinxin, Sun Pengzhan, Wang Jihan

Affiliations

School of Electronics and Information, Northwestern Polytechnical University, Xi'an, Shaanxi, China.

Institute of Medical Research, Northwestern Polytechnical University, Xi'an, Shaanxi, China.

Publication information

PeerJ Comput Sci. 2022 Mar 22;8:e933. doi: 10.7717/peerj-cs.933. eCollection 2022.

Abstract

Feature selection is a standalone technique for handling high-dimensional datasets and has been widely applied in a variety of fields. With the rapid growth of data, such as bioinformatics data, recent decades have seen an urgent need for more effective and accurate feature selection methods. Here, we propose the hybrid MMPSO method, which combines a feature ranking method with a heuristic search method to obtain an optimal feature subset that yields higher classification accuracy. In this study, ten datasets obtained from the UCI Machine Learning Repository were analyzed to demonstrate the superiority of our method. The MMPSO algorithm outperformed other algorithms in terms of classification accuracy while using the same number of features. We then applied the method to a biological dataset containing gene expression information on liver hepatocellular carcinoma (LIHC) samples obtained from The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx). Using the MMPSO algorithm, we identified an 18-gene signature that performed well in distinguishing normal samples from tumours. Nine of the 18 differentially expressed genes were significantly up-regulated in LIHC tumour samples, and the area under the curve (AUC) of a combination of seven genes (ADRA2B, ERAP2, NPC1L1, PLVAP, POMC, PYROXD2, TRIM29) in distinguishing tumours from normal samples was greater than 0.99. Six genes (ADRA2B, PYROXD2, CACHD1, FKBP1B, PRKD1 and RPL7AP6) were significantly correlated with survival time. The MMPSO algorithm can be used to effectively extract features from a high-dimensional dataset, which will provide new clues for identifying biomarkers or therapeutic targets from biological data and offer more perspectives in tumour research.
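The abstract does not specify how MMPSO ranks features or how its search is configured, so the following is only a generic sketch of the hybrid idea (a filter ranking that narrows the candidates, followed by a particle swarm search over subsets), not the authors' algorithm. Everything in it is an assumption made for illustration: the correlation-based ranking, the nearest-centroid fitness, the per-feature penalty, the swarm constants, the synthetic data, and all function names.

```python
import math
import random

def pearson_abs(xs, ys):
    """Absolute Pearson correlation between one feature column and the labels."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (v - my) for x, v in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((v - my) ** 2 for v in ys)
    return abs(cov) / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def rank_features(X, y, keep):
    """Filter stage: indices of the `keep` features most correlated with y."""
    scores = [pearson_abs([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:keep]

def loo_accuracy(X, y, cols):
    """Wrapper fitness: leave-one-out accuracy of a nearest-centroid classifier."""
    if not cols:
        return 0.0
    hits = 0
    for i in range(len(X)):
        cents = {}
        for label in set(y):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            if rows:  # skip a class that has no remaining samples
                cents[label] = [sum(r[c] for r in rows) / len(rows) for c in cols]
        pred = min(cents, key=lambda lab: sum(
            (X[i][c] - m) ** 2 for c, m in zip(cols, cents[lab])))
        hits += pred == y[i]
    return hits / len(X)

def binary_pso(X, y, candidates, particles=8, iters=15, seed=0):
    """Search stage: binary PSO over on/off bits for the candidate features."""
    rng = random.Random(seed)
    d = len(candidates)

    def fitness(bits):
        cols = [c for c, b in zip(candidates, bits) if b]
        # Illustrative penalty of 0.01 per feature to favour compact subsets.
        return loo_accuracy(X, y, cols) - 0.01 * len(cols)

    pos = [[rng.randint(0, 1) for _ in range(d)] for _ in range(particles)]
    vel = [[0.0] * d for _ in range(particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iters):
        for i in range(particles):
            for j in range(d):
                v = (0.7 * vel[i][j]
                     + 1.5 * rng.random() * (pbest[i][j] - pos[i][j])
                     + 1.5 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-4.0, min(4.0, v))
                # Sigmoid transfer: velocity becomes the probability of bit = 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return [c for c, b in zip(candidates, gbest) if b]

def make_data(n=40, n_features=30, informative=(0, 1, 2), seed=1):
    """Synthetic two-class data: only the `informative` columns carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        for j in informative:
            row[j] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y

X, y = make_data()
top = rank_features(X, y, keep=8)   # filter: 30 features down to 8 candidates
selected = binary_pso(X, y, top)    # search: best subset among the candidates
print("filter candidates:", top)
print("selected subset:", selected)
print("leave-one-out accuracy:", loo_accuracy(X, y, selected))
```

The two stages are deliberately decoupled: a different ranking criterion or a different classifier would only change `rank_features` or `loo_accuracy`, while the swarm search stays the same.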
