ClearF++: Improved Supervised Feature Scoring Using Feature Clustering in Class-Wise Embedding and Reconstruction.

Author information

Wang Sehee, Kim So Yeon, Sohn Kyung-Ah

Affiliations

Department of Artificial Intelligence, Ajou University, Suwon 16499, Republic of Korea.

Department of Software and Computer Engineering, Ajou University, Suwon 16499, Republic of Korea.

Publication information

Bioengineering (Basel). 2023 Jul 10;10(7):824. doi: 10.3390/bioengineering10070824.

DOI:10.3390/bioengineering10070824

PMID:37508851

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10376817/

Abstract

Feature selection methods are essential for accurate disease classification and identifying informative biomarkers. While information-theoretic methods have been widely used, they often exhibit limitations such as high computational costs. Our previously proposed method, ClearF, addresses these issues by using reconstruction error from low-dimensional embeddings as a proxy for the entropy term in the mutual information. However, ClearF still has limitations, including a nontransparent bottleneck layer selection process, which can result in unstable feature selection. To address these limitations, we propose ClearF++, which simplifies the bottleneck layer selection and incorporates feature-wise clustering to enhance biomarker detection. We compare its performance with other commonly used methods such as MultiSURF and IFS, as well as ClearF, across multiple benchmark datasets. Our results demonstrate that ClearF++ consistently outperforms these methods in terms of prediction accuracy and stability, even with limited samples. We also observe that employing the Deep Embedded Clustering (DEC) algorithm for feature-wise clustering improves performance, indicating its suitability for handling complex data structures with limited samples. ClearF++ offers an improved biomarker prioritization approach with enhanced prediction performance and faster execution. Its stability and effectiveness with limited samples make it particularly valuable for biomedical data analysis.
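The scoring idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made here for illustration only: a rank-k PCA stands in for the low-dimensional embedding (the abstract's bottleneck layer), the score is taken as whole-data reconstruction error minus summed class-wise reconstruction error (mirroring the entropy and conditional-entropy terms of mutual information), and the feature clusters are passed in as a precomputed label array rather than produced by DEC or another clustering algorithm. The function names reconstruction_error, classwise_scores, and clustered_scores are hypothetical.

```python
import numpy as np


def reconstruction_error(X, k):
    """Per-feature squared reconstruction error after a rank-k PCA embedding."""
    Xc = X - X.mean(axis=0)                      # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = min(k, Vt.shape[0])                      # cannot keep more axes than exist
    V = Vt[:k].T                                 # (n_features, k) embedding axes
    X_hat = Xc @ V @ V.T                         # embed, then reconstruct
    return ((Xc - X_hat) ** 2).sum(axis=0)       # sum over samples, one value per feature


def classwise_scores(X, y, k=1):
    """Assumed score: whole-data error minus summed class-wise errors, per feature."""
    whole = reconstruction_error(X, k)
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        # Each class gets its own embedding, fitted on that class's samples only.
        within += reconstruction_error(X[y == label], k)
    return whole - within


def clustered_scores(X, y, groups, k=1):
    """Apply the class-wise scoring separately inside each feature cluster."""
    scores = np.empty(X.shape[1])
    for g in np.unique(groups):
        cols = np.flatnonzero(groups == g)       # features assigned to cluster g
        scores[cols] = classwise_scores(X[:, cols], y, k)
    return scores


# Toy data (synthetic, for illustration only): 40 samples, 6 features, 2 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
y = np.repeat([0, 1], 20)
X[y == 1, 0] += 3.0                              # shift feature 0 in class 1
groups = np.array([0, 0, 0, 1, 1, 1])            # hypothetical feature-cluster labels
scores = clustered_scores(X, y, groups, k=1)
print(np.round(scores, 2))
```

The sketch only shows where the two error terms come from and how feature clusters partition the computation. The sign convention, the scaling, and the use of PCA are assumptions, so it should not be expected to reproduce the rankings reported for ClearF++.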

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0bfe/10376817/53b593b5d8bb/bioengineering-10-00824-g001.jpg
