Multi-scale supervised clustering-based feature selection for tumor classification and identification of biomarkers and targets on genomic data.

Affiliations

School of Mathematics and Statistics, Shandong University, Weihai, 264209, China.

School of Control Science and Engineering, Shandong University, Jinan, 250061, China.

Publication information

BMC Genomics. 2020 Sep 22;21(1):650. doi: 10.1186/s12864-020-07038-3.

DOI:10.1186/s12864-020-07038-3

PMID:32962626

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC7510277/

Abstract

BACKGROUND

Small sample sizes and the curse of dimensionality hamper the application of deep learning techniques to disease classification. In addition, the performance of clustering-based feature selection algorithms remains far from satisfactory because they rely on unsupervised learning methods. To improve interpretability and overcome this limitation, we developed a novel feature selection algorithm. Meanwhile, complex genomic data pose great challenges for the identification of biomarkers and therapeutic targets, and some current feature selection methods suffer from low sensitivity and specificity in this field.

RESULTS

In this article, we designed a multi-scale clustering-based feature selection algorithm, MCBFS, which simultaneously performs feature selection and model learning for genomic data analysis. Comparisons with seven benchmark and six state-of-the-art supervised methods on eight data sets showed that MCBFS is robust and effective. Visualization results and a statistical test showed that MCBFS can capture informative genes and improve the interpretability and visualization of tumor gene expression and single-cell sequencing data. In addition, we developed a general framework, McbfsNW, that uses gene expression data and protein interaction data to identify robust biomarkers and therapeutic targets for the diagnosis and therapy of diseases. The framework incorporates the MCBFS algorithm, a network recognition ensemble algorithm, and a feature selection wrapper. We applied McbfsNW to lung adenocarcinoma (LUAD) data sets. Preliminary results showed that the identified biomarkers attain better prediction results on an independent LUAD data set; we also constructed a drug-target network that may be useful for LUAD therapy.

CONCLUSIONS

The proposed feature selection method is robust and effective for gene selection, classification, and visualization. The McbfsNW framework is practical and helpful for identifying biomarkers and targets from genomic data. We believe the same methods and principles can be extended to other kinds of data sets.
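The abstract does not specify how MCBFS works internally, so the sketch below is not the authors' algorithm. It only illustrates, under stated assumptions, the general idea named in the RESULTS section: clustering-based feature selection that uses class labels and looks at more than one scale. Here each feature is scored by its absolute correlation with a binary label, features are greedily grouped around the most relevant one at several correlation thresholds (the "scales"), and the representatives chosen at each scale are combined by voting. All function names, thresholds, and the toy data are hypothetical.

```python
import math
from collections import Counter


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def select_at_scale(cols, scores, threshold):
    """Greedy clustering at one scale (assumed scheme, for illustration only).

    The most relevant unassigned feature becomes a cluster representative and
    absorbs every unassigned feature whose |correlation| with it reaches the
    threshold. Representatives are returned in descending relevance order.
    """
    order = sorted(range(len(cols)), key=lambda j: -scores[j])
    reps = []
    while order:
        rep = order.pop(0)
        reps.append(rep)
        order = [j for j in order if abs(pearson(cols[rep], cols[j])) < threshold]
    return reps


def multiscale_select(X, y, thresholds=(0.8, 0.9, 0.95), k=2):
    """Vote for the top-k representatives across several correlation thresholds."""
    cols = [list(c) for c in zip(*X)]             # one list per feature
    scores = [abs(pearson(c, y)) for c in cols]   # supervised relevance per feature
    votes = Counter()
    for t in thresholds:
        for rep in select_at_scale(cols, scores, t)[:k]:
            votes[rep] += 1
    ranked = sorted(votes, key=lambda j: (-votes[j], -scores[j]))
    return ranked[:k]


# Toy data (hypothetical): 8 samples, 4 features, binary labels.
f0 = [1.0, 1.2, 0.9, 1.1, 3.0, 3.1, 2.9, 3.2]   # strongly informative
f1 = [2.0, 2.6, 1.7, 2.3, 5.8, 6.4, 5.6, 6.5]   # nearly redundant with f0
f2 = [0, 2, 1, 3, 2, 4, 3, 5]                   # moderately informative
f3 = [5, 1, 4, 2, 1, 5, 2, 4]                   # uninformative
y = [0, 0, 0, 0, 1, 1, 1, 1]
X = [list(row) for row in zip(f0, f1, f2, f3)]

selected = multiscale_select(X, y)
print(selected)  # [0, 2]
```

In this toy run the redundant feature f1 is absorbed into the cluster of f0 at every threshold, so the selection keeps one representative of that cluster plus the next most relevant feature. A real analysis on gene expression data would need a classifier, cross-validation, and the method details given in the full paper.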
