Multi-scale supervised clustering-based feature selection for tumor classification and identification of biomarkers and targets on genomic data.

Affiliations

School of Mathematics and Statistics, Shandong University, Weihai, 264209, China.

School of Control Science and Engineering, Shandong University, Jinan, 250061, China.

Publication information

BMC Genomics. 2020 Sep 22;21(1):650. doi: 10.1186/s12864-020-07038-3.

Abstract

BACKGROUND

Small sample sizes and the curse of dimensionality hamper the application of deep learning techniques to disease classification. Additionally, the performance of clustering-based feature selection algorithms remains far from satisfactory because they rely on unsupervised learning methods. To enhance interpretability and overcome this problem, we developed a novel feature selection algorithm. Meanwhile, complex genomic data pose great challenges for the identification of biomarkers and therapeutic targets. Some current feature selection methods suffer from low sensitivity and specificity in this field.

RESULTS

In this article, we designed a multi-scale clustering-based feature selection algorithm, named MCBFS, that simultaneously performs feature selection and model learning for genomic data analysis. Experiments comparing MCBFS with seven benchmark and six state-of-the-art supervised methods on eight data sets demonstrated that it is robust and effective. The visualization results and the statistical test showed that MCBFS can capture informative genes and improve the interpretability and visualization of tumor gene expression and single-cell sequencing data. Additionally, we developed a general framework, named McbfsNW, that uses gene expression data and protein interaction data to identify robust biomarkers and therapeutic targets for the diagnosis and therapy of diseases. The framework incorporates the MCBFS algorithm, a network recognition ensemble algorithm, and a feature selection wrapper. McbfsNW was applied to lung adenocarcinoma (LUAD) data sets. The preliminary results demonstrated that the identified biomarkers attain higher prediction performance on the independent LUAD data set, and we also constructed a drug-target network that may be useful for LUAD therapy.
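
The abstract does not spell out how MCBFS works, so the following is only a rough illustration of the general idea of supervised, multi-scale, clustering-based feature selection, not the authors' algorithm. It assumes two classes labelled 0 and 1, groups features by absolute Pearson correlation at several thresholds (the "scales"), keeps the most class-separating feature of each group, and ranks features by how often they are kept. All function names, thresholds, and the synthetic data are hypothetical.

```python
import numpy as np

def class_separation_scores(X, y):
    """Per-feature |difference of class means| / pooled std (assumes labels 0 and 1)."""
    X0, X1 = X[y == 0], X[y == 1]
    pooled = np.sqrt((X0.var(axis=0) + X1.var(axis=0)) / 2.0) + 1e-12
    return np.abs(X0.mean(axis=0) - X1.mean(axis=0)) / pooled

def cluster_features(X, threshold):
    """Greedy grouping: unassigned features with |r| >= threshold to a seed join its cluster."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    labels = -np.ones(corr.shape[0], dtype=int)
    next_label = 0
    for seed in range(corr.shape[0]):
        if labels[seed] >= 0:
            continue  # already assigned to an earlier seed's cluster
        members = (labels < 0) & (corr[seed] >= threshold)
        labels[members] = next_label
        labels[seed] = next_label  # the seed always belongs to its own cluster
        next_label += 1
    return labels

def multi_scale_select(X, y, thresholds=(0.5, 0.7, 0.9), n_select=3):
    """Vote for the best-scoring feature of every cluster at every scale."""
    scores = class_separation_scores(X, y)
    votes = np.zeros(X.shape[1], dtype=int)
    for t in thresholds:
        labels = cluster_features(X, t)
        for c in np.unique(labels):
            idx = np.flatnonzero(labels == c)
            votes[idx[np.argmax(scores[idx])]] += 1
    # Sort by votes (descending), breaking ties by separation score (descending).
    order = np.lexsort((-scores, -votes))
    return order[:n_select]

# Synthetic demo data (hypothetical): feature 0 is informative,
# feature 1 is a near-copy of it, and the remaining features are noise.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 10))
X[:, 0] += 3.0 * y
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=40)

selected = multi_scale_select(X, y)
print("selected feature indices:", selected)
```

In this toy setting the informative feature and its near-copy fall into the same cluster at every scale, so only one of them is retained. Suppressing redundancy in this way is the usual motivation for clustering features before ranking them.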

CONCLUSIONS

The proposed feature selection method is robust and effective for gene selection, classification, and visualization. The McbfsNW framework is practical and helpful for identifying biomarkers and targets in genomic data. We believe that the same methods and principles can be extended and applied to other kinds of data sets.
