Dynamic Meta-data Network Sparse PCA for Cancer Subtype Biomarker Screening.

Author information

Miao Rui, Dong Xin, Liu Xiao-Ying, Lo Sio-Long, Mei Xin-Yue, Dang Qi, Cai Jie, Li Shao, Yang Kuo, Xie Sheng-Li, Liang Yong

Affiliations

Institute of Systems Engineering, Macau University of Science and Technology, Avenida Wai Long, Taipa, China.

Computer Engineering Technical College, Guangdong Polytechnic of Science and Technology, Zhuhai, China.

Publication information

Front Genet. 2022 May 9;13:869906. doi: 10.3389/fgene.2022.869906. eCollection 2022.

Abstract

Previous research shows that each type of cancer can be divided into multiple subtypes, which is one of the key reasons cancer is difficult to cure. Finding new target genes for cancer subtypes is therefore of great significance for developing new anti-cancer drugs and personalized treatment. Because cancer gene expression data sets are usually high-dimensional, noisy, and carry information on multiple potential subtypes, many sparse principal component analysis (sparse PCA) methods have been used to identify cancer subtype biomarkers and subtype clusters. However, existing sparse PCA methods do not use known cancer subtype information as prior knowledge, and their results are strongly affected by sample quality. We therefore propose the Dynamic Metadata Edge-group Sparse PCA (DM-ESPCA) model, which draws on meta-learning to address the sample-quality problem and uses known cancer subtype information as prior knowledge to capture gene modules with better biological interpretations. Experimental results on three biological data sets show that the DM-ESPCA model can find potential target gene probes that carry richer biological information about the cancer subtypes. Moreover, clustering and machine learning classification models built on the target genes screened by the DM-ESPCA model improve accuracy by up to 22-23% compared with existing sparse PCA methods. We also show that the DM-ESPCA model outperforms four classic supervised machine learning models on the task of classifying cancer subtypes.
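To make the general idea of sparse PCA for gene screening concrete, the following is a minimal sketch of a generic truncated power iteration, which keeps only the k largest loadings at each step. It is not the DM-ESPCA model: it uses no subtype prior, no edge-group structure, and no meta-learning. The function names, the synthetic data, and all parameter values are assumptions chosen for illustration only.

```python
import numpy as np

def truncated_power_spca(X, k, n_iter=200):
    """Leading sparse principal component with at most k non-zero loadings.

    X is a (samples x genes) matrix; returns a unit-norm loading vector.
    Generic truncated power iteration, NOT the DM-ESPCA algorithm.
    """
    Xc = X - X.mean(axis=0)                   # centre each gene
    cov = Xc.T @ Xc / (X.shape[0] - 1)        # sample covariance (genes x genes)
    # Start from the dense leading eigenvector (eigh sorts eigenvalues ascending).
    v = np.linalg.eigh(cov)[1][:, -1]
    for _ in range(n_iter):
        w = cov @ v                           # power-iteration step
        keep = np.argsort(np.abs(w))[-k:]     # indices of the k largest |loadings|
        w_sparse = np.zeros_like(w)
        w_sparse[keep] = w[keep]              # set every other loading to zero
        v_new = w_sparse / np.linalg.norm(w_sparse)
        converged = np.allclose(v_new, v, atol=1e-10)
        v = v_new
        if converged:
            break
    return v

def make_toy_data(n_samples=80, n_genes=200, n_signal=10, seed=1):
    """Synthetic two-subtype data: the first n_signal genes are shifted in subtype 1."""
    rng = np.random.default_rng(seed)
    labels = np.repeat([0, 1], n_samples // 2)
    X = rng.standard_normal((n_samples, n_genes))
    X[labels == 1, :n_signal] += 4.0
    return X, labels

X, labels = make_toy_data()
loading = truncated_power_spca(X, k=10)
selected = np.flatnonzero(loading)            # candidate "biomarker" gene indices
scores = (X - X.mean(axis=0)) @ loading       # per-sample projection on the sparse PC
```

On this toy data, `selected` holds the indices of the genes with non-zero loadings and `scores` can be passed to any clustering or classification routine. Real expression data would call for normalization and a principled choice of `k`, neither of which is addressed here.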

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7a8c/9197542/88c71bd8f561/fgene-13-869906-g001.jpg
