A Self-Training Subspace Clustering Algorithm under Low-Rank Representation for Cancer Classification on Gene Expression Data.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2018 Jul-Aug;15(4):1315-1324. doi: 10.1109/TCBB.2017.2712607. Epub 2017 Jun 6.

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5986621/

Abstract

Accurate identification of cancer types is essential for cancer diagnosis and treatment. Since cancer and normal tissues differ in gene expression, gene expression data can serve as an efficient source of features for cancer classification. However, accurate classification directly from raw gene expression profiles remains challenging because the features are intrinsically high-dimensional and the number of samples is small. We propose a new self-training subspace clustering algorithm under low-rank representation, called SSC-LRR, for cancer classification on gene expression data. Low-rank representation (LRR) is first applied to extract discriminative features from the high-dimensional gene expression data; the self-training subspace clustering (SSC) method is then used to generate the cancer classification predictions. SSC-LRR was tested on two separate benchmark datasets and compared with four state-of-the-art classification methods as controls. It achieved an overall accuracy of 89.7 percent and a general correlation of 0.920, which are 18.9 and 24.4 percent higher, respectively, than those of the best control method. In addition, SSC-LRR identified several genes (RNF114, HLA-DRB5, USP9Y, and PTPN20) as new cancer identifiers that deserve further clinical investigation. Overall, the study demonstrates a new, sensitive approach to recognizing cancer types from large-scale gene expression data.
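
The two-stage pipeline in the abstract (LRR first, then self-training on the resulting representation) can be illustrated with a minimal sketch. It is not the authors' SSC-LRR implementation. It assumes noise-free data, for which the classical LRR problem min ||Z||_* + λ||E||_{2,1} s.t. X = XZ + E reduces (with E = 0) to min ||Z||_* s.t. X = XZ, whose minimizer is the shape interaction matrix V_r V_r^T from the skinny SVD X = U_r Σ_r V_r^T. The self-training rule (each round, label the unlabeled sample whose LRR affinity is most dominated by one already-labeled class), the function names `lrr_noiseless` and `self_train`, and the toy data (6 "genes" x 10 samples drawn from two independent 2-D subspaces) are hypothetical choices made only for this example.

```python
import numpy as np


def lrr_noiseless(X, tol=1e-10):
    """Closed-form LRR for clean data: argmin ||Z||_* subject to X = X Z.

    X is d x n (features/genes in rows, samples in columns). The minimizer
    is V_r V_r^T, where V_r holds the right singular vectors of X that
    belong to its nonzero singular values (skinny SVD).
    """
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))  # numerical rank of X
    v = vt[:r].T  # n x r
    return v @ v.T


def self_train(W, labels, n_classes, per_round=1):
    """Hypothetical self-training loop on an n x n affinity matrix W.

    labels holds a class index for each seed sample and -1 for unlabeled
    ones. Each round scores every unlabeled sample by its summed affinity
    to the samples already assigned to each class, then labels the
    `per_round` samples whose best class takes the largest share.
    """
    labels = np.array(labels)
    while np.any(labels < 0):
        unl = np.flatnonzero(labels < 0)
        scores = np.zeros((unl.size, n_classes))
        for c in range(n_classes):
            members = np.flatnonzero(labels == c)
            scores[:, c] = W[np.ix_(unl, members)].sum(axis=1)
        totals = scores.sum(axis=1)
        # Confidence = fraction of a sample's affinity that goes to its best class.
        conf = scores.max(axis=1) / np.where(totals > 0, totals, 1.0)
        pick = np.argsort(-conf)[:per_round]
        labels[unl[pick]] = scores[pick].argmax(axis=1)
    return labels


# Hypothetical toy data: two independent 2-D subspaces of R^6, five samples each.
rng = np.random.default_rng(0)
basis_a = np.eye(6)[:, [0, 1]]
basis_b = np.eye(6)[:, [3, 4]]
X = np.hstack([basis_a @ rng.normal(size=(2, 5)),
               basis_b @ rng.normal(size=(2, 5))])

Z = lrr_noiseless(X)
W = np.abs(Z) + np.abs(Z).T  # symmetric, nonnegative affinity
seed = [0, -1, -1, -1, -1, 1, -1, -1, -1, -1]  # one labeled sample per class
pred = self_train(W, seed, n_classes=2)
print(pred.tolist())  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

Because the two subspaces are independent, Z is block-diagonal, so the affinity between samples of different classes is numerically zero and one labeled seed per class spreads to its whole block. Real expression data are noisy, which is why the full LRR objective keeps the error term E and is usually solved iteratively (for example with an augmented Lagrange multiplier method) rather than by a single SVD.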
