A Self-Training Subspace Clustering Algorithm under Low-Rank Representation for Cancer Classification on Gene Expression Data.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2018 Jul-Aug;15(4):1315-1324. doi: 10.1109/TCBB.2017.2712607. Epub 2017 Jun 6.

Abstract

Accurate identification of cancer types is essential to cancer diagnosis and treatment. Since cancer tissue and normal tissue differ in gene expression, gene expression data can be used as an efficient feature source for cancer classification. However, accurate cancer classification directly from the original gene expression profiles remains challenging because of the intrinsically high dimensionality of the features and the small number of samples. We propose a new self-training subspace clustering algorithm under low-rank representation, called SSC-LRR, for cancer classification on gene expression data. Low-rank representation (LRR) is first applied to extract discriminative features from the high-dimensional gene expression data; the self-training subspace clustering (SSC) method is then used to generate the cancer classification predictions. SSC-LRR was tested on two separate benchmark datasets in comparison with four state-of-the-art classification methods. It generated cancer classification predictions with an overall accuracy of 89.7 percent and a general correlation of 0.920, which are 18.9 and 24.4 percent higher, respectively, than those of the best control method. In addition, several genes (RNF114, HLA-DRB5, USP9Y, and PTPN20) were identified by SSC-LRR as new cancer identifiers that deserve further clinical investigation. Overall, the study demonstrates a new, sensitive approach to cancer classification from large-scale gene expression data.
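The abstract describes a two-stage pipeline (a low-rank representation of the samples, followed by self-training label assignment) but gives no algorithmic details. The following is only a minimal illustrative sketch of that general idea, not the authors' SSC-LRR: it assumes noise-free data, uses the well-known closed-form solution of the noiseless LRR problem (minimize the nuclear norm of Z subject to X = XZ, solved by Z = VVᵀ from the skinny SVD of X), and substitutes a simple greedy self-training rule. The function names `lrr_noiseless` and `self_train_labels`, the synthetic data, and all parameters are hypothetical.

```python
import numpy as np

def lrr_noiseless(X, tol=1e-8):
    """Closed-form minimizer of ||Z||_* subject to X = X Z.

    X has shape (n_features, n_samples); the result is (n_samples, n_samples).
    Assumption: the data are noise-free, so no error term is modeled.
    """
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))      # numerical rank of X
    V = Vt[:r].T                         # right singular vectors, (n_samples, r)
    return V @ V.T

def self_train_labels(Z, labels):
    """Greedy self-training on the affinity |Z| + |Z|^T (illustrative rule only).

    labels: integer array with -1 marking unlabeled samples. At each step the
    unlabeled sample with the largest affinity to some labeled class is
    assigned to that class and then treated as labeled.
    """
    W = np.abs(Z) + np.abs(Z).T
    np.fill_diagonal(W, 0.0)
    labels = labels.copy()
    classes = np.unique(labels[labels >= 0])
    while np.any(labels < 0):
        unl = np.flatnonzero(labels < 0)
        # score[i, k]: total affinity between unlabeled sample i and class k
        score = np.stack(
            [W[unl][:, labels == c].sum(axis=1) for c in classes], axis=1
        )
        best = int(np.argmax(score.max(axis=1)))   # most confident sample
        labels[unl[best]] = classes[int(np.argmax(score[best]))]
    return labels

# Synthetic stand-in for expression data: two classes, each lying in its own
# 3-dimensional subspace of a 200-dimensional "gene" space.
rng = np.random.default_rng(0)
n_genes, n_per_class, dim = 200, 15, 3
blocks = [
    rng.standard_normal((n_genes, dim)) @ rng.standard_normal((dim, n_per_class))
    for _ in range(2)
]
X = np.hstack(blocks)
truth = np.repeat([0, 1], n_per_class)

# One labeled seed per class; every other sample starts unlabeled.
seed_labels = np.full(truth.shape, -1)
seed_labels[[0, n_per_class]] = [0, 1]

Z = lrr_noiseless(X)
pred = self_train_labels(Z, seed_labels)
```

In this idealized setting the two subspaces are independent, so Z is block-diagonal up to floating-point error and the greedy rule only propagates labels within each block. Real expression data are noisy, which is why practical LRR formulations add an explicit error term and a regularization weight.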

Similar articles

6
Laplacian regularized low-rank representation for cancer samples clustering.
Comput Biol Chem. 2019 Feb;78:504-509. doi: 10.1016/j.compbiolchem.2018.11.003. Epub 2018 Nov 19.

Cited by

6
Silencing Inhibits the Proliferation and Metastasis of Gastric Cancer.
J Cancer. 2022 Jan 1;13(2):565-578. doi: 10.7150/jca.62033. eCollection 2022.

References

3
Cancer statistics, 2016.
CA Cancer J Clin. 2016 Jan-Feb;66(1):7-30. doi: 10.3322/caac.21332. Epub 2016 Jan 7.
9
RPCA-Based Tumor Classification Using Gene Expression Data.
IEEE/ACM Trans Comput Biol Bioinform. 2015 Jul-Aug;12(4):964-70. doi: 10.1109/TCBB.2014.2383375.
