Semi-supervised learning improves gene expression-based prediction of cancer recurrence.

Affiliation

Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37232, USA.

Publication information

Bioinformatics. 2011 Nov 1;27(21):3017-23. doi: 10.1093/bioinformatics/btr502. Epub 2011 Sep 4.

Abstract

MOTIVATION

Gene expression profiling has shown great potential in outcome prediction for different types of cancer. Nevertheless, small sample size remains a bottleneck in obtaining robust and accurate classifiers. Traditional supervised learning techniques can only work with labeled data. Consequently, a large amount of microarray data lacking sufficient follow-up information is disregarded. To fully leverage all of the precious data in public databases, we turned to a semi-supervised learning technique, low density separation (LDS).
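The idea behind the name "low density separation" can be illustrated with a toy example. The sketch below is not the LDS algorithm used in the paper; it only shows the underlying assumption that a decision boundary should pass through a region containing few samples, and that unlabeled samples help locate such a region. The one-dimensional "expression scores", the two labeled samples, the kernel bandwidth and all function names are hypothetical and chosen purely for illustration.

```python
import math

# Hypothetical 1-D "expression scores"; every number here is invented for illustration.
LOW_RISK = [0.0, 1.0, 2.0, 3.0, 3.4, 3.8, 4.0]   # true class 0 (no recurrence)
HIGH_RISK = [6.2, 7.0, 7.5, 8.0, 9.0]             # true class 1 (recurrence)
LABELED = [(0.5, 0), (6.5, 1)]                    # the only samples with follow-up
UNLABELED = LOW_RISK + HIGH_RISK                  # labels hidden from the learner


def supervised_threshold(labeled):
    """Midpoint between the two class means, using labeled samples only."""
    neg = [x for x, y in labeled if y == 0]
    pos = [x for x, y in labeled if y == 1]
    return (sum(neg) / len(neg) + sum(pos) / len(pos)) / 2


def density(t, points, bandwidth=0.5):
    """Unnormalized Gaussian kernel density of `points` at position t."""
    return sum(math.exp(-((x - t) / bandwidth) ** 2) for x in points)


def low_density_threshold(labeled, unlabeled, steps=200):
    """Among thresholds between the labeled classes, pick the sparsest one."""
    lo = max(x for x, y in labeled if y == 0)
    hi = min(x for x, y in labeled if y == 1)
    points = [x for x, _ in labeled] + list(unlabeled)
    candidates = [lo + (hi - lo) * i / steps for i in range(1, steps)]
    return min(candidates, key=lambda t: density(t, points))


def accuracy(t):
    """Fraction of the unlabeled samples classified correctly by threshold t."""
    correct = sum(x < t for x in LOW_RISK) + sum(x >= t for x in HIGH_RISK)
    return correct / len(UNLABELED)


if __name__ == "__main__":
    t_sup = supervised_threshold(LABELED)
    t_ssl = low_density_threshold(LABELED, UNLABELED)
    print(f"supervised threshold {t_sup:.2f}, accuracy {accuracy(t_sup):.2f}")
    print(f"low-density threshold {t_ssl:.2f}, accuracy {accuracy(t_ssl):.2f}")
```

With these invented numbers, the threshold learned from the two labeled samples alone falls at 3.5, inside the low-risk cluster, and misclassifies two of the twelve unlabeled samples. The threshold chosen with the help of the unlabeled samples falls in the empty gap between the two clusters and classifies all twelve correctly.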

RESULTS

Using the clinically important question of predicting recurrence risk in colorectal cancer patients, we demonstrated that (i) semi-supervised classification improved prediction accuracy compared with the state-of-the-art supervised method, the support vector machine (SVM), (ii) the performance gain increased with the number of unlabeled samples, (iii) unlabeled data from different institutes could be employed after appropriate processing and (iv) the LDS method is robust with regard to the number of input features. To test the general applicability of this semi-supervised method, we further applied LDS to human breast cancer datasets and also observed superior performance. Our results demonstrate the great potential of semi-supervised learning in gene expression-based outcome prediction for cancer patients.

CONTACT

bing.zhang@vanderbilt.edu.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
