GVES: machine learning model for identification of prognostic genes with a small dataset.

Affiliations

Department of Computer Science and Engineering, Incheon National University, Incheon, Republic of Korea.

Department of Computer Science, Yonsei University, Seoul, Republic of Korea.

Publication information

Sci Rep. 2021 Jan 11;11(1):439. doi: 10.1038/s41598-020-79889-5.


DOI:10.1038/s41598-020-79889-5
PMID:33431999
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7801384/
Abstract

Machine learning may be a powerful approach to more accurate identification of genes that may serve as prognosticators of cancer outcomes using various types of omics data. However, to date, machine learning approaches have shown limited prediction accuracy for cancer outcomes, primarily owing to small sample numbers and relatively large number of features. In this paper, we provide a description of GVES (Gene Vector for Each Sample), a proposed machine learning model that can be efficiently leveraged even with a small sample size, to increase the accuracy of identification of genes with prognostic value. GVES, an adaptation of the continuous bag of words (CBOW) model, generates vector representations of all genes for all samples by leveraging gene expression and biological network data. GVES clusters samples using their gene vectors, and identifies genes that divide samples into good and poor outcome groups for the prediction of cancer outcomes. Because GVES generates gene vectors for each sample, the sample size effect is reduced. We applied GVES to six cancer types and demonstrated that GVES outperformed existing machine learning methods, particularly for cancer datasets with a small number of samples. Moreover, the genes identified as prognosticators were shown to reside within a number of significant prognostic genetic pathways associated with pancreatic cancer.
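The abstract names the continuous bag of words (CBOW) model as the basis of GVES but gives no implementation details. As a rough illustration of the generic CBOW idea only, the sketch below learns gene vectors by predicting a gene from its neighbours in random walks over a toy network. Everything in it is an assumption made for illustration: the gene labels G1 to G6, the network, the random-walk "sentences", and all hyperparameters. It does not use expression data and does not reproduce the per-sample vectors or the clustering step that the abstract describes.

```python
import math
import random


def random_walks(network, n_walks, walk_len, rng):
    """Turn a gene network (adjacency dict) into gene 'sentences' via random walks."""
    genes = sorted(network)
    walks = []
    for _ in range(n_walks):
        node = rng.choice(genes)
        walk = [node]
        for _ in range(walk_len - 1):
            node = rng.choice(network[node])
            walk.append(node)
        walks.append(walk)
    return walks


def train_cbow(walks, dim=8, window=2, epochs=30, lr=0.1, seed=0):
    """Plain CBOW with a full softmax; returns gene vectors and per-epoch mean loss."""
    rng = random.Random(seed)
    vocab = sorted({g for walk in walks for g in walk})
    index = {g: i for i, g in enumerate(vocab)}
    n = len(vocab)
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(n)]
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(n)]
    losses = []
    for _ in range(epochs):
        total, count = 0.0, 0
        for walk in walks:
            for pos, center in enumerate(walk):
                lo, hi = max(0, pos - window), min(len(walk), pos + window + 1)
                ctx = [index[walk[j]] for j in range(lo, hi) if j != pos]
                if not ctx:
                    continue
                # Hidden layer: mean of the context genes' input vectors.
                h = [sum(w_in[c][k] for c in ctx) / len(ctx) for k in range(dim)]
                scores = [sum(w_out[v][k] * h[k] for k in range(dim)) for v in range(n)]
                top = max(scores)
                exps = [math.exp(s - top) for s in scores]
                z = sum(exps)
                probs = [e / z for e in exps]
                target = index[center]
                total -= math.log(probs[target])
                count += 1
                # Backpropagate the cross-entropy error through both weight matrices.
                grad_h = [0.0] * dim
                for v in range(n):
                    err = probs[v] - (1.0 if v == target else 0.0)
                    for k in range(dim):
                        grad_h[k] += err * w_out[v][k]
                        w_out[v][k] -= lr * err * h[k]
                for c in ctx:
                    for k in range(dim):
                        w_in[c][k] -= lr * grad_h[k] / len(ctx)
        losses.append(total / count)
    return {g: w_in[index[g]] for g in vocab}, losses


# Hypothetical toy network: two three-gene modules joined by the G3-G4 edge.
network = {
    "G1": ["G2", "G3"], "G2": ["G1", "G3"], "G3": ["G1", "G2", "G4"],
    "G4": ["G3", "G5", "G6"], "G5": ["G4", "G6"], "G6": ["G4", "G5"],
}
walks = random_walks(network, n_walks=60, walk_len=6, rng=random.Random(1))
vectors, losses = train_cbow(walks)
print(f"mean loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

In a per-sample setting such as the one the abstract describes, one possibility would be to build the walks separately for each sample, for example by weighting network edges with that sample's expression values. That is a guess at the general idea, not a description of the published method.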

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/539c/7801384/8a04a0e33fa8/41598_2020_79889_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/539c/7801384/6cbe047206bb/41598_2020_79889_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/539c/7801384/3cf46af881c2/41598_2020_79889_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/539c/7801384/34ac6b8ce6dd/41598_2020_79889_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/539c/7801384/0bc70a34a11b/41598_2020_79889_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/539c/7801384/d49e1d18fb30/41598_2020_79889_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/539c/7801384/8f8546d27b68/41598_2020_79889_Fig7_HTML.jpg

Similar articles

[1]
GVES: machine learning model for identification of prognostic genes with a small dataset.

Sci Rep. 2021-1-11

[2]
G2Vec: Distributed gene representations for identification of cancer prognostic genes.

Sci Rep. 2018-9-13

[3]
C-HMOSHSSA: Gene selection for cancer classification using multi-objective meta-heuristic and machine learning methods.

Comput Methods Programs Biomed. 2019-6-29

[4]
Increasing prediction accuracy of pathogenic staging by sample augmentation with a GAN.

PLoS One. 2021-4-27

[5]
Diagnosis and classification of cancer using hybrid model based on ReliefF and convolutional neural network.

Med Hypotheses. 2020-4

[6]
Machine learning analysis of gene expression data reveals novel diagnostic and prognostic biomarkers and identifies therapeutic targets for soft tissue sarcomas.

PLoS Comput Biol. 2019-2-20

[7]
ES-MDA: Enhanced Similarity-based MiRNA-Disease Association.

Curr Protein Pept Sci. 2020

[8]
Optimisation of cancer classification by machine learning generates an enriched list of candidate drug targets and biomarkers.

Mol Omics. 2020-4-1

[9]
Identification of hub genes with diagnostic values in pancreatic cancer by bioinformatics analyses and supervised learning methods.

World J Surg Oncol. 2018-11-14

[10]
LUADpp: an effective prediction model on prognosis of lung adenocarcinomas based on somatic mutational features.

BMC Cancer. 2019-3-22

Cited by

[1]
Advances in the field of developing biomarkers for re-irradiation: a how-to guide to small, powerful data sets and artificial intelligence.

Expert Rev Precis Med Drug Dev. 2024

[2]
Unraveling the copper-death connection: Decoding COVID-19's immune landscape through advanced bioinformatics and machine learning approaches.

Hum Vaccin Immunother. 2024-12-31

[3]
The Utility of Artificial Intelligence in the Diagnosis and Management of Pancreatic Cancer.

Cureus. 2023-11-28

[4]
Accurate Prediction of Cancer Prognosis by Exploiting Patient-Specific Cancer Driver Genes.

Int J Mol Sci. 2023-3-29

[5]
Artificial intelligence in pancreatic cancer.

Theranostics. 2022

[6]
Transcriptional and post-transcriptional regulation of checkpoint genes on the tumour side of the immunological synapse.

Heredity (Edinb). 2022-7

[7]
Machine learning on small size samples: A synthetic knowledge synthesis.

Sci Prog. 2022

[8]
Machine learning for manually-measured water quality prediction in fish farming.

PLoS One. 2021
