Joint triplet loss with semi-hard constraint for data augmentation and disease prediction using gene expression data.

Author information

School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, 61005, Republic of Korea.

Artificial Intelligence Graduate School, Gwangju Institute of Science and Technology, Gwangju, 61005, Republic of Korea.

Publication information

Sci Rep. 2023 Oct 24;13(1):18178. doi: 10.1038/s41598-023-45467-8.

DOI:10.1038/s41598-023-45467-8

PMID:37875602

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10598120/

Abstract

The accurate prediction of patients with complex diseases, such as Alzheimer's disease (AD), as well as disease stages, including early- and late-stage cancer, is challenging owing to substantial variability among patients and limited availability of clinical data. Deep metric learning has emerged as a promising approach for addressing these challenges by improving data representation. In this study, we propose a joint triplet loss model with a semi-hard constraint (JTSC) to represent data in a small number of samples. JTSC strictly selects semi-hard samples by switching anchors and positive samples during the learning process in triplet embedding and combines a triplet loss function with an angular loss function. Our results indicate that JTSC significantly improves the number of appropriately represented samples during training when applied to the gene expression data of AD and to cancer stage prediction tasks. Furthermore, we demonstrate that using an embedding vector from JTSC as an input to the classifiers for AD and cancer stage prediction significantly improves classification performance by extracting more accurate features. In conclusion, we show that feature embedding through JTSC can aid in classification when there are a small number of samples compared to a larger number of features.
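The abstract names three ingredients: semi-hard triplet selection, an anchor/positive switch, and a triplet loss combined with an angular loss. The following is a minimal NumPy sketch of how these could fit together. It is not the authors' implementation. The function names, the `margin`, `alpha_deg`, and `weight` parameters, the use of Euclidean distance, and the reading of "strict" selection as "semi-hard in both the original and the swapped role" are all assumptions made for illustration. The angular term uses the commonly cited form max(0, ||a - p||^2 - 4 tan^2(alpha) ||n - c||^2) with c = (a + p) / 2.

```python
import numpy as np


def semi_hard_mask(a, p, n, margin):
    """True where d(a, p) < d(a, n) < d(a, p) + margin (Euclidean distance)."""
    d_ap = np.linalg.norm(a - p, axis=1)
    d_an = np.linalg.norm(a - n, axis=1)
    return (d_ap < d_an) & (d_an < d_ap + margin)


def joint_loss(a, p, n, margin=1.0, alpha_deg=45.0, weight=1.0):
    """Hypothetical joint loss over rows of anchor, positive, negative embeddings.

    Returns (mean loss over selected triplets, boolean selection mask).
    """
    # Assumption: a triplet is kept only if it is semi-hard both as (a, p, n)
    # and with anchor and positive switched, i.e. as (p, a, n).
    keep = semi_hard_mask(a, p, n, margin) & semi_hard_mask(p, a, n, margin)
    if not keep.any():
        return 0.0, keep

    a_k, p_k, n_k = a[keep], p[keep], n[keep]
    d_ap = np.linalg.norm(a_k - p_k, axis=1)
    d_an = np.linalg.norm(a_k - n_k, axis=1)

    # Standard triplet margin loss on the selected triplets.
    triplet = np.maximum(0.0, d_ap - d_an + margin)

    # Angular loss: constrains the angle at the negative via the
    # anchor/positive midpoint.
    center = (a_k + p_k) / 2.0
    tan_sq = np.tan(np.deg2rad(alpha_deg)) ** 2
    d_nc_sq = np.sum((n_k - center) ** 2, axis=1)
    angular = np.maximum(0.0, d_ap ** 2 - 4.0 * tan_sq * d_nc_sq)

    # Assumption: the two terms are combined as a weighted sum.
    return float(np.mean(triplet + weight * angular)), keep


# Toy usage with three 2-D triplets: semi-hard, easy, and hard negatives.
anchors = np.array([[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
positives = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
negatives = np.array([[0.5, 1.2], [5.0, 0.0], [0.1, 0.0]])
loss, selected = joint_loss(anchors, positives, negatives)
```

In this toy batch only the first triplet passes the two-sided semi-hard test, so the easy and hard negatives do not contribute to the loss. In a real training loop the same logic would be written in an autodiff framework so that gradients flow back to the embedding network.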
