Integrative survival analysis of breast cancer with gene expression and DNA methylation data.

Author information

Bichindaritz Isabelle, Liu Guanghui, Bartlett Christopher

Affiliation

Intelligent Bio Systems Laboratory, Biomedical and Health Informatics, Department of Computer Science, State University of New York at Oswego, Syracuse, NY 13202, USA.

Publication information

Bioinformatics. 2021 Sep 9;37(17):2601-2608. doi: 10.1093/bioinformatics/btab140.

DOI:10.1093/bioinformatics/btab140

PMID:33681976

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8428600/

Abstract

MOTIVATION

Integrative multi-feature fusion analysis of biomedical data has recently gained much attention. In breast cancer, existing studies have demonstrated that combining genomic mRNA data and DNA methylation data stratifies patients with distinct prognoses better than a single signature does. However, existing methods simply concatenate these gene features and ignore the correlations between separate omics dimensions over time.

RESULTS

In the present study, we propose an adaptive multi-task learning method that combines a Cox loss task with an ordinal loss task to predict the survival of breast cancer patients using multi-modal learning, instead of performing survival analysis on each feature dataset separately. First, we use the local maximum quasi-clique merging (lmQCM) algorithm to reduce the dimensions of the mRNA and methylation features and to extract cluster eigengenes from each. Then, we add an auxiliary ordinal loss to the original Cox model to improve optimization during training and to act as a regularizer. The auxiliary loss helps to reduce the vanishing gradient problem in earlier layers and to decrease the loss of the primary task. Meanwhile, we use an adaptive weighting approach to multi-task learning, which weighs the loss functions by considering the homoscedastic uncertainty of each task. Finally, we build an ordinal Cox hazards model for survival analysis and use a long short-term memory (LSTM) network to predict patients' survival risk. We use cross-validation and the concordance index (C-index) to assess prediction performance. Stringent cross-validation tests on the benchmark dataset and two additional datasets demonstrate that the developed approach is effective, achieving performance competitive with existing approaches.
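The quantities named above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the repository under Availability for that). It assumes a Cox negative partial log-likelihood without tie handling, a commonly used simplified form of homoscedastic-uncertainty weighting, sum_i exp(-s_i) * L_i + s_i with s_i = log(sigma_i^2), and Harrell's C-index. All function names are hypothetical, and the ordinal loss appears only as a placeholder number.

```python
import numpy as np


def cox_neg_log_partial_likelihood(risk, time, event):
    """Negative Cox partial log-likelihood, averaged over observed events.

    Assumes no tied survival times (no Breslow/Efron correction).
    """
    risk = np.asarray(risk, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    order = np.argsort(-time)                 # longest survival first
    risk, event = risk[order], event[order]
    # Running log-sum-exp gives log(sum of exp(risk)) over each risk set,
    # i.e. over all patients still at risk at that patient's time.
    log_risk_set = np.logaddexp.accumulate(risk)
    return float(-np.sum((risk - log_risk_set) * event) / max(event.sum(), 1.0))


def uncertainty_weighted_loss(losses, log_vars):
    """Combine task losses as sum_i exp(-s_i) * L_i + s_i (assumed form).

    log_vars holds one s_i = log(sigma_i^2) per task; in a real model these
    would be trainable parameters learned jointly with the network.
    """
    losses = np.asarray(losses, dtype=float)
    log_vars = np.asarray(log_vars, dtype=float)
    return float(np.sum(np.exp(-log_vars) * losses + log_vars))


def concordance_index(risk, time, event):
    """Harrell's C-index: share of comparable pairs ranked correctly.

    A pair (i, j) is comparable when patient i had an observed event and
    time[i] < time[j]; it is concordant when risk[i] > risk[j].
    """
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue                          # censored patients cannot anchor a pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5         # ties in predicted risk count half
    return concordant / comparable if comparable else float("nan")


# Toy data: four patients, one censored (event = 0).
time = [5.0, 3.0, 8.0, 1.0]
event = [1, 1, 0, 1]
risk = [0.2, 1.0, -0.5, 2.0]

cox_loss = cox_neg_log_partial_likelihood(risk, time, event)
ordinal_loss = 0.7                            # placeholder value, not computed here
total = uncertainty_weighted_loss([cox_loss, ordinal_loss], [0.0, 0.0])
c_index = concordance_index(risk, time, event)
```

With both log-variances at zero the combined loss reduces to the plain sum of the two task losses. Raising a task's log-variance lowers the weight of that task while the additive s_i term penalizes inflating the variance without bound.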

AVAILABILITY AND IMPLEMENTATION

https://github.com/bhioswego/ML_ordCOX.

Similar articles

Pancancer survival prediction using a deep learning architecture with multimodal representation and integration.

Bioinform Adv. 2023 Jan 23;3(1):vbad006. doi: 10.1093/bioadv/vbad006. eCollection 2023.

Integrative pathway-based survival prediction utilizing the interaction between gene expression and DNA methylation in breast cancer.

BMC Med Genomics. 2018 Sep 14;11(Suppl 3):68. doi: 10.1186/s12920-018-0389-z.

Multi-omics facilitated variable selection in Cox-regression model for cancer prognosis prediction.

Methods. 2017 Jul 15;124:100-107. doi: 10.1016/j.ymeth.2017.06.010. Epub 2017 Jun 13.

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

HCNM: Heterogeneous Correlation Network Model for Multi-level Integrative Study of Multi-omics Data for Cancer Subtype Prediction.

Annu Int Conf IEEE Eng Med Biol Soc. 2021 Nov;2021:1880-1886. doi: 10.1109/EMBC46164.2021.9630781.

A multi-modal fusion framework based on multi-task correlation learning for cancer prognosis prediction.

Artif Intell Med. 2022 Apr;126:102260. doi: 10.1016/j.artmed.2022.102260. Epub 2022 Feb 24.

Multi-task multi-modal learning for joint diagnosis and prognosis of human cancers.

Med Image Anal. 2020 Oct;65:101795. doi: 10.1016/j.media.2020.101795. Epub 2020 Jul 23.

Integrative cancer patient stratification via subspace merging.

Bioinformatics. 2019 May 15;35(10):1653-1659. doi: 10.1093/bioinformatics/bty866.

Integrative analysis of cross-modal features for the prognosis prediction of clear cell renal cell carcinoma.

Bioinformatics. 2020 May 1;36(9):2888-2895. doi: 10.1093/bioinformatics/btaa056.

Cited by

Multimodal integration strategies for clinical application in oncology.

Front Pharmacol. 2025 Aug 20;16:1609079. doi: 10.3389/fphar.2025.1609079. eCollection 2025.

PCLSurv: a prototypical contrastive learning-based multi-omics data integration model for cancer survival prediction.

Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf124.

Open challenges and opportunities in federated foundation models towards biomedical healthcare.

BioData Min. 2025 Jan 4;18(1):2. doi: 10.1186/s13040-024-00414-9.

Integrative Analysis of ATAC-Seq and RNA-Seq through Machine Learning Identifies 10 Signature Genes for Breast Cancer Intrinsic Subtypes.

Biology (Basel). 2024 Oct 7;13(10):799. doi: 10.3390/biology13100799.

Smart Biosensor for Breast Cancer Survival Prediction Based on Multi-View Multi-Way Graph Learning.

Sensors (Basel). 2024 May 21;24(11):3289. doi: 10.3390/s24113289.

Application of deep learning in cancer epigenetics through DNA methylation analysis.

Brief Bioinform. 2023 Sep 22;24(6). doi: 10.1093/bib/bbad411.

Multimodal deep learning approaches for single-cell multi-omics data integration.

Brief Bioinform. 2023 Sep 20;24(5). doi: 10.1093/bib/bbad313.

Cancer survival prediction by learning comprehensive deep feature representation for multiple types of genetic data.

BMC Bioinformatics. 2023 Jun 28;24(1):267. doi: 10.1186/s12859-023-05392-z.

Pancancer survival prediction using a deep learning architecture with multimodal representation and integration.

Bioinform Adv. 2023 Jan 23;3(1):vbad006. doi: 10.1093/bioadv/vbad006. eCollection 2023.

A five-pseudouridylation-associated-LncRNA classifier for primary prostate cancer prognosis prediction.

Front Genet. 2023 Jan 10;13:1110799. doi: 10.3389/fgene.2022.1110799. eCollection 2022.
