Bichindaritz Isabelle, Liu Guanghui, Bartlett Christopher
Intelligent Bio Systems Laboratory, Biomedical and Health Informatics, Department of Computer Science, State University of New York at Oswego, Syracuse, NY 13202, USA.
Bioinformatics. 2021 Sep 9;37(17):2601-2608. doi: 10.1093/bioinformatics/btab140.
Integrative multi-feature fusion analysis of biomedical data has gained much attention recently. In breast cancer, existing studies have demonstrated that combining genomic mRNA data and DNA methylation data stratifies patients with distinct prognoses better than a single signature does. However, existing methods simply concatenate these gene features and ignore the correlations between separate omics dimensions over time.
In the present study, we propose an adaptive multi-task learning method that combines a Cox loss task with an ordinal loss task for survival prediction of breast cancer patients, using multi-modal learning instead of performing survival analysis on each feature dataset separately. First, we use the local maximum quasi-clique merging (lmQCM) algorithm to reduce the dimensionality of the mRNA and methylation features and to extract cluster eigengenes from each. Then, we add an auxiliary ordinal loss to the original Cox model to improve optimization of the learning process in terms of training and regularization. The auxiliary loss helps to mitigate the vanishing gradient problem in earlier layers and helps to decrease the loss of the primary task. Meanwhile, we use an adaptive weighting approach to multi-task learning, which weighs the multiple loss functions by considering the homoscedastic uncertainty of each task. Finally, we build an ordinal Cox hazards model for survival analysis and use a long short-term memory (LSTM) network to predict patients' survival risk. We use cross-validation and the concordance index (C-index) to assess predictive performance. Stringent cross-validation testing on the benchmark dataset and two additional datasets demonstrates that the developed approach is effective, achieving performance very competitive with existing approaches.
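Three quantities named in the abstract can be illustrated with a minimal NumPy sketch: the Cox loss, uncertainty-based weighting of several task losses, and the C-index. This is not the authors' implementation. The function names are hypothetical, the Cox loss ignores tied event times, and the weighting uses one commonly cited form of homoscedastic-uncertainty weighting (each task loss scaled by exp(-s) plus a penalty s, where s is a learned log-variance), which may differ from the formulation in the paper. The ordinal loss, lmQCM and the LSTM are not shown.

```python
import numpy as np


def cox_neg_log_partial_likelihood(risk, time, event):
    """Mean negative Cox partial log-likelihood (no correction for ties)."""
    risk = np.asarray(risk, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    # at_risk[i, j] is True when subject j is still at risk at time[i].
    at_risk = time[None, :] >= time[:, None]
    # Log of the summed exp(risk) over each risk set, shifted for stability.
    shift = risk.max()
    log_risk_set = shift + np.log((np.exp(risk - shift)[None, :] * at_risk).sum(axis=1))
    # Only subjects with an observed event contribute a likelihood term.
    return float(-np.sum((risk - log_risk_set)[event]) / max(event.sum(), 1))


def uncertainty_weighted_loss(losses, log_vars):
    """Combine task losses as sum_k exp(-s_k) * L_k + s_k (assumed form)."""
    losses = np.asarray(losses, dtype=float)
    log_vars = np.asarray(log_vars, dtype=float)
    return float(np.sum(np.exp(-log_vars) * losses + log_vars))


def concordance_index(risk, time, event):
    """Fraction of comparable pairs ordered correctly; risk ties count 0.5."""
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a pair is comparable only if the earlier time is an event
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

In a training loop, the log-variances would be trainable parameters optimized jointly with the network weights, so a task with a noisier loss is down-weighted automatically. A higher predicted risk paired with a shorter survival time counts as concordant, so a C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering.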