Machine-Learning-Based Late Fusion on Multi-Omics and Multi-Scale Data for Non-Small-Cell Lung Cancer Diagnosis.

Authors

Carrillo-Perez Francisco, Morales Juan Carlos, Castillo-Secilla Daniel, Gevaert Olivier, Rojas Ignacio, Herrera Luis Javier

Affiliations

Department of Computer Architecture and Technology, University of Granada, C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18170 Granada, Spain.

Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, 1265 Welch Rd, Stanford, CA 94305, USA.

Publication information

J Pers Med. 2022 Apr 8;12(4):601. doi: 10.3390/jpm12040601.

Abstract

Differentiating between the non-small-cell lung cancer subtypes is crucial for providing effective treatment to the patient. For this purpose, machine learning techniques have been applied in recent years to the available biological data from patients. However, in most cases the problem has been treated with a single-modality approach, leaving unexplored the potential of the multi-scale and multi-omic nature of cancer data for classification. In this work, we study the fusion of five multi-scale and multi-omic modalities (RNA-Seq, miRNA-Seq, whole-slide imaging, copy number variation, and DNA methylation) using a late fusion strategy and machine learning techniques. We train an independent machine learning model for each modality and explore the interactions and gains obtained by fusing their outputs incrementally, using a novel optimization approach to compute the parameters of the late fusion. The final classification model, using all modalities, obtains an F1 score of 96.81±1.07, an AUC of 0.993±0.004, and an AUPRC of 0.980±0.016, improving on the results of each independent model and on those reported in the literature for this problem. These results show that leveraging the multi-scale and multi-omic nature of cancer data can enhance the performance of single-modality clinical decision support systems in personalized medicine, consequently improving patient diagnosis.
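
The late fusion idea in the abstract (one model per modality, then a weighted combination of their outputs) can be illustrated with a minimal sketch. The abstract does not describe the authors' optimization approach, so the random search over weights below is only a stand-in, not their method. The function names, the toy probabilities, and the use of three modalities instead of five are all assumptions made for illustration.

```python
import numpy as np


def late_fusion(probs, weights):
    """Weighted average of per-modality class probabilities.

    probs:   shape (n_modalities, n_samples, n_classes)
    weights: shape (n_modalities,), non-negative, normalized here to sum to 1
    """
    probs = np.asarray(probs, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    # Contract the modality axis: result has shape (n_samples, n_classes).
    return np.tensordot(w, probs, axes=1)


def accuracy(fused_probs, y_true):
    """Fraction of samples whose most probable class matches the label."""
    return float(np.mean(np.argmax(fused_probs, axis=1) == y_true))


def search_weights(probs, y_true, n_trials=200, seed=0):
    """Hypothetical stand-in optimizer: random search over the weight simplex."""
    rng = np.random.default_rng(seed)
    n_mod = len(probs)
    best_w = np.full(n_mod, 1.0 / n_mod)  # start from uniform weights
    best_acc = accuracy(late_fusion(probs, best_w), y_true)
    for _ in range(n_trials):
        w = rng.dirichlet(np.ones(n_mod))  # random point on the simplex
        acc = accuracy(late_fusion(probs, w), y_true)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc


# Invented toy outputs of three independent per-modality classifiers
# on four samples and two classes (not data from the paper).
y = np.array([0, 1, 0, 1])
probs = np.array([
    [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.4, 0.6]],
    [[0.4, 0.6], [0.3, 0.7], [0.7, 0.3], [0.6, 0.4]],
    [[0.8, 0.2], [0.6, 0.4], [0.3, 0.7], [0.1, 0.9]],
])

fused = late_fusion(probs, [1, 1, 1])
best_w, best_acc = search_weights(probs, y)
print("fused probabilities:\n", fused)
print("weights:", best_w, "accuracy:", best_acc)
```

In a real setting the weights would be tuned on held-out validation data rather than on the samples used for evaluation, and a metric such as F1 or AUC could replace accuracy as the search objective.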

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ac05/9025878/3dc687d5bf89/jpm-12-00601-g001.jpg
