Integration of Multimodal Data from Disparate Sources for Identifying Disease Subtypes.

Authors

Zhou Kaiyue, Kottoori Bhagya Shree, Munj Seeya Awadhut, Zhang Zhewei, Draghici Sorin, Arslanturk Suzan

Affiliations

Department of Computer Science, Wayne State University, Detroit, MI 48201, USA.

Department of Electronic Engineering, Tsinghua University, Beijing 100084, China.

Publication information

Biology (Basel). 2022 Feb 24;11(3):360. doi: 10.3390/biology11030360.

Abstract

Studies over the past decade have generated a wealth of molecular data that can be leveraged to better understand cancer risk, progression, and outcomes. However, because of the heterogeneity of disease, analyzing data from a single modality is not sufficient to understand progression risk or to differentiate long- and short-term survivors. Using a scientifically developed and tested deep-learning approach that leverages aggregate information collected from multiple repositories with multiple modalities (e.g., mRNA, DNA methylation, miRNA) could lead to a more accurate and robust prediction of disease progression. Here, we propose an autoencoder-based multimodal data fusion system, in which a fusion encoder flexibly integrates collective information available through multiple studies with partially coupled data. Our results on a fully controlled simulation-based study show that inferring the missing data through the proposed data fusion pipeline yields a predictor that is superior to baseline predictors with missing modalities. The results further show that short- and long-term survivors of glioblastoma multiforme, acute myeloid leukemia, and pancreatic adenocarcinoma can be successfully differentiated with AUCs of 0.94, 0.75, and 0.96, respectively.
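
The idea of inferring a missing modality from partially coupled data can be illustrated with a minimal sketch. This is not the authors' deep-learning pipeline: it swaps in a linear autoencoder (fitted in closed form with an SVD, which is equivalent to PCA) for the deep fusion encoder, and it runs on synthetic data. All names (`fit_linear_fusion`, `impute_b`, `x_a`, `x_b`), the dimensions, and the 120/80 split are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (not from the paper): two modalities driven by a
# shared 3-dimensional latent factor, plus a small amount of noise.
n_samples, latent_dim, dim_a, dim_b = 200, 3, 20, 15
z_true = rng.normal(size=(n_samples, latent_dim))
x_a = z_true @ rng.normal(size=(latent_dim, dim_a)) + 0.1 * rng.normal(size=(n_samples, dim_a))
x_b = z_true @ rng.normal(size=(latent_dim, dim_b)) + 0.1 * rng.normal(size=(n_samples, dim_b))

# Partially coupled data: the first 120 samples have both modalities,
# the remaining 80 are treated as if modality B were missing.
coupled = np.arange(n_samples) < 120


def fit_linear_fusion(x_a, x_b, latent_dim):
    """Fit a linear autoencoder on the concatenated modalities of coupled samples.

    Returns the per-modality means and the per-modality blocks of the decoder.
    """
    joint = np.hstack([x_a, x_b])
    mean = joint.mean(axis=0)
    # The top right-singular vectors span the shared latent space.
    _, _, vt = np.linalg.svd(joint - mean, full_matrices=False)
    decoder = vt[:latent_dim]
    d = x_a.shape[1]
    return mean[:d], mean[d:], decoder[:, :d], decoder[:, d:]


def impute_b(x_a_only, mean_a, mean_b, dec_a, dec_b):
    """Infer modality B for samples where only modality A was observed."""
    # Least-squares estimate of the latent code from modality A alone.
    z = (x_a_only - mean_a) @ np.linalg.pinv(dec_a)
    # Decode the latent code into modality B.
    return z @ dec_b + mean_b


params = fit_linear_fusion(x_a[coupled], x_b[coupled], latent_dim)
x_b_hat = impute_b(x_a[~coupled], *params)

# Compare against the simplest baseline: filling in the coupled-sample mean.
mse_fused = float(np.mean((x_b_hat - x_b[~coupled]) ** 2))
mse_mean = float(np.mean((x_b[coupled].mean(axis=0) - x_b[~coupled]) ** 2))
print(f"fusion imputation MSE: {mse_fused:.4f}, mean imputation MSE: {mse_mean:.4f}")
```

On this toy data the latent code is recoverable from modality A because both modalities share the same low-dimensional factor. Real omics data would likely need a nonlinear encoder and a downstream survival classifier, neither of which is shown here.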

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fcca/8945377/6b078cebe6a2/biology-11-00360-g001.jpg
