Variational Autoencoders for Cancer Data Integration: Design Principles and Computational Practice.

Author information

Simidjievski Nikola, Bodnar Cristian, Tariq Ifrah, Scherer Paul, Andres Terre Helena, Shams Zohreh, Jamnik Mateja, Liò Pietro

Affiliations

Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom.

Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, United States.

Publication information

Front Genet. 2019 Dec 11;10:1205. doi: 10.3389/fgene.2019.01205. eCollection 2019.

Abstract

International initiatives such as the Molecular Taxonomy of Breast Cancer International Consortium are collecting multiple data sets at different genomic scales with the aim of identifying novel cancer biomarkers and predicting patient survival. To analyze such data, several machine learning, bioinformatics, and statistical methods have been applied, among them neural networks such as autoencoders. Although these models provide a good statistical learning framework for analyzing multi-omic and/or clinical data, there is a distinct lack of work on how to integrate diverse patient data and identify the design best suited to the available data.

In this paper, we investigate several autoencoder architectures that integrate a variety of cancer patient data types (e.g., multi-omics and clinical data). We perform extensive analyses of these approaches and provide a clear methodological and computational framework for designing systems that enable clinicians to investigate cancer traits and translate the results into clinical applications. We demonstrate how these networks can be designed, built, and, in particular, applied to tasks of integrative analyses of heterogeneous breast cancer data. The results show that these approaches yield relevant data representations that, in turn, lead to accurate and stable diagnosis.
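To make the idea of integrating several patient data types in one autoencoder concrete, the following is a minimal sketch of a single forward pass and loss of a variational autoencoder whose input is the concatenation of two modalities. It is not the authors' architecture: the abstract does not specify one, so the integration by input concatenation, the layer sizes, the squared-error reconstruction term, the synthetic data, and all names (`init_layer`, `vae_forward`, `negative_elbo`) are assumptions chosen for illustration only.

```python
import numpy as np

def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases for one dense layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def vae_forward(params, x, rng):
    """Encode x to a diagonal Gaussian, sample a latent code, decode it."""
    h = np.tanh(x @ params["enc_w"] + params["enc_b"])
    mu = h @ params["mu_w"] + params["mu_b"]
    log_var = h @ params["lv_w"] + params["lv_b"]
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * log_var) * eps  # reparameterization trick
    h_dec = np.tanh(z @ params["dec_w"] + params["dec_b"])
    x_hat = h_dec @ params["out_w"] + params["out_b"]
    return x_hat, mu, log_var

def negative_elbo(x, x_hat, mu, log_var):
    """Squared-error reconstruction plus KL(q(z|x) || N(0, I)), batch mean."""
    recon = np.sum((x - x_hat) ** 2, axis=1)
    kl = -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=1)
    return float(np.mean(recon + kl))

rng = np.random.default_rng(0)

# Synthetic stand-ins for two data types (assumed sizes, not real patient data).
n_patients, n_expr, n_clin = 8, 20, 5
expression = rng.standard_normal((n_patients, n_expr))
clinical = rng.standard_normal((n_patients, n_clin))

# Assumed integration strategy: concatenate modalities before encoding.
x = np.concatenate([expression, clinical], axis=1)

n_in, n_hidden, n_latent = x.shape[1], 16, 4
params = {}
params["enc_w"], params["enc_b"] = init_layer(rng, n_in, n_hidden)
params["mu_w"], params["mu_b"] = init_layer(rng, n_hidden, n_latent)
params["lv_w"], params["lv_b"] = init_layer(rng, n_hidden, n_latent)
params["dec_w"], params["dec_b"] = init_layer(rng, n_latent, n_hidden)
params["out_w"], params["out_b"] = init_layer(rng, n_hidden, n_in)

x_hat, mu, log_var = vae_forward(params, x, rng)
loss = negative_elbo(x, x_hat, mu, log_var)
print("latent shape:", mu.shape, "negative ELBO:", loss)
```

In a sketch like this, the rows of `mu` would play the role of the integrated patient representation passed to a downstream classifier. Training (gradient updates of `params`) is omitted, and other integration points, such as separate encoders per modality merged at a hidden layer, are equally plausible design choices.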

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2680/6917668/ef2149104e60/fgene-10-01205-g001.jpg
