Transfer learning with convolutional neural networks for cancer survival prediction using gene-expression data.

Affiliation

Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, ETSI Informática, Málaga, Spain.

Publication information

PLoS One. 2020 Mar 26;15(3):e0230536. doi: 10.1371/journal.pone.0230536. eCollection 2020.

Abstract

Precision medicine in oncology aims to gather data from heterogeneous sources in order to estimate a given patient's state and prognosis precisely. Within a personalized medicine framework, accurate diagnoses allow the prescription of more effective treatments adapted to the specifics of each individual case. In recent years, next-generation sequencing has propelled cancer research by providing physicians with an overwhelming amount of gene-expression data from high-throughput RNA-seq platforms. In this scenario, data mining and machine learning techniques have contributed widely to gene-expression data analysis by supplying computational models that support decision-making on real-world data. Nevertheless, existing public gene-expression databases are characterized by an unfavorable imbalance between the huge number of genes (on the order of tens of thousands) and the small number of samples (on the order of a few hundred) available. Although diverse feature selection and extraction strategies have traditionally been applied to overcome the resulting over-fitting issues, the efficacy of standard machine learning pipelines is far from satisfactory for predicting relevant clinical outcomes such as follow-up end-points or patient survival. In this study, using the public Pan-Cancer dataset, we pre-train convolutional neural network architectures for survival prediction on a subset composed of thousands of gene-expression samples from thirty-one tumor types. The resulting architectures are subsequently fine-tuned to predict the lung cancer progression-free interval. The application of convolutional networks to gene-expression data has many limitations, which derive from the unstructured nature of these data. In this work we propose a methodology that rearranges RNA-seq data by transforming RNA-seq samples into gene-expression images, from which convolutional networks can extract high-level features. As an additional objective, we investigate whether leveraging the information extracted from samples of other tumor types contributes to the extraction of high-level features that improve lung cancer progression prediction, compared to other machine learning approaches.
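
To make the idea of a "gene-expression image" concrete, the following is a minimal sketch of turning one expression vector into a 2-D grid that a convolutional network could consume. The abstract does not say how genes are positioned within the image, so the row-major layout in input order, the zero padding, the min-max scaling, and the function names `min_max_scale` and `expression_to_image` are all illustrative assumptions rather than the authors' method.

```python
import math

def min_max_scale(values):
    """Scale expression values to [0, 1]; a constant vector maps to all zeros."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def expression_to_image(values, fill=0.0):
    """Arrange a 1-D expression vector into a square 2-D grid (row-major).

    Assumption: genes are placed in their input order. A real pipeline would
    need a meaningful gene ordering so that neighbouring pixels are related.
    """
    n = len(values)
    side = math.isqrt(n)
    if side * side < n:
        side += 1  # smallest square that holds all n genes
    padded = list(values) + [fill] * (side * side - n)
    return [padded[r * side:(r + 1) * side] for r in range(side)]

# Toy sample with five "genes" (hypothetical values, not real data).
sample = [5.0, 1.0, 3.0, 9.0, 7.0]
image = expression_to_image(min_max_scale(sample))
for row in image:
    print(row)
```

With tens of thousands of genes per sample, the same function would produce a grid a few hundred pixels on a side. Whether convolution is useful on such a grid depends on the gene ordering, which is the part this sketch leaves open.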


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eea2/7098575/1243dd105625/pone.0230536.g001.jpg
