

Unsupervised feature construction and knowledge extraction from genome-wide assays of breast cancer with denoising autoencoders.

Author information

Tan Jie, Ung Matthew, Cheng Chao, Greene Casey S

Affiliation

Department of Genetics, Institute for Quantitative Biomedical Sciences, Norris Cotton Cancer Center, The Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA.

Publication information

Pac Symp Biocomput. 2015;20:132-43.

Abstract

Big data bring new opportunities for methods that efficiently summarize large data compendia and automatically extract knowledge from them. While both supervised learning algorithms and unsupervised clustering algorithms have been successfully applied to biological data, they are either dependent on known biology or limited to discerning the most significant signals in the data. Here we present denoising autoencoders (DAs), which employ a data-defined learning objective independent of known biology, as a method to identify and extract complex patterns from genomic data. We evaluate the performance of DAs by applying them to a large collection of breast cancer gene expression data. Results show that DAs successfully construct features that contain both clinical and molecular information. These include features that represent tumor or normal samples, estrogen receptor (ER) status, and molecular subtypes. Features constructed by the autoencoder generalize to an independent dataset collected using a distinct experimental platform. By integrating data from ENCODE for feature interpretation, we discover a feature representing ER status through association with key transcription factors in breast cancer. We also identify a feature that is highly predictive of patient survival and is enriched for the FOXM1 signaling pathway. The features constructed by DAs are often bimodally distributed, with one peak near zero and another near one, which facilitates discretization. In summary, we demonstrate that DAs effectively extract key biological principles from gene expression data and summarize them into constructed features with convenient properties.
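
To make the idea of a denoising autoencoder more concrete, the following is a minimal sketch in pure Python, not the authors' implementation. The abstract does not specify the architecture or training details, so everything here is an assumption chosen for illustration: the tied weights, the masking noise, the cross-entropy loss, the learning rate, the toy six-gene dataset, and the names `DenoisingAutoencoder`, `make_toy_data`, and `discretize`. The sketch corrupts each input, trains the network to reconstruct the clean input, and then thresholds the hidden-unit activations at 0.5, which is one simple way to discretize features that cluster near zero and one.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


class DenoisingAutoencoder:
    """One-hidden-layer denoising autoencoder with tied weights (illustrative only)."""

    def __init__(self, n_inputs, n_hidden, corruption=0.1, seed=0):
        self.rng = random.Random(seed)
        self.corruption = corruption  # assumed fraction of inputs zeroed per sample
        self.W = [[self.rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
                  for _ in range(n_hidden)]
        self.b = [0.0] * n_hidden  # hidden-layer bias
        self.c = [0.0] * n_inputs  # reconstruction bias

    def encode(self, x):
        """Map a sample to its constructed features (hidden activations)."""
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
                for row, bj in zip(self.W, self.b)]

    def decode(self, h):
        """Reconstruct the input from the features, reusing W (tied weights)."""
        return [sigmoid(sum(self.W[j][i] * h[j] for j in range(len(h))) + ci)
                for i, ci in enumerate(self.c)]

    def loss(self, x):
        """Cross-entropy between a clean sample and its reconstruction."""
        z = self.decode(self.encode(x))
        z = [min(max(zi, 1e-12), 1.0 - 1e-12) for zi in z]  # guard log(0)
        return -sum(xi * math.log(zi) + (1 - xi) * math.log(1 - zi)
                    for xi, zi in zip(x, z))

    def train_step(self, x, lr=0.1):
        # Masking noise: randomly zero a fraction of the inputs.
        noisy = [0.0 if self.rng.random() < self.corruption else xi for xi in x]
        h = self.encode(noisy)
        z = self.decode(h)
        # The reconstruction target is the CLEAN input, not the noisy one.
        dz = [zi - xi for zi, xi in zip(z, x)]
        dh = [sum(self.W[j][i] * dz[i] for i in range(len(x))) * h[j] * (1 - h[j])
              for j in range(len(h))]
        for j in range(len(h)):
            for i in range(len(x)):
                # Tied weights receive gradient from both encoder and decoder.
                self.W[j][i] -= lr * (dh[j] * noisy[i] + h[j] * dz[i])
            self.b[j] -= lr * dh[j]
        for i in range(len(x)):
            self.c[i] -= lr * dz[i]


def discretize(features, threshold=0.5):
    """Turn activations in [0, 1] into binary features (threshold is an assumption)."""
    return [1 if f > threshold else 0 for f in features]


def make_toy_data(n_per_group=20, seed=1):
    """Hypothetical data: two sample groups with opposite 'expression' patterns."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_per_group):
        data.append([(0.9 if i < 3 else 0.1) + rng.uniform(-0.05, 0.05)
                     for i in range(6)])
        data.append([(0.1 if i < 3 else 0.9) + rng.uniform(-0.05, 0.05)
                     for i in range(6)])
    return data


data = make_toy_data()
model = DenoisingAutoencoder(n_inputs=6, n_hidden=2)
loss_before = sum(model.loss(x) for x in data) / len(data)
for epoch in range(200):
    for x in data:
        model.train_step(x)
loss_after = sum(model.loss(x) for x in data) / len(data)
binary_features = [discretize(model.encode(x)) for x in data]
print("mean reconstruction loss before/after training:", loss_before, loss_after)
```

On real expression data the inputs would first need to be scaled to the [0, 1] range for this loss to make sense, and the number of hidden units and the corruption level would have to be tuned; none of those choices can be read off the abstract.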


