Selecting precise reference normal tissue samples for cancer research using a deep learning approach.

Author affiliations

Institute for Computational Health Sciences, University of California, San Francisco, CA, USA.

Shandong University, Qingdao, Shandong, China.

Publication information

BMC Med Genomics. 2019 Jan 31;12(Suppl 1):21. doi: 10.1186/s12920-018-0463-6.

Abstract

BACKGROUND

Normal tissue samples are often used as controls for understanding disease mechanisms; however, collecting matched normal tissues from patients is difficult in many instances. In cancer research, for example, open cancer resources such as TCGA and TARGET do not provide matched tissue samples for every cancer or cancer subtype. The recent GTEx project has profiled samples from healthy individuals, providing an excellent resource for this field, yet whether GTEx samples can feasibly serve as the reference remains unanswered.

METHODS

We analyze RNA-Seq data processed through the same computational pipeline and systematically evaluate GTEx as a potential reference resource. We use the cancers that have adjacent normal tissues in TCGA as a benchmark for the evaluation. To correlate tumor samples with normal samples, we explore the top varying genes, reduced features from principal component analysis, and encoded features from an autoencoder neural network. We first evaluate whether these methods can identify the correct tissue of origin in GTEx for a given cancer, and then ask whether disease expression signatures derived from TCGA and from GTEx are consistent.
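The feature reduction and tumor-to-normal correlation steps can be illustrated with a small NumPy sketch. This is not the authors' pipeline: the function names, the majority-vote rule for tissue of origin, the use of Pearson correlation, the number of genes and components, and the synthetic data are all assumptions made for illustration, and the autoencoder variant is omitted.

```python
from collections import Counter

import numpy as np


def top_varying_genes(expr, n_genes):
    """Return column indices of the n_genes with the highest variance.

    expr is a samples x genes matrix (log-scale expression is assumed).
    """
    variances = expr.var(axis=0)
    return np.argsort(variances)[::-1][:n_genes]


def pca_features(expr, n_components):
    """Project samples onto the leading principal components using SVD."""
    centered = expr - expr.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def tumor_normal_correlation(tumor_feat, normal_feat):
    """Pearson correlation of every tumor sample with every normal sample."""
    n_tumor = tumor_feat.shape[0]
    corr = np.corrcoef(np.vstack([tumor_feat, normal_feat]))
    return corr[:n_tumor, n_tumor:]


def predict_tissue(corr, normal_tissues):
    """Majority vote over the tissue of each tumor's best-correlated normal.

    This voting rule is a hypothetical choice, not taken from the paper.
    """
    best = corr.argmax(axis=1)
    votes = Counter(normal_tissues[i] for i in best)
    return votes.most_common(1)[0][0]


def make_toy_data(seed=0, n_genes=200, n_per_group=10):
    """Synthetic data: normals from tissues A and B, tumors derived from A."""
    rng = np.random.default_rng(seed)
    profile_a = rng.normal(0.0, 3.0, n_genes)
    profile_b = rng.normal(0.0, 3.0, n_genes)
    normals = np.vstack([
        profile_a + rng.normal(0.0, 0.5, (n_per_group, n_genes)),
        profile_b + rng.normal(0.0, 0.5, (n_per_group, n_genes)),
    ])
    tissues = ["A"] * n_per_group + ["B"] * n_per_group
    tumors = profile_a + rng.normal(0.0, 1.0, (n_per_group, n_genes))
    return tumors, normals, tissues


if __name__ == "__main__":
    tumors, normals, tissues = make_toy_data()
    combined = np.vstack([tumors, normals])

    # Variant 1: correlate on the top varying genes.
    genes = top_varying_genes(combined, 50)
    corr_genes = tumor_normal_correlation(tumors[:, genes], normals[:, genes])

    # Variant 2: correlate on PCA-reduced features.
    feats = pca_features(combined, 5)
    corr_pca = tumor_normal_correlation(feats[:10], feats[10:])

    print(predict_tissue(corr_genes, tissues), predict_tissue(corr_pca, tissues))
```

In this sketch the gene selection and PCA are fitted on the tumor and normal samples together so that both groups share one feature space; an autoencoder's encoded features could be passed to `tumor_normal_correlation` in the same way.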

RESULTS

Among 32 TCGA cancers, 18 have fewer than 10 matched adjacent normal tissue samples. Among the three methods, the autoencoder performed best in predicting tissue of origin, with 12 of 14 cancers correctly predicted. The two cancers were misclassified because none of the normal samples from GTEx correlate well with any tumor samples in these cancers. This suggests that GTEx has matched tissues for the majority of cancers, but not all. When using the autoencoder to select proper normal samples for disease signature creation, we found that disease signatures derived from normal samples selected from GTEx via the autoencoder are consistent with those derived from adjacent samples from TCGA in many cases. Interestingly, choosing the 50 most highly correlated samples regardless of tissue type performed reasonably well, or even better, in some cancers.
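Selecting the most highly correlated normal samples regardless of tissue type can be sketched as follows. This is a minimal illustration under stated assumptions, not the published method: ranking normals by their mean Pearson correlation with the tumor samples, and summarizing a disease signature as a per-gene difference in mean log-scale expression, are both simplifications chosen here, and the function names are hypothetical.

```python
import numpy as np


def select_top_k_normals(tumor_feat, normal_feat, k=50):
    """Return indices of the k normals most correlated with the tumors.

    Normals are ranked by mean Pearson correlation across all tumor
    samples (an assumed ranking rule); tissue labels are ignored.
    """
    n_tumor = tumor_feat.shape[0]
    corr = np.corrcoef(np.vstack([tumor_feat, normal_feat]))
    mean_corr = corr[:n_tumor, n_tumor:].mean(axis=0)
    return np.argsort(mean_corr)[::-1][:k]


def disease_signature(tumor_expr, reference_expr):
    """Per-gene mean difference, tumor minus reference (log scale assumed).

    A simplified stand-in for a differential expression analysis.
    """
    return tumor_expr.mean(axis=0) - reference_expr.mean(axis=0)
```

The features passed to `select_top_k_normals` could be top varying genes, principal components, or encoded features from an autoencoder, while `disease_signature` would normally be computed on the full expression matrix for the selected samples.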

CONCLUSIONS

Our findings demonstrate that samples from GTEx can serve as reference normal samples for cancers, especially those that do not have available adjacent tissue samples. A deep learning-based approach holds promise for selecting proper normal samples.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7931/6357350/ee3d0f7f81b0/12920_2018_463_Fig1_HTML.jpg
