Selecting precise reference normal tissue samples for cancer research using a deep learning approach.

Affiliations

Institute for Computational Health Sciences, University of California, San Francisco, CA, USA.

Shandong University, Qingdao, Shandong, China.

Publication information

BMC Med Genomics. 2019 Jan 31;12(Suppl 1):21. doi: 10.1186/s12920-018-0463-6.

DOI:10.1186/s12920-018-0463-6

PMID:30704474

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6357350/

Abstract

BACKGROUND

Normal tissue samples are often used as controls for understanding disease mechanisms; however, collecting matched normal tissues from patients is difficult in many instances. In cancer research, for example, open cancer resources such as TCGA and TARGET do not provide matched tissue samples for every cancer or cancer subtype. The recent GTEx project has profiled samples from healthy individuals, providing an excellent resource for this field, yet whether GTEx samples can serve as the reference remains an open question.

METHODS

We analyze RNA-Seq data processed with the same computational pipeline and systematically evaluate GTEx as a potential reference resource. We use the cancers that have adjacent normal tissues in TCGA as a benchmark for the evaluation. To correlate tumor samples with normal samples, we explore three feature sets: top varying genes, reduced features from principal component analysis, and encoded features from an autoencoder neural network. We first evaluate whether these methods can identify the correct tissue of origin in GTEx for a given cancer, and then ask whether disease expression signatures derived from TCGA and from GTEx are consistent.
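The correlation step in the Methods can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`top_varying_genes`, `pca_features`, `cross_correlation`, `predict_tissue`), the use of Pearson correlation, the median/mean scoring rule, and the synthetic data are all hypothetical. The autoencoder feature set is omitted; only the top-varying-gene and PCA feature sets are shown.

```python
import numpy as np

def top_varying_genes(expr, n_genes):
    """Indices of the n_genes columns with the highest variance across samples."""
    return np.argsort(expr.var(axis=0))[::-1][:n_genes]

def pca_features(expr, n_components):
    """Project samples (rows) onto the first n_components principal components."""
    centered = expr - expr.mean(axis=0)
    # Rows of vt are the principal axes of the centered matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def cross_correlation(tumor, normal):
    """Pearson correlation of every tumor sample (row) with every normal sample (row)."""
    t = tumor - tumor.mean(axis=1, keepdims=True)
    n = normal - normal.mean(axis=1, keepdims=True)
    t = t / np.linalg.norm(t, axis=1, keepdims=True)
    n = n / np.linalg.norm(n, axis=1, keepdims=True)
    return t @ n.T

def predict_tissue(tumor, normal, normal_tissues):
    """Return the tissue whose normal samples correlate best with the tumors.

    Assumed scoring rule: median correlation over tumors for each normal
    sample, then the mean of those medians within each tissue.
    """
    per_normal = np.median(cross_correlation(tumor, normal), axis=0)
    labels = np.asarray(normal_tissues)
    scores = {str(t): float(per_normal[labels == t].mean()) for t in np.unique(labels)}
    return max(scores, key=scores.get)

# Synthetic stand-in for log-scale expression: 2 tissues, 200 genes.
rng = np.random.default_rng(0)
n_genes = 200
profiles = {"liver": rng.normal(5, 2, n_genes), "lung": rng.normal(5, 2, n_genes)}
normal_tissues = ["liver"] * 20 + ["lung"] * 20
normal = np.array([profiles[t] + rng.normal(0, 0.5, n_genes) for t in normal_tissues])
tumor = np.array([profiles["liver"] + rng.normal(0, 1.0, n_genes) for _ in range(10)])

combined = np.vstack([tumor, normal])
n_tumor = tumor.shape[0]

# Feature set 1: top varying genes, chosen on tumor and normal samples together.
genes = top_varying_genes(combined, 50)
predicted_genes = predict_tissue(tumor[:, genes], normal[:, genes], normal_tissues)

# Feature set 2: PCA features fitted on tumor and normal samples together.
feats = pca_features(combined, 5)
predicted_pca = predict_tissue(feats[:n_tumor], feats[n_tumor:], normal_tissues)

print(predicted_genes, predicted_pca)
```

Fitting the gene selection and the PCA on the combined tumor and normal matrix keeps both groups in the same feature space, which is what makes the sample-to-sample correlations comparable.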

RESULTS

Among 32 TCGA cancers, 18 have fewer than 10 matched adjacent normal tissue samples. Among the three methods, the autoencoder performed best in predicting tissue of origin, with 12 of 14 cancers correctly predicted. The two cancers were misclassified because none of the normal samples from GTEx correlate well with any tumor samples in these cancers. This suggests that GTEx has matched tissues for the majority of cancers, but not all. When using the autoencoder to select proper normal samples for disease signature creation, we found that disease signatures derived from GTEx normal samples selected via the autoencoder are consistent with those derived from adjacent samples from TCGA in many cases. Interestingly, choosing the 50 most highly correlated samples regardless of tissue type performed reasonably well, or even better, in some cancers.
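The "50 most highly correlated samples regardless of tissue type" selection and the signature comparison can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names (`select_reference_normals`, `signature`), Pearson correlation on raw features rather than autoencoder encodings, a plain mean difference standing in for a proper differential expression method, and the synthetic data standing in for TCGA and GTEx.

```python
import numpy as np

def select_reference_normals(tumor, normal, k=50):
    """Indices of the k normal samples with the highest median correlation to the tumors."""
    n_tumor = tumor.shape[0]
    # np.corrcoef treats rows as variables; keep the tumor-by-normal block.
    corr = np.corrcoef(tumor, normal)[:n_tumor, n_tumor:]
    score = np.median(corr, axis=0)
    return np.argsort(score)[::-1][:k]

def signature(tumor, reference):
    """Per-gene difference of mean log-scale expression: tumor minus reference."""
    return tumor.mean(axis=0) - reference.mean(axis=0)

# Synthetic stand-in data (log-scale expression, 300 genes, 2 tissues).
rng = np.random.default_rng(1)
n_genes = 300
profiles = {"liver": rng.normal(5, 2, n_genes), "lung": rng.normal(5, 2, n_genes)}

def draw(tissue, n, shift=0.0):
    """Draw n noisy samples around a tissue profile, optionally shifted."""
    return profiles[tissue] + shift + rng.normal(0, 0.5, (n, n_genes))

disease_shift = np.zeros(n_genes)
disease_shift[:30] = 3.0  # hypothetical: 30 genes up-regulated in tumors

tumor = draw("liver", 20, disease_shift)
adjacent = draw("liver", 15)  # stands in for adjacent normal samples
pool = np.vstack([draw("liver", 80), draw("lung", 80)])  # stands in for a healthy-tissue pool
pool_tissues = np.array(["liver"] * 80 + ["lung"] * 80)

# Pick reference normals by correlation alone, ignoring tissue labels.
chosen = select_reference_normals(tumor, pool, k=50)

# Compare the signature built from selected normals with the adjacent-normal one.
sig_pool = signature(tumor, pool[chosen])
sig_adjacent = signature(tumor, adjacent)
agreement = np.corrcoef(sig_pool, sig_adjacent)[0, 1]

print(sorted(set(pool_tissues[chosen].tolist())), round(float(agreement), 3))
```

In this toy setting the tissue labels are used only afterwards, to inspect which tissues the correlation-based selection picked.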

CONCLUSIONS

Our findings demonstrate that samples from GTEx can serve as reference normal samples for cancers, especially those that do not have adjacent tissue samples available. A deep learning-based approach holds promise for selecting proper normal samples.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7931/6357350/ee3d0f7f81b0/12920_2018_463_Fig1_HTML.jpg
