
TULIP: An RNA-seq-based Primary Tumor Type Prediction Tool Using Convolutional Neural Networks.

Author information

Jones Sara, Beyers Matthew, Shukla Maulik, Xia Fangfang, Brettin Thomas, Stevens Rick, Weil M Ryan, Ranganathan Ganakammal Satishkumar

Affiliations

Frederick National Laboratory for Cancer Research, Cancer Data Science Initiatives, Cancer Research Technology Program, Rockville, MD, USA.

Argonne National Laboratory, Computing, Environment and Life Sciences, Lemont, IL, USA.

Publication information

Cancer Inform. 2022 Dec 5;21:11769351221139491. doi: 10.1177/11769351221139491. eCollection 2022.

DOI: 10.1177/11769351221139491
PMID: 36507076
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9729992/
Abstract

BACKGROUND

With cancer as one of the leading causes of death worldwide, accurate primary tumor type prediction is critical in identifying genetic factors that can inhibit or slow tumor progression. Over the last several years, machine learning, and more recently deep learning, have been applied to classify primary tumor types from gene expression data.

METHODS

In this paper, we developed four 1-dimensional (1D) Convolutional Neural Network (CNN) models that classify RNA-seq gene expression data either into one of 17 highly represented primary tumor types or into one of 32 primary tumor types, regardless of how imbalanced their representation is. Each model takes as input either all Ensembl genes (60,483) or protein coding genes only (19,758), giving one model per combination of tumor type set and gene set. Unlike previous work, we avoided selection bias by not filtering genes based on expression values. For training and validating the models, RNA-seq data expressed as FPKM-UQ for 9,025 and 10,940 samples from The Cancer Genome Atlas (TCGA), corresponding to 17 and 32 primary tumor types respectively, were downloaded from the Genomic Data Commons (GDC).
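
To make the 1D-CNN idea concrete, here is a minimal pure-Python sketch of the forward pass such a model performs on one sample's expression vector: a 1D convolution along the gene axis, ReLU, max pooling, a dense layer, and a softmax over tumor types. This is not TULIP's architecture. The layer sizes, weights, and labels (`tumor_type_A` and so on) are hypothetical toy values; a real model learns its weights during training and takes 60,483 or 19,758 genes as input rather than 8.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def conv1d(x: Sequence[float], kernels: Sequence[Sequence[float]],
           biases: Sequence[float]) -> Matrix:
    """Stride-1, unpadded ("valid") 1D convolution, one output channel per kernel.

    Like most deep-learning frameworks, this computes a cross-correlation
    (the kernel is not flipped).
    """
    channels = []
    for kernel, bias in zip(kernels, biases):
        k = len(kernel)
        channels.append([
            sum(kernel[j] * x[i + j] for j in range(k)) + bias
            for i in range(len(x) - k + 1)
        ])
    return channels


def relu(channels: Matrix) -> Matrix:
    return [[max(0.0, v) for v in ch] for ch in channels]


def max_pool(channels: Matrix, size: int) -> Matrix:
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [[max(ch[i:i + size]) for i in range(0, len(ch) - size + 1, size)]
            for ch in channels]


def dense(x: Sequence[float], weights: Matrix, biases: Sequence[float]) -> List[float]:
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(expression, kernels, conv_biases, pool_size,
            fc_weights, fc_biases, labels) -> Tuple[str, List[float]]:
    """Forward pass: conv -> ReLU -> max pool -> flatten -> dense -> softmax."""
    pooled = max_pool(relu(conv1d(expression, kernels, conv_biases)), pool_size)
    flat = [v for ch in pooled for v in ch]
    probs = softmax(dense(flat, fc_weights, fc_biases))
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs


# Hypothetical toy values: 8 "genes", 2 kernels of width 3, 3 tumor types.
expression = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 5.0]
kernels = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
conv_biases = [0.0, 0.0]
# 2 channels x 3 pooled positions = 6 flattened features per class row.
fc_weights = [
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
]
fc_biases = [0.0, 0.0, 0.0]
labels = ["tumor_type_A", "tumor_type_B", "tumor_type_C"]

label, probs = predict(expression, kernels, conv_biases, 2, fc_weights, fc_biases, labels)
print(label, [round(p, 3) for p in probs])
```

With these toy weights the sample is assigned to `tumor_type_B`. Because the convolution slides along the gene axis, a 1D CNN of this kind is sensitive to the order in which genes are laid out in the input vector.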

RESULTS

All four 1D-CNN models achieved an overall accuracy of 94.7% to 97.6% on the test dataset. Further evaluation indicates that, for both the 17 and the 32 primary tumor type sets, the models using only protein coding genes as features were more accurate than the models using all Ensembl genes. For every model, per-type accuracy was above 80% for most primary tumor types.
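
The results distinguish overall accuracy from accuracy by primary tumor type. The sketch below shows one common way to compute both, assuming that "accuracy by tumor type" means the fraction of samples of each true type that are predicted correctly (per-class recall); the function names and the toy labels `A`, `B`, `C` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def overall_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of all test samples whose predicted type matches the true type."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_type_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """For each true tumor type, the fraction of its samples predicted correctly."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have equal length")
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / n for label, n in totals.items()}


def types_above(per_type: Dict[str, float], threshold: float = 0.8) -> List[str]:
    """Tumor types whose per-type accuracy is strictly above the threshold."""
    return sorted(label for label, acc in per_type.items() if acc > threshold)


# Hypothetical labels for 12 test samples across three tumor types.
y_true = ["A"] * 5 + ["B"] * 5 + ["C"] * 2
y_pred = ["A"] * 5 + ["B", "B", "B", "B", "A"] + ["C", "B"]

overall = overall_accuracy(y_true, y_pred)
per_type = per_type_accuracy(y_true, y_pred)
print(overall)
print(per_type)
print(types_above(per_type))
```

In this toy example the overall accuracy is 10/12 while the small type `C` is classified correctly only half the time, which illustrates why a high overall accuracy can hide weak performance on under-represented tumor types.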

CONCLUSIONS

We packaged all four models as a Python-based deep learning classification tool called TULIP (TUmor CLassIfication Predictor) for performing quality control on primary tumor samples and characterizing cancer samples of unknown tumor type. Further optimization of the models is needed to improve accuracy for certain primary tumor types.
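
As an illustration of the quality-control use case, the sketch below compares each sample's predicted tumor type with its recorded type and flags disagreements, low-confidence predictions, and samples with no recorded type. TULIP's actual interface and output format are not described here: the function, the dictionary layouts, the 0.5 confidence threshold, and the sample data (using TCGA-style type codes) are all hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional


class QCFlag(NamedTuple):
    sample_id: str
    recorded_type: Optional[str]  # None when the tumor type is unknown
    predicted_type: str
    probability: float
    reason: str


def qc_flags(recorded: Dict[str, Optional[str]],
             predictions: Dict[str, Dict[str, float]],
             min_probability: float = 0.5) -> List[QCFlag]:
    """Flag samples whose top prediction needs review (hypothetical rules)."""
    flags = []
    for sample_id, probs in predictions.items():
        predicted = max(probs, key=probs.get)  # most probable tumor type
        p = probs[predicted]
        expected = recorded.get(sample_id)
        if expected is None:
            reason = "no recorded type; report prediction"
        elif predicted != expected:
            reason = "prediction disagrees with recorded type"
        elif p < min_probability:
            reason = "low-confidence prediction"
        else:
            continue  # prediction agrees with the record and is confident
        flags.append(QCFlag(sample_id, expected, predicted, p, reason))
    return flags


# Hypothetical recorded types and per-sample class probabilities.
recorded = {"s1": "BRCA", "s2": "LUAD", "s3": "COAD", "s4": None}
predictions = {
    "s1": {"BRCA": 0.90, "LUAD": 0.10},
    "s2": {"BRCA": 0.70, "LUAD": 0.30},
    "s3": {"COAD": 0.40, "READ": 0.35, "BRCA": 0.25},
    "s4": {"LUAD": 0.60, "BRCA": 0.40},
}

flags = qc_flags(recorded, predictions)
for f in flags:
    print(f.sample_id, f.predicted_type, f.probability, f.reason)
```

Here `s1` passes, `s2` is flagged as a mismatch, `s3` as low confidence, and `s4` as a sample of unknown type whose prediction should be reported.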


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f39/9729992/1f826d157a5a/10.1177_11769351221139491-fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f39/9729992/08723b88715f/10.1177_11769351221139491-fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f39/9729992/a21d82c5ce72/10.1177_11769351221139491-fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f39/9729992/9f6cae23d095/10.1177_11769351221139491-fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f39/9729992/87e48f876cf3/10.1177_11769351221139491-fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f39/9729992/ccfedfee290b/10.1177_11769351221139491-fig6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f39/9729992/c143c177350c/10.1177_11769351221139491-fig7.jpg

Similar articles

1
TULIP: An RNA-seq-based Primary Tumor Type Prediction Tool Using Convolutional Neural Networks.
Cancer Inform. 2022 Dec 5;21:11769351221139491. doi: 10.1177/11769351221139491. eCollection 2022.
2
CUP-AI-Dx: A tool for inferring cancer tissue of origin and molecular subtype using RNA gene-expression data and artificial intelligence.
EBioMedicine. 2020 Nov;61:103030. doi: 10.1016/j.ebiom.2020.103030. Epub 2020 Oct 9.
3
Convolutional neural network models for cancer type prediction based on gene expression.
BMC Med Genomics. 2020 Apr 3;13(Suppl 5):44. doi: 10.1186/s12920-020-0677-2.
4
DEGnext: classification of differentially expressed genes from RNA-seq data using a convolutional neural network with transfer learning.
BMC Bioinformatics. 2022 Jan 6;23(1):17. doi: 10.1186/s12859-021-04527-4.
5
Classification of Cancer Types Using Graph Convolutional Neural Networks.
Front Phys. 2020 Jun;8. doi: 10.3389/fphy.2020.00203. Epub 2020 Jun 17.
6
cTULIP: application of a human-based RNA-seq primary tumor classification tool for cross-species primary tumor classification in canine.
Front Oncol. 2023 Jul 20;13:1216892. doi: 10.3389/fonc.2023.1216892. eCollection 2023.
7
A deep dive into understanding tumor foci classification using multiparametric MRI based on convolutional neural network.
Med Phys. 2020 Sep;47(9):4077-4086. doi: 10.1002/mp.14255. Epub 2020 Jun 12.
8
fMRI volume classification using a 3D convolutional neural network robust to shifted and scaled neuronal activations.
Neuroimage. 2020 Dec;223:117328. doi: 10.1016/j.neuroimage.2020.117328. Epub 2020 Sep 5.
9
Accurate and rapid prediction of tuberculosis drug resistance from genome sequence data using traditional machine learning algorithms and CNN.
Sci Rep. 2022 Feb 14;12(1):2427. doi: 10.1038/s41598-022-06449-4.
10
Transfer learning with convolutional neural networks for cancer survival prediction using gene-expression data.
PLoS One. 2020 Mar 26;15(3):e0230536. doi: 10.1371/journal.pone.0230536. eCollection 2020.

Cited by

1
Tracing unknown tumor origins with a biological-pathway-based transformer model.
Cell Rep Methods. 2024 Jun 17;4(6):100797. doi: 10.1016/j.crmeth.2024.100797.
2
A comparison of RNA-Seq data preprocessing pipelines for transcriptomic predictions across independent studies.
BMC Bioinformatics. 2024 May 8;25(1):181. doi: 10.1186/s12859-024-05801-x.
3
Machine learning for pan-cancer classification based on RNA sequencing data.
