Identification of Tumor Tissue of Origin with RNA-Seq Data and Using Gradient Boosting Strategy.

Author information

School of Mathematics and Statistics, Hainan Normal University, Haikou 570100, China.

Key Laboratory of Computational Science and Application of Hainan Province, Haikou 571158, China.

Publication information

Biomed Res Int. 2021 Feb 17;2021:6653793. doi: 10.1155/2021/6653793. eCollection 2021.

DOI:10.1155/2021/6653793

PMID:33681364

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7904362/

Abstract

BACKGROUND

Cancer of unknown primary (CUP) is a malignant tumor that is histologically diagnosed as metastatic carcinoma but whose tissue of origin cannot be identified. CUP accounts for roughly 5% of all cancers. Traditional treatment for CUP is primarily broad-spectrum chemotherapy, and the prognosis is relatively poor. Accurately inferring the tissue of origin of CUP is therefore clinically important.

METHODS

We developed a gradient boosting framework to trace the tissue of origin of 20 types of solid tumors. Specifically, we downloaded the expression profiles of 20,501 genes for 7,713 samples from The Cancer Genome Atlas (TCGA) and used them as the training data set. RNA-seq data for 79 tumor samples from 6 cancer types with known origins were also downloaded from the Gene Expression Omnibus (GEO) to serve as an independent data set.
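
To make the idea of gradient boosting concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simplified setting: two classes instead of 20, depth-one decision stumps as weak learners, logistic loss, and a tiny made-up expression matrix with two hypothetical genes. The function names (`fit_stump`, `fit_gradient_boosting`, `predict`) and all numbers are illustrative assumptions. Each round fits a stump to the residuals between the labels and the current predicted probabilities, then adds a shrunken copy of that stump to the model.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Return (feature, threshold, left_value, right_value) for the single
    split that minimizes squared error against the residuals."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between distinct values,
        # so both sides of every split are non-empty.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    return best[1:]


def fit_gradient_boosting(X, y, n_rounds=20, learning_rate=0.5):
    """Binary gradient boosting with logistic loss (labels must be 0/1
    and both classes must be present)."""
    p = sum(y) / len(y)
    f0 = math.log(p / (1.0 - p))          # initial score: log-odds of class 1
    scores = [f0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the logistic loss with respect to the score.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        j, thr, lv, rv = fit_stump(X, residuals)
        stumps.append((j, thr, lv, rv))
        scores = [s + learning_rate * (lv if row[j] <= thr else rv)
                  for row, s in zip(X, scores)]
    return f0, learning_rate, stumps


def predict(model, row):
    f0, lr, stumps = model
    score = f0 + sum(lr * (lv if row[j] <= thr else rv)
                     for j, thr, lv, rv in stumps)
    return 1 if score > 0 else 0


# Made-up expression values: column 0 separates the classes, column 1 is noise.
X = [[5.1, 0.2], [4.8, 1.1], [5.5, 0.7], [4.9, 0.9],
     [1.2, 0.8], [0.9, 0.3], [1.5, 1.0], [1.1, 0.6]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

model = fit_gradient_boosting(X, y)
print([predict(model, row) for row in X])
```

A multi-class version, as needed for 20 tumor types, would typically keep one score per class and use a softmax loss. Established libraries also use deeper trees and more refined leaf updates than this sketch.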

RESULTS

We selected 400 genes to train a gradient boosting model that identifies the primary site of a tumor. The overall 10-fold cross-validation accuracy of our method was 96.1% across the 20 cancer types, and the accuracy on the independent data set reached 83.5%.
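
The evaluation described above combines gene selection with k-fold cross-validation. The sketch below illustrates that pattern under several assumptions that the abstract does not confirm: genes are ranked by variance (the abstract does not state the selection criterion), a nearest-centroid classifier stands in for the gradient boosting model to keep the example short, and the data are synthetic. All function names are hypothetical. Gene selection is repeated inside each fold using only that fold's training samples, a common precaution against optimistic accuracy estimates.

```python
import random
from statistics import mean, pvariance


def top_k_by_variance(X, k):
    """Indices of the k columns (genes) with the largest variance."""
    n_features = len(X[0])
    variances = [pvariance([row[j] for row in X]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return ranked[:k]


def stratified_folds(y, n_folds, seed=0):
    """Split sample indices into folds, spreading each class evenly."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, i in enumerate(indices):
            folds[pos % n_folds].append(i)
    return folds


def nearest_centroid_fit(X, y):
    """Stand-in classifier: one mean expression vector per class."""
    centroids = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, row):
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))


def cross_val_accuracy(X, y, n_folds, k_genes):
    """Overall accuracy: correct held-out predictions / number of samples."""
    correct = 0
    for test_idx in stratified_folds(y, n_folds):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        genes = top_k_by_variance(X_train, k_genes)  # training samples only
        model = nearest_centroid_fit([[r[j] for j in genes] for r in X_train],
                                     y_train)
        for i in test_idx:
            pred = nearest_centroid_predict(model, [X[i][j] for j in genes])
            correct += pred == y[i]
    return correct / len(y)


# Synthetic data: gene 0 differs between the two classes, genes 1-2 are noise.
rng = random.Random(1)
X, y = [], []
for label, center in (("breast", 8.0), ("lung", 2.0)):
    for _ in range(10):
        X.append([center + rng.uniform(-1, 1),
                  rng.uniform(0, 0.5), rng.uniform(0, 0.5)])
        y.append(label)

accuracy = cross_val_accuracy(X, y, n_folds=5, k_genes=1)
print(accuracy)
```

As a point of arithmetic, if each of the 79 independent samples received exactly one prediction, the reported 83.5% would correspond to 66 correct calls, since 66/79 ≈ 0.835.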

CONCLUSION

Our gradient boosting framework accurately identified the tumor tissue of origin on both the training data and the independent test data, and may therefore be of practical use.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/de30/7904362/36b5b78cceca/BMRI2021-6653793.001.jpg

Similar articles

Identification of Tumor Tissue of Origin with RNA-Seq Data and Using Gradient Boosting Strategy.

Biomed Res Int. 2021 Feb 17;2021:6653793. doi: 10.1155/2021/6653793. eCollection 2021.

A Machine Learning Method to Trace Cancer Primary Lesion Using Microarray-Based Gene Expression Data.

Front Oncol. 2022 Apr 21;12:832567. doi: 10.3389/fonc.2022.832567. eCollection 2022.

CUP-AI-Dx: A tool for inferring cancer tissue of origin and molecular subtype using RNA gene-expression data and artificial intelligence.

EBioMedicine. 2020 Nov;61:103030. doi: 10.1016/j.ebiom.2020.103030. Epub 2020 Oct 9.

Evaluating DNA Methylation, Gene Expression, Somatic Mutation, and Their Combinations in Inferring Tumor Tissue-of-Origin.

Front Cell Dev Biol. 2021 May 3;9:619330. doi: 10.3389/fcell.2021.619330. eCollection 2021.

TOD-CUP: a gene expression rank-based majority vote algorithm for tissue origin diagnosis of cancers of unknown primary.

Brief Bioinform. 2021 Mar 22;22(2):2106-2118. doi: 10.1093/bib/bbaa031.

A cross-cohort computational framework to trace tumor tissue-of-origin based on RNA sequencing.

Sci Rep. 2023 Sep 16;13(1):15356. doi: 10.1038/s41598-023-42465-8.

TOOme: A Novel Computational Framework to Infer Cancer Tissue-of-Origin by Integrating Both Gene Mutation and Expression.

Front Bioeng Biotechnol. 2020 May 19;8:394. doi: 10.3389/fbioe.2020.00394. eCollection 2020.

RNA-Seq accurately identifies cancer biomarker signatures to distinguish tissue of origin.

Neoplasia. 2014 Nov 20;16(11):918-27. doi: 10.1016/j.neo.2014.09.007. eCollection 2014 Nov.

Identifying cancer tissue-of-origin by a novel machine learning method based on expression quantitative trait loci.

Front Oncol. 2022 Aug 9;12:946552. doi: 10.3389/fonc.2022.946552. eCollection 2022.

Identification and validation of 12 immune-related genes as a prognostic signature for colon adenocarcinoma.

J Biochem Mol Toxicol. 2021 Sep;35(9):e22852. doi: 10.1002/jbt.22852. Epub 2021 Aug 15.

Cited by

AITeQ: a machine learning framework for Alzheimer's prediction using a distinctive five-gene signature.

Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae291.

New techniques to identify the tissue of origin for cancer of unknown primary in the era of precision medicine: progress and challenges.

Brief Bioinform. 2024 Jan 22;25(2). doi: 10.1093/bib/bbae028.

Retracted: Identification of Tumor Tissue of Origin with RNA-Seq Data and Using Gradient Boosting Strategy.

Biomed Res Int. 2023 Nov 29;2023:9865973. doi: 10.1155/2023/9865973. eCollection 2023.

A cross-cohort computational framework to trace tumor tissue-of-origin based on RNA sequencing.

Sci Rep. 2023 Sep 16;13(1):15356. doi: 10.1038/s41598-023-42465-8.

Intra-tumor heterogeneity, turnover rate and karyotype space shape susceptibility to missegregation-induced extinction.

PLoS Comput Biol. 2023 Jan 23;19(1):e1010815. doi: 10.1371/journal.pcbi.1010815. eCollection 2023 Jan.

Pragmatic Expectancy on Microbiota and Non-Small Cell Lung Cancer: A Narrative Review.

Cancers (Basel). 2022 Jun 26;14(13):3131. doi: 10.3390/cancers14133131.

Artificial intelligence and machine learning approaches using gene expression and variant data for personalized medicine.

Brief Bioinform. 2022 Sep 20;23(5). doi: 10.1093/bib/bbac191.

A Machine Learning Method to Trace Cancer Primary Lesion Using Microarray-Based Gene Expression Data.

Front Oncol. 2022 Apr 21;12:832567. doi: 10.3389/fonc.2022.832567. eCollection 2022.

90-Gene Expression Profiling for Tissue Origin Diagnosis of Cancer of Unknown Primary.

Front Oncol. 2021 Oct 7;11:722808. doi: 10.3389/fonc.2021.722808. eCollection 2021.
