Identification of Tumor Tissue of Origin with RNA-Seq Data and Using Gradient Boosting Strategy.

Author information

School of Mathematics and Statistics, Hainan Normal University, Haikou 570100, China.

Key Laboratory of Computational Science and Application of Hainan Province, Haikou 571158, China.

Publication information

Biomed Res Int. 2021 Feb 17;2021:6653793. doi: 10.1155/2021/6653793. eCollection 2021.

Abstract

BACKGROUND

Cancer of unknown primary (CUP) is a malignant tumor that is histologically diagnosed as a metastatic carcinoma but whose tissue-of-origin cannot be identified. CUP accounts for roughly 5% of all cancers. Traditional treatment for CUP is primarily broad-spectrum chemotherapy, and the prognosis is relatively poor. Accurately inferring the tissue-of-origin of CUP is therefore of clinical importance.

METHODS

We developed a gradient boosting framework to trace the tissue-of-origin of 20 types of solid tumors. Specifically, we downloaded the expression profiles of 20,501 genes for 7,713 samples from The Cancer Genome Atlas (TCGA) and used them as the training data set. RNA-seq data for 79 tumor samples from 6 cancer types with known origins were also downloaded from the Gene Expression Omnibus (GEO) to serve as an independent test data set.
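The core idea of gradient boosting can be sketched in a few lines. The block below is a minimal illustration under stated assumptions, not the authors' implementation: it is reduced to two classes (the study covers 20), uses one-split decision stumps as weak learners, sets each leaf to the mean residual rather than a Newton step, and runs on synthetic "expression" values. The function names, learning rate, and number of rounds are hypothetical choices made for this example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Return the one-feature threshold split that best fits the residuals (least squares)."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue  # guard against a degenerate split
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, left_mean, right_mean)
    return best[1:]  # (feature index, threshold, left value, right value)


def stump_predict(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value


def fit_gradient_boosting(X, y, n_rounds=30, learning_rate=0.3):
    """Binary gradient boosting with log loss; assumes both classes are present in y."""
    p = sum(y) / len(y)
    f0 = math.log(p / (1.0 - p))  # initial score: log-odds of the positive class
    scores = [f0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the log loss with respect to the current score
        residuals = [label - sigmoid(s) for label, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_predict(stump, row)
                  for s, row in zip(scores, X)]
    return f0, stumps, learning_rate


def predict(model, row):
    f0, stumps, learning_rate = model
    score = f0 + learning_rate * sum(stump_predict(s, row) for s in stumps)
    return 1 if score > 0 else 0


# Synthetic data: feature 0 separates the two classes, feature 1 is noise.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(40)]
X += [[rng.gauss(4.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(40)]
y = [0] * 40 + [1] * 40

model = fit_gradient_boosting(X, y)
train_accuracy = sum(predict(model, row) == label for row, label in zip(X, y)) / len(y)
print(f"training accuracy on toy data: {train_accuracy:.3f}")
```

Each round fits a new weak learner to the errors left by the current ensemble, which is what distinguishes boosting from averaging independent trees. A multi-class setting such as the 20 tumor types would typically keep one score per class and use a softmax loss.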

RESULTS

A total of 400 genes were selected to train a gradient boosting model for identifying the primary site of the tumor. The overall 10-fold cross-validation accuracy of our method was 96.1% across 20 types of cancer, and the accuracy on the independent data set reached 83.5%.
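For readers unfamiliar with the evaluation protocol, the following sketch shows one common way to compute an overall 10-fold cross-validation accuracy: every sample is held out exactly once, and correct predictions are pooled over all folds. The abstract does not state how folds were built or how accuracy was aggregated, so the stratified fold assignment and pooling here are assumptions. A nearest-centroid classifier stands in for the gradient boosting model to keep the block short, and the data are synthetic; all function names are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign sample indices to k folds so each fold has similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal each class round-robin across folds
    return folds


def nearest_centroid_fit(X, y):
    """Stand-in classifier: store the mean feature vector of each class."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def nearest_centroid_predict(centroids, row):
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], row)))


def cross_val_accuracy(X, y, folds):
    """Train on k-1 folds, test on the held-out fold, and pool correct predictions."""
    correct = 0
    for held_out in folds:
        held = set(held_out)
        train_X = [X[i] for i in range(len(X)) if i not in held]
        train_y = [y[i] for i in range(len(y)) if i not in held]
        model = nearest_centroid_fit(train_X, train_y)
        correct += sum(nearest_centroid_predict(model, X[i]) == y[i] for i in held_out)
    return correct / len(X)


# Synthetic data: 3 classes, 30 samples each, 4 features; feature c is elevated in class c.
rng = random.Random(1)
X, y = [], []
for c in range(3):
    for _ in range(30):
        X.append([rng.gauss(5.0 if j == c else 0.0, 1.0) for j in range(4)])
        y.append(c)

folds = stratified_folds(y, k=10)
accuracy = cross_val_accuracy(X, y, folds)
print(f"pooled 10-fold accuracy on toy data: {accuracy:.3f}")
```

Stratifying by class keeps rare tumor types represented in every fold, which matters when class sizes are uneven. Cross-validation accuracy is measured on the same cohort used for training, which is why a separate independent data set is a useful additional check.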

CONCLUSION

Our gradient boosting framework identified tumor tissue-of-origin accurately on both the training data and the independent test data, and may therefore be of practical use.
