A comprehensive genomic pan-cancer classification using The Cancer Genome Atlas gene expression data.

Author information

Li Yuanyuan, Kang Kai, Krahn Juno M, Croutwater Nicole, Lee Kevin, Umbach David M, Li Leping

Affiliations

Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, NIH, Durham, NC, 27709, USA.

Genome Integrity & Structural Biology Laboratory, National Institute of Environmental Health Sciences, NIH, Durham, NC, 27709, USA.

Publication information

BMC Genomics. 2017 Jul 3;18(1):508. doi: 10.1186/s12864-017-3906-0.

DOI:10.1186/s12864-017-3906-0

PMID:28673244

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5496318/

Abstract

BACKGROUND

The Cancer Genome Atlas (TCGA) has generated comprehensive molecular profiles. We aim to identify a set of genes whose expression patterns can distinguish diverse tumor types. Those features may serve as biomarkers for tumor diagnosis and drug development.

METHODS

Using RNA-seq expression data, we undertook a pan-cancer classification of 9,096 TCGA tumor samples representing 31 tumor types. We randomly assigned 75% of samples into training and 25% into testing, proportionally allocating samples from each tumor type.
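
The proportional 75%/25% allocation described above can be sketched as a stratified split. This is a minimal illustration using only the Python standard library, not the authors' code; the function name `stratified_split`, the seed, and the toy labels are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.75, seed=0):
    """Split sample indices into train/test sets, class by class.

    Each tumor type contributes roughly `train_frac` of its samples to
    training, so the class proportions are preserved in both sets.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # round() uses banker's rounding on exact .5 ties.
        n_train = round(len(indices) * train_frac)
        train.extend(indices[:n_train])
        test.extend(indices[n_train:])
    return sorted(train), sorted(test)


# Toy labels standing in for TCGA tumor-type codes (hypothetical data).
labels = ["BRCA"] * 8 + ["COAD"] * 4
train_idx, test_idx = stratified_split(labels)
```

With real data, `labels` would hold one tumor-type code per sample, and the returned index lists would select rows of the expression matrix.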

RESULTS

We could correctly classify more than 90% of the test set samples. Accuracies were high for all but three of the 31 tumor types, in particular, for READ (rectum adenocarcinoma) which was largely indistinguishable from COAD (colon adenocarcinoma). We also carried out pan-cancer classification, separately for males and females, on 23 sex non-specific tumor types (those unrelated to reproductive organs). Results from these gender-specific analyses largely recapitulated results when gender was ignored. Remarkably, more than 80% of the 100 most discriminative genes selected from each gender separately overlapped. Genes that were differentially expressed between genders included BNC1, FAT2, FOXA1, and HOXA11. FOXA1 has been shown to play a role for sexual dimorphism in liver cancer. The differentially discriminative genes we identified might be important for the gender differences in tumor incidence and survival.
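
The overlap between the most discriminative genes selected for each sex is, in essence, a set intersection. The sketch below shows one way such a fraction could be computed; the helper `overlap_fraction` and the placeholder gene names `G1` to `G6` are hypothetical and do not come from the paper.

```python
def overlap_fraction(genes_a, genes_b):
    """Return the fraction of genes shared by two equally sized gene lists."""
    set_a, set_b = set(genes_a), set(genes_b)
    if len(set_a) != len(set_b):
        raise ValueError("gene lists must contain the same number of unique genes")
    return len(set_a & set_b) / len(set_a)


# Placeholder gene names, not real results.
male_top = ["G1", "G2", "G3", "G4", "G5"]
female_top = ["G2", "G3", "G4", "G5", "G6"]
shared = overlap_fraction(male_top, female_top)
```

Genes that fall outside the intersection (here `G1` and `G6`) are the candidates for sex-specific discriminative genes.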

CONCLUSIONS

We were able to identify many sets of 20 genes that could correctly classify more than 90% of the samples from 31 different tumor types using TCGA RNA-seq data. This accuracy is remarkable given the number of the tumor types and the total number of samples involved. We achieved similar results when we analyzed 23 non-sex-specific tumor types separately for males and females. We regard the frequency with which a gene appeared in those sets as measuring its importance for tumor classification. One third of the 50 most frequently appearing genes were pseudogenes; the degree of enrichment may be indicative of their importance in tumor classification. Lastly, we identified a few genes that might play a role in sexual dimorphism in certain cancers.
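
Treating a gene's frequency across the selected gene sets as its importance amounts to a simple tally. The following is a small sketch under that reading; the function name `gene_frequencies` and the toy gene sets are assumptions, and real sets would each contain 20 genes.

```python
from collections import Counter


def gene_frequencies(gene_sets):
    """Count in how many gene sets each gene appears."""
    counts = Counter()
    for gene_set in gene_sets:
        # Deduplicate within a set so each set contributes at most one count per gene.
        counts.update(sorted(set(gene_set)))
    return counts


# Toy gene sets with placeholder names (hypothetical data).
gene_sets = [
    ["G1", "G2", "G3"],
    ["G1", "G3", "G4"],
    ["G1", "G5", "G6"],
]
counts = gene_frequencies(gene_sets)
ranked = counts.most_common()
```

The head of `ranked` then gives the most frequently selected genes, analogous to the 50 most frequently appearing genes discussed above.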

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a849/5496318/4efef38d0308/12864_2017_3906_Fig1_HTML.jpg

Similar articles

A comprehensive genomic pan-cancer classification using The Cancer Genome Atlas gene expression data.

BMC Genomics. 2017 Jul 3;18(1):508. doi: 10.1186/s12864-017-3906-0.

Large-scale RNA-Seq Transcriptome Analysis of 4043 Cancers and 548 Normal Tissue Controls across 12 TCGA Cancer Types.

Sci Rep. 2015 Aug 21;5:13413. doi: 10.1038/srep13413.

Integrative genomic approaches to dissect clinically-significant relationships between the VDR cistrome and gene expression in primary colon cancer.

J Steroid Biochem Mol Biol. 2017 Oct;173:130-138. doi: 10.1016/j.jsbmb.2016.12.013. Epub 2016 Dec 24.

BioXpress: an integrated RNA-seq-derived gene expression database for pan-cancer analysis.

Database (Oxford). 2015 Mar 28;2015. doi: 10.1093/database/bav019. Print 2015.

UALCAN: A Portal for Facilitating Tumor Subgroup Gene Expression and Survival Analyses.

Neoplasia. 2017 Aug;19(8):649-658. doi: 10.1016/j.neo.2017.05.002. Epub 2017 Jul 18.

Tumor classification and biomarker discovery based on the 5'isomiR expression level.

BMC Cancer. 2019 Feb 7;19(1):127. doi: 10.1186/s12885-019-5340-y.

Identification of Gene Expression Pattern Related to Breast Cancer Survival Using Integrated TCGA Datasets and Genomic Tools.

Biomed Res Int. 2015;2015:878546. doi: 10.1155/2015/878546. Epub 2015 Oct 20.

Molecular Signatures for Tumor Classification: An Analysis of The Cancer Genome Atlas Data.

J Mol Diagn. 2017 Nov;19(6):881-891. doi: 10.1016/j.jmoldx.2017.07.008. Epub 2017 Sep 1.

Association between expression of random gene sets and survival is evident in multiple cancer types and may be explained by sub-classification.

PLoS Comput Biol. 2018 Feb 22;14(2):e1006026. doi: 10.1371/journal.pcbi.1006026. eCollection 2018 Feb.

Transcriptome Analysis of Recurrently Deregulated Genes across Multiple Cancers Identifies New Pan-Cancer Biomarkers.

Cancer Res. 2016 Jan 15;76(2):216-26. doi: 10.1158/0008-5472.CAN-15-0484. Epub 2015 Nov 9.

Cited by

HallmarkGraph: a cancer hallmark informed graph neural network for classifying hierarchical tumor subtypes.

Bioinformatics. 2025 Sep 1;41(9). doi: 10.1093/bioinformatics/btaf444.

Exosomal microRNA signatures in youth at clinical high risk for bipolar disorder.

Front Psychiatry. 2025 May 20;16:1589374. doi: 10.3389/fpsyt.2025.1589374. eCollection 2025.

A three-subtype prognostic classification based on base excision repair and oxidative stress genes in lung adenocarcinoma and its relationship with tumor microenvironment.

Sci Rep. 2025 May 13;15(1):16647. doi: 10.1038/s41598-025-98088-8.

Endoplasmic reticulum stress disrupts signaling via altered processing of transmembrane receptors.

Cell Commun Signal. 2025 Apr 30;23(1):209. doi: 10.1186/s12964-025-02208-w.

Pan-Cancer Analysis of ANO6 and Experimental Validation in Metastatic Melanoma.

Biochem Genet. 2025 Mar 5. doi: 10.1007/s10528-025-11074-7.

A comparative analysis of gene expression profiling by statistical and machine learning approaches.

Bioinform Adv. 2024 Dec 18;5(1):vbae199. doi: 10.1093/bioadv/vbae199. eCollection 2025.

Emerging Signatures of Hematological Malignancies from Gene Expression and Transcription Factor-Gene Regulations.

Int J Mol Sci. 2024 Dec 19;25(24):13588. doi: 10.3390/ijms252413588.

Deep profiling of gene expression across 18 human cancers.

Nat Biomed Eng. 2025 Mar;9(3):333-355. doi: 10.1038/s41551-024-01290-8. Epub 2024 Dec 17.

From molecular subgroups to molecular targeted therapy in rheumatoid arthritis: A bioinformatics approach.

Heliyon. 2024 Aug 6;10(16):e35774. doi: 10.1016/j.heliyon.2024.e35774. eCollection 2024 Aug 30.

MSFN: a multi-omics stacked fusion network for breast cancer survival prediction.

Front Genet. 2024 Aug 2;15:1378809. doi: 10.3389/fgene.2024.1378809. eCollection 2024.

References

Sexual dimorphism in cancer.

Nat Rev Cancer. 2016 May;16(5):330-9. doi: 10.1038/nrc.2016.30. Epub 2016 Apr 15.

Genomic characterization of sarcomatoid transformation in clear cell renal cell carcinoma.

Proc Natl Acad Sci U S A. 2016 Feb 23;113(8):2170-5. doi: 10.1073/pnas.1525735113. Epub 2016 Feb 10.

Gender differences in incidence and outcomes of urothelial and kidney cancer.

Nat Rev Urol. 2015 Dec;12(12):653. doi: 10.1038/nrurol.2015.257. Epub 2015 Oct 20.

A Gene Gravity Model for the Evolution of Cancer Genomes: A Study of 3,000 Cancer Genomes across 9 Cancer Types.

PLoS Comput Biol. 2015 Sep 9;11(9):e1004497. doi: 10.1371/journal.pcbi.1004497. eCollection 2015 Sep.

Variation in genomic landscape of clear cell renal cell carcinoma across Europe.

Nat Commun. 2014 Oct 29;5:5135. doi: 10.1038/ncomms6135.

Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin.

Cell. 2014 Aug 14;158(4):929-944. doi: 10.1016/j.cell.2014.06.049. Epub 2014 Aug 7.

Sexually dimorphic RB inactivation underlies mesenchymal glioblastoma prevalence in males.

J Clin Invest. 2014 Sep;124(9):4123-33. doi: 10.1172/JCI71048. Epub 2014 Aug 1.

The Pan-Cancer analysis of pseudogene expression reveals biologically and clinically relevant tumour subtypes.

Nat Commun. 2014 Jul 7;5:3963. doi: 10.1038/ncomms4963.

Basonuclin-1 modulates epithelial plasticity and TGF-β1-induced loss of epithelial cell integrity.

Oncogene. 2015 Feb 26;34(9):1185-95. doi: 10.1038/onc.2014.54. Epub 2014 Mar 24.

Mutational landscape and significance across 12 major cancer types.

Nature. 2013 Oct 17;502(7471):333-339. doi: 10.1038/nature12634.
