Miao Yongchang, Zhang Xueliang, Chen Sijie, Zhou Wenjing, Xu Dalai, Shi Xiaoli, Li Jian, Tu Jinhui, Yuan Xuelian, Lv Kebo, Tian Geng
Gastroenterology Center, The Second People's Hospital of Lianyungang, Lianyungang, China.
Lianyungang Clinical College of Xuzhou Medical University, Lianyungang, China.
Front Oncol. 2022 Aug 9;12:946552. doi: 10.3389/fonc.2022.946552. eCollection 2022.
Cancer of unknown primary (CUP) refers to cancer whose primary lesion cannot be identified by routine pathological and clinical diagnostic methods. CUP is extremely difficult to treat, and patients usually have a very short survival time. Recent studies suggest that treatment targeting the primary lesion significantly improves the survival of CUP patients, so accurate and fast methods for inferring the tissue-of-origin (TOO) of CUP are critical. In recent years, a few computational methods have been proposed to infer TOO from single-omics data such as gene expression, methylation, or somatic mutation. However, tumor metastasis involves interactions among multiple levels of biological molecules. In this study, we developed a novel computational method that predicts the TOO of CUP patients by explicitly integrating expression quantitative trait loci (eQTL) into an XGBoost classification model. We trained the model on data from The Cancer Genome Atlas (TCGA) comprising over 7,000 samples across 20 types of solid tumors. In 10-fold cross-validation, the prediction accuracy of the model with eQTL exceeded 0.96, higher than that of the model without eQTL. We also tested the model on an independent dataset downloaded from the Gene Expression Omnibus (GEO), consisting of 87 samples across 4 cancer types, where it achieved F1-scores ranging from 0.7 to 1 depending on the cancer type. In summary, eQTL provides important information for inferring cancer TOO, and the model might be applied in routine clinical testing for CUP patients in the future.
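The abstract reports overall accuracy for cross-validation and per-cancer-type F1-scores for the independent test set. As a minimal sketch of how these two metrics differ, the following computes both from predicted and true tissue labels. The function names, tissue names, and label lists are hypothetical and purely illustrative; they are not the study's data or code.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted tissue matches the true tissue."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, treating each label one-vs-rest."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct call for this tissue
        else:
            fp[p] += 1   # predicted tissue gains a false positive
            fn[t] += 1   # true tissue gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


# Hypothetical toy labels (not from the study)
y_true = ["lung", "lung", "lung", "liver", "liver", "colon", "colon", "colon"]
y_pred = ["lung", "lung", "liver", "liver", "liver", "colon", "colon", "lung"]

acc = accuracy(y_true, y_pred)
scores = per_class_f1(y_true, y_pred)
```

Per-class F1 can vary widely even when overall accuracy is high, which is why results on a small multi-class test set are usually reported per cancer type.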