Research Center of Health Big Data Mining and Applications, School of Medical Information, Wannan Medical College, Wuhu, 241002, People's Republic of China.
Key Laboratory of Non-Coding RNA Transformation Research of Anhui Higher Education Institution, Wannan Medical College, Wuhu, 241000, People's Republic of China.
J Transl Med. 2022 Apr 18;20(1):177. doi: 10.1186/s12967-022-03369-9.
Breast cancer has long been a leading cancer diagnosed in women worldwide, and approximately 90% of cancer-related deaths are caused by metastasis. Identifying new metastasis-related biomarkers is therefore an urgent task, both to predict the metastatic status of breast cancer and to provide new therapeutic targets.
In this study, an eXtreme Gradient Boosting (XGBoost) model optimized by grid search was established to assist in identifying metastatic breast tumors from gene expression data. Under ten-fold cross-validation, the optimized XGBoost classifier achieved a mean AUC of 0.82, higher than that of the other classifiers evaluated: decision tree (DT), support vector machine (SVM), k-nearest neighbors (KNN), logistic regression (LR), and random forest (RF).
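The evaluation protocol described above (a grid search over hyperparameters, each candidate scored by mean AUC under ten-fold cross-validation) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: a toy nearest-centroid scorer stands in for XGBoost, the data are synthetic, and the names `fit_centroid_scorer`, `n_features`, and the grid values are assumptions introduced only for this example.

```python
import random
from itertools import product
from statistics import mean


def auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds, keeping both classes in every fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def fit_centroid_scorer(X, y, n_features):
    """Toy stand-in for XGBoost (assumption): score a sample along the
    class-mean difference, using only the n_features most separated features."""
    d = len(X[0])
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    diff = [mean(r[j] for r in pos) - mean(r[j] for r in neg) for j in range(d)]
    keep = sorted(range(d), key=lambda j: abs(diff[j]), reverse=True)[:n_features]
    return lambda x: sum(diff[j] * x[j] for j in keep)


def cross_validated_auc(X, y, params, k=10):
    """Mean AUC over k held-out folds for one hyperparameter setting."""
    aucs = []
    for test_idx in stratified_folds(y, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit_centroid_scorer(
            [X[i] for i in train_idx], [y[i] for i in train_idx], **params
        )
        aucs.append(auc([y[i] for i in test_idx], [model(X[i]) for i in test_idx]))
    return mean(aucs)


def grid_search(X, y, grid, k=10):
    """Evaluate every combination in the grid; return (best mean AUC, params)."""
    names = list(grid)
    results = []
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        results.append((cross_validated_auc(X, y, params, k), params))
    return max(results, key=lambda r: r[0])


# Synthetic "expression matrix": 100 samples, 8 features, only the first 2 informative.
rng = random.Random(42)
y = [i % 2 for i in range(100)]
X = [[rng.gauss(1.0 * t if j < 2 else 0.0, 1.0) for j in range(8)] for t in y]

best_auc, best_params = grid_search(X, y, {"n_features": [1, 2, 4, 8]})
print(f"best mean AUC = {best_auc:.2f} with {best_params}")
```

With a real gradient-boosting library, only `fit_centroid_scorer` and the grid would change; the fold construction and the AUC averaging stay the same.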
A novel 6-gene signature (SQSTM1, GDF9, LINC01125, PTGS2, GVINP1, and TMEM64) was selected by feature importance ranking, and a series of in vitro experiments was conducted to verify the potential role of each biomarker. In general, the effect of SQSTM1 in tumor cells was classified as a risk factor, while the effects of the other 5 genes (GDF9, LINC01125, PTGS2, GVINP1, and TMEM64) in immune cells were classified as protective factors.
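Selecting a signature by feature importance ranking amounts to scoring each feature and keeping the top k. The sketch below illustrates the idea with permutation importance (the drop in AUC when one feature's values are shuffled), a model-agnostic stand-in; the abstract does not state which XGBoost importance measure was used. The placeholder names `gene_0` to `gene_4`, the fixed scoring function, and the synthetic data are all hypothetical.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(score_fn, X, y, n_repeats=20, seed=0):
    """Importance of feature j = mean drop in AUC after shuffling column j."""
    rng = random.Random(seed)
    baseline = auc(y, [score_fn(x) for x in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [x[j] for x in X]
            rng.shuffle(column)
            shuffled = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, column)]
            drops.append(baseline - auc(y, [score_fn(x) for x in shuffled]))
        importances.append(sum(drops) / n_repeats)
    return importances


def top_k_features(names, importances, k):
    """Rank features by importance (descending) and keep the first k names."""
    ranked = sorted(zip(names, importances), key=lambda p: p[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Synthetic data: 100 samples, 5 placeholder genes, only the first 2 informative.
rng = random.Random(7)
y = [i % 2 for i in range(100)]
X = [[rng.gauss(2.0 * t if j < 2 else 0.0, 1.0) for j in range(5)] for t in y]
names = [f"gene_{j}" for j in range(5)]


def score_fn(x):
    """Hypothetical already-trained model that uses only the first two genes."""
    return x[0] + x[1]


importances = permutation_importance(score_fn, X, y)
signature = top_k_features(names, importances, k=2)
print(signature)
```

Features the model never uses get an importance of exactly zero here, so they fall to the bottom of the ranking.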
Our findings will allow for a more accurate prediction of the metastatic status of breast cancer and will benefit the mining of breast cancer metastasis-related biomarkers.