Department of Surgery, İnönü University Faculty of Medicine, Malatya, Turkey; Department of Public Health, İnönü University Faculty of Medicine, Malatya, Turkey; Department of Biostatistics and Medical Informatics, İnönü University Faculty of Medicine, Malatya, Turkey.
Turk J Gastroenterol. 2023 Oct;34(10):1025-1034. doi: 10.5152/tjg.2023.22346.
BACKGROUND/AIMS: The aim of this study was to classify data from familial adenomatous polyposis patients with and without duodenal cancer and to identify important genes that may be related to duodenal cancer using an XGBoost model.
This study used expression profile data from a series of duodenal samples from familial adenomatous polyposis patients to explore variations along the familial adenomatous polyposis duodenal adenoma-carcinoma sequence. Expression profiles of cancerous, adenomatous, and normal tissues from 12 familial adenomatous polyposis patients with duodenal cancer were compared with those of tissues from 12 familial adenomatous polyposis patients without duodenal cancer. The ElasticNet approach was used for feature selection. XGBoost, a machine learning method, was then used to classify duodenal cancer with 5-fold cross-validation. Model performance was assessed with accuracy, balanced accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1 score.
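All seven performance metrics named above derive from the four cells of a binary confusion matrix. The Python sketch below shows how they relate. It assumes duodenal cancer is coded as the positive class, and the names `ConfusionCounts` and `classification_metrics` are hypothetical; this is an illustration, not the authors' code.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Binary confusion-matrix counts (assumption: positive class = duodenal cancer)."""
    tp: int  # cancer samples predicted as cancer
    fp: int  # non-cancer samples predicted as cancer
    tn: int  # non-cancer samples predicted as non-cancer
    fn: int  # cancer samples predicted as non-cancer


def _ratio(num: float, den: float) -> float:
    # Return NaN instead of raising when a denominator is zero
    # (e.g. a fold with no predicted positives).
    return num / den if den else float("nan")


def classification_metrics(c: ConfusionCounts) -> dict:
    """Compute the metrics listed in the abstract from confusion-matrix counts."""
    total = c.tp + c.fp + c.tn + c.fn
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # recall, true positive rate
    specificity = _ratio(c.tn, c.tn + c.fp)  # true negative rate
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        # Mean of sensitivity and specificity; robust to class imbalance.
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": _ratio(c.tp, c.tp + c.fp),  # positive predictive value (precision)
        "npv": _ratio(c.tn, c.tn + c.fn),  # negative predictive value
        # F1 = 2*PPV*sensitivity / (PPV + sensitivity) = 2TP / (2TP + FP + FN)
        "f1": _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not results from the study.
    print(classification_metrics(ConfusionCounts(tp=10, fp=2, tn=10, fn=2)))
```

Under 5-fold cross-validation, the counts from each held-out fold could either be pooled before computing these metrics or the metrics could be computed per fold and averaged; the abstract does not state which approach was used.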
According to the variable importance obtained from the models, the following genes can be used as predictive biomarkers: ADH1C, DEFA5, CPS1, SPP1, DMBT1, VCAN-AS1, and APOB (cancer vs. adenoma); LOC399753, APOA4, MIR548X, and ADH1C (adenoma vs. adenoma); and SNORD123, CEACAM6, SNORD78, ANXA10, SPINK1, and CPS1 (normal vs. adenoma).
The proposed model indicates that these genes can forecast the risk of duodenal cancer in patients with familial adenomatous polyposis. More comprehensive analyses are needed to assess the reliability of the identified genes.