Comparison and development of machine learning for thalidomide-induced peripheral neuropathy prediction of refractory Crohn's disease in Chinese population.

Author information

Institute of Clinical Pharmacology, School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, Guangdong Province, China.

Guangdong Provincial Key Laboratory of New Drug Design and Evaluation, Sun Yat-sen University, Guangzhou 510006, Guangdong Province, China.

Publication information

World J Gastroenterol. 2023 Jun 28;29(24):3855-3870. doi: 10.3748/wjg.v29.i24.3855.

Abstract

BACKGROUND

Thalidomide is an effective treatment for refractory Crohn's disease (CD). However, thalidomide-induced peripheral neuropathy (TiPN), which has a large individual variation, is a major cause of treatment failure. TiPN is rarely predictable and recognized, especially in CD. It is necessary to develop a risk model to predict TiPN occurrence.

AIM

To develop and compare predictive models of TiPN using machine learning based on comprehensive clinical and genetic variables.

METHODS

A retrospective cohort of 164 CD patients treated between January 2016 and June 2022 was used to establish the model. TiPN was assessed with the National Cancer Institute Common Toxicity Criteria Sensory Scale (version 4.0). Using 18 clinical features and 150 genetic variables, five predictive models were established and evaluated by the confusion matrix, area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), specificity, sensitivity (recall rate), precision, accuracy, and F1 score.
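The evaluation metrics listed above can be illustrated with a minimal sketch. It is not the authors' code: the labels and scores are hypothetical toy data, the 0.5 decision threshold is an assumption, and the function names are invented for illustration. AUROC is computed as the probability that a random positive outscores a random negative, and AUPRC is approximated by step-wise average precision (assuming no tied scores).

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, FP, TN, FN for binary labels (1 = TiPN, 0 = no TiPN)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["tp"] += 1
        elif t == 0 and p == 1:
            counts["fp"] += 1
        elif t == 0 and p == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def threshold_metrics(c: Dict[str, int]) -> Dict[str, float]:
    """Derive the threshold-dependent metrics from confusion-matrix counts."""
    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0

    tp, fp, tn, fn = c["tp"], c["fp"], c["tn"], c["fn"]
    sensitivity = ratio(tp, tp + fn)   # recall
    precision = ratio(tp, tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """P(random positive scores above random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (no tied scores assumed)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            total += tp / rank   # precision at each recalled positive
    return total / sum(y_true)


# Hypothetical toy data: true TiPN status and model-predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

counts = confusion_counts(y_true, y_pred)
metrics = threshold_metrics(counts)
print(counts, metrics, auroc(y_true, scores), average_precision(y_true, scores))
```

AUROC and AUPRC are threshold-free, whereas sensitivity, specificity, precision, accuracy and F1 all depend on the chosen cut-off, which is one reason to report both groups of metrics.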

RESULTS

The five top-ranking risk variables associated with TiPN were interleukin-12 rs1353248 [P = 0.0004, odds ratio (OR): 8.983, 95% confidence interval (CI): 2.497-30.90], dose (mg/d, P = 0.002), brain-derived neurotrophic factor (BDNF) rs2030324 (P = 0.001, OR: 3.164, 95%CI: 1.561-6.434), BDNF rs6265 (P = 0.001, OR: 3.150, 95%CI: 1.546-6.073) and BDNF rs11030104 (P = 0.001, OR: 3.091, 95%CI: 1.525-5.960). In the training set, gradient boosting decision tree (GBDT), extremely randomized trees (ET), random forest (RF), logistic regression and extreme gradient boosting (XGBoost) obtained AUROC values > 0.90 and AUPRC > 0.87. Among these models, XGBoost and GBDT obtained the two highest values for AUROC (0.90 and 1), AUPRC (0.98 and 1), accuracy (0.96 and 0.98), precision (0.90 and 0.95), F1 score (0.95 and 0.98), specificity (0.94 and 0.97), and sensitivity (1). In the validation set, the XGBoost algorithm exhibited the best predictive performance, with the highest specificity (0.857), accuracy (0.818), AUPRC (0.86) and AUROC (0.89). ET and GBDT obtained the highest sensitivity (1) and F1 score (0.8). Overall, compared with other state-of-the-art classifiers such as ET, GBDT and RF, the XGBoost algorithm not only showed more stable performance but also yielded higher AUROC and AUPRC scores, demonstrating its high accuracy in predicting TiPN occurrence.
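To make the reported odds ratios and confidence intervals easier to read, the sketch below computes an OR and its 95%CI from a 2x2 table using the Woolf (log-scale) method. The abstract does not say which method the authors used, so this is an assumption, and both the function name and the genotype counts are hypothetical, not study data.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.959964) -> Tuple[float, float, float]:
    """Odds ratio with a Woolf (log-scale) confidence interval.

    a: carriers with TiPN        b: carriers without TiPN
    c: non-carriers with TiPN    d: non-carriers without TiPN
    z: normal quantile (1.959964 gives a 95% interval)
    All four cells must be non-zero for this method.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for a risk-allele carrier vs non-carrier comparison.
or_value, ci_low, ci_high = odds_ratio_ci(a=20, b=10, c=10, d=20)
print(f"OR = {or_value:.3f}, 95%CI: {ci_low:.3f}-{ci_high:.3f}")
```

Because the interval is symmetric on the log scale, it is asymmetric around the OR itself, which is why the upper bounds reported above sit much further from the point estimate than the lower bounds. An interval that excludes 1 corresponds to a statistically significant association at the chosen level.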

CONCLUSION

The powerful XGBoost algorithm accurately predicts TiPN using 18 clinical features and 14 genetic variables. With the ability to identify high-risk patients using single nucleotide polymorphisms, it offers a feasible option for improving thalidomide efficacy in CD patients.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d0c/10324537/0c468197cac7/WJG-29-3855-g001.jpg
