Liu Yu-Ruey, Nfor Oswald Ndi, Zhong Ji-Han, Lin Chun-Yuan, Liaw Yung-Po
College of Information and Electrical Engineering, Asia University, Taichung, 413, Taiwan.
Department of Emergency Medicine, Cheng Ching General Hospital, Taichung, Taiwan.
J Inflamm Res. 2024 Nov 26;17:9847-9856. doi: 10.2147/JIR.S490821. eCollection 2024.
We assessed the risk of gout in the Taiwan Biobank population by applying various machine learning algorithms. The study aimed to identify crucial risk factors and evaluate the performance of different models in gout prediction.
This study analyzed data from 88,210 individuals in the Taiwan Biobank, identifying 19,338 cases of gout and 68,872 controls. After data cleaning and propensity score matching for gender and age, the final analytical sample comprised 38,676 individuals (19,338 gout cases and 19,338 controls). Five machine learning models were used: Bayesian Network (BN), Random Forest (RF), Gradient Boosting (GB), Logistic Regression (LR), and Neural Network (NN). The predictive performance was evaluated using a split dataset (80% training set and 20% test set).
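The 80%/20% partition described above can be sketched in a few lines. This is only an illustrative sketch: the abstract does not state whether the split was stratified by outcome or which software was used, so the stratification, the helper name `stratified_split`, and the fixed seed are assumptions made here for illustration.

```python
import random

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), splitting each class separately.

    Hypothetical helper: stratifying by class is an assumption,
    not something the abstract reports.
    """
    rng = random.Random(seed)
    # Group sample indices by class label.
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Placeholder labels mirroring the matched design (1 = gout, 0 = control).
labels = [1] * 19338 + [0] * 19338
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting within each class keeps the 1:1 case-to-control ratio of the matched sample in both partitions.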
Variable importance analysis identified uric acid and gender as the most influential risk factors. Descriptive data showed significant differences between the control group and gout patients, with a higher prevalence of gout in men (51.36% vs 48.64%). Both RF and GB performed well across multiple metrics. RF consistently achieved an area under the curve (AUC) of 0.986 to 0.987, with sensitivity of 0.945-0.947 and specificity of 0.998-0.999. GB performed comparably, with AUC values of 0.987-0.988, sensitivity of 0.944-0.950, and specificity of 0.995-0.999 across the different model variations. The F1 scores for both models were approximately 0.971-0.972, indicating strong predictive capability.
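For readers less familiar with the reported metrics, the following minimal sketch shows how sensitivity, specificity, F1, and AUC are computed from labels and predicted scores. The function names, the 0.5 threshold, and the toy data are assumptions for illustration; they are not the study's code or data.

```python
def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, and F1 from binary labels and predictions.

    Assumes both classes occur and at least one positive is predicted.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, f1

def auc(y_true, scores):
    """AUC as the share of (case, control) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data, invented purely for illustration (1 = gout, 0 = control).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

sensitivity, specificity, f1 = binary_metrics(y_true, y_pred)
auc_value = auc(y_true, scores)
print(sensitivity, specificity, f1, auc_value)
```

Assuming a test set with equal numbers of cases and controls, a sensitivity near 0.946 and a specificity near 0.998 imply an F1 of roughly 0.971, which is in line with the reported range.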
RF and GB predicted gout status with high accuracy, particularly when genetic data were incorporated alongside clinical factors. These findings underscore the potential of integrating machine learning models with genetic information to improve gout prediction in clinical practice.