XGBoost-based machine learning model combining clinical and ultrasound data for personalized prediction of thyroid nodule malignancy.

Author information

Li Wenhan, Zhou Yajing, Luo Ziyu, Tan Miao, Yin Rui, Li Jianhui

Affiliations

Department of Surgical Oncology, Shaanxi Provincial People's Hospital, Xi'an, Shaanxi, China.

The Third Affiliated Hospital, School of Medicine, Xi'an Jiaotong University, Xi'an, Shaanxi, China.

Publication information

Front Endocrinol (Lausanne). 2025 Jul 29;16:1639639. doi: 10.3389/fendo.2025.1639639. eCollection 2025.

Abstract

PURPOSE

Thyroid ultrasound is a primary tool for screening thyroid nodules (TNs), but existing ultrasound-based risk stratification systems have limitations. Machine learning (ML) can handle high-dimensional data and model complex patterns. This study aimed to develop an ML model that integrates clinical data and ultrasound features to improve personalized prediction of TN malignancy.

METHODS

Data from 2,014 patients with TNs seen between January 2018 and January 2024 were retrospectively analyzed, with 1,612 patients in the training set and 402 in the test set. Features included demographic data, ultrasound features, and thyroid function indices. Random forest (RF) and LASSO regression were used for feature selection. Six ML models (k-nearest neighbors [KNN], logistic regression, RF, classification tree, support vector machine [SVM], and XGBoost) were then developed and validated with 10-fold cross-validation. Performance was evaluated using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, calibration curves, and decision curve analysis (DCA).
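The discrimination and classification metrics listed above can be illustrated with a minimal, dependency-free sketch. It assumes binary labels (1 = malignant, 0 = benign) and predicted malignancy probabilities from any one of the six models; the 0.5 cut-off used for accuracy, sensitivity, and specificity is a hypothetical choice, since the abstract does not state the study's threshold or software. AUC is computed with the rank-based (Mann-Whitney) formulation, which equals the area under the empirical ROC curve.

```python
from typing import Dict, Sequence


def auc_mann_whitney(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen malignant case scores
    higher than a randomly chosen benign case (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for ps in pos:
        for ns in neg:
            if ps > ns:
                wins += 1.0
            elif ps == ns:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity at a fixed probability cut-off.
    The 0.5 default is an illustrative assumption, not the study's cut-off."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        predicted_malignant = s >= threshold
        if predicted_malignant and y == 1:
            tp += 1
        elif predicted_malignant and y == 0:
            fp += 1
        elif not predicted_malignant and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Toy data (not from the study): 1 = malignant, 0 = benign.
y = [1, 1, 1, 0, 0, 1, 0, 1]
p = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.55]
print(round(auc_mann_whitney(y, p), 3))  # → 0.867
print(threshold_metrics(y, p))
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; a sorted-rank implementation or a library routine would be preferable for thousands of patients. Note that AUC is threshold-free, whereas accuracy, sensitivity, and specificity all change with the chosen cut-off.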

RESULTS

Seventeen variables were identified as influential factors in the diagnosis of TNs. All six models showed satisfactory predictive performance, with accuracy ranging from 0.761 to 0.851 and AUC from 0.755 to 0.928. The XGBoost model performed best, with an AUC of 0.928, accuracy of 0.851, sensitivity of 0.933, and specificity of 0.650. Calibration curves showed strong agreement between predicted and observed malignancy probabilities, and DCA indicated a net clinical benefit across a wide range of risk thresholds (0.2 to 0.9). In addition, the model was deployed as a web-based calculator to facilitate its practical use.
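To make the DCA result concrete, the sketch below computes net benefit with the standard decision-curve definition, NB(p_t) = TP/n - (FP/n) * p_t / (1 - p_t), where a patient is classified as high risk when the predicted probability is at least the threshold p_t. A model is usually said to offer net clinical benefit at p_t when its NB exceeds both "treat all" and "treat none" (NB = 0). The function names and toy data are hypothetical and are not taken from the study.

```python
from typing import List, Sequence, Tuple


def net_benefit(y_true: Sequence[int], y_score: Sequence[float], pt: float) -> float:
    """Net benefit of intervening when predicted risk >= pt:
    TP/n - FP/n * pt / (1 - pt)."""
    if not 0.0 < pt < 1.0:
        raise ValueError("threshold probability must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= pt and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= pt and y == 0)
    return tp / n - fp / n * pt / (1.0 - pt)


def net_benefit_treat_all(y_true: Sequence[int], pt: float) -> float:
    """Reference strategy: treat every patient as high risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * pt / (1.0 - pt)


def decision_curve(
    y_true: Sequence[int], y_score: Sequence[float], thresholds: Sequence[float]
) -> List[Tuple[float, float, float]]:
    """(pt, model NB, treat-all NB) for each threshold; treat-none NB is always 0."""
    return [
        (pt, net_benefit(y_true, y_score, pt), net_benefit_treat_all(y_true, pt))
        for pt in thresholds
    ]


# Toy data (not from the study): 1 = malignant, 0 = benign.
y = [1, 1, 1, 0, 0, 1, 0, 1]
p = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.55]
for pt, nb_model, nb_all in decision_curve(y, p, [0.2, 0.5, 0.8]):
    print(f"pt={pt:.1f}  model={nb_model:.3f}  treat-all={nb_all:.3f}")
```

In practice the thresholds form a fine grid (for example, steps of 0.01), and the model's curve is plotted against the two reference strategies; a statement such as benefit over thresholds from 0.2 to 0.9 is normally read as the model curve lying above both references across that interval.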

CONCLUSIONS

The XGBoost model effectively integrates multimodal clinical and ultrasound data to predict TN malignancy, offering improved accuracy and clinical utility.

Figure (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3035/12339320/c51d694e30cd/fendo-16-1639639-g001.jpg