Interpretable machine learning for allergic rhinitis prediction among preschool children in Urumqi, China.

Affiliations

Department of Clinical Medicine, Xinjiang Medical University, Urumqi, 830017, China.

Department of Geriatric integrative, Second Affiliated Hospital of Xinjiang Medical University, NO.38, South Lake East Road North Second Lane, Shuimogou District, Urumqi, 830063, Xinjiang, China.

Publication information

Sci Rep. 2024 Sep 27;14(1):22281. doi: 10.1038/s41598-024-73733-w.

DOI:10.1038/s41598-024-73733-w
PMID:39333659
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11437280/
Abstract

This study investigated the advantages and applications of machine learning models, compared with traditional logistic regression, in predicting the risk of allergic rhinitis (AR) in children aged 2-8. Questionnaire data from 7131 children aged 2-8 were randomly divided into training, validation, and testing sets in a ratio of 55:15:30, and the division was repeated 100 times. Predictor variables included parental allergy, medical history during the child's first year (cfy), and early-life environmental factors. The first onset of AR was restricted to after the age of 1 year to establish a clear temporal relationship between the predictor variables and the outcome. Features were selected using the chi-square test and the Boruta algorithm. Four models were constructed: Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and Extreme Gradient Boosting Tree (XGBoost). Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC); the optimal decision threshold was determined by weighing multiple metrics on the validation sets, and results were reported on the testing set. The strengths and limitations of the models were further analyzed by stratifying on gender, mode of birth, and age subgroups, and by varying the number of predictor variables. Shapley additive explanations (SHAP) and the purity of node partitions in Random Forest were used to assess feature importance, and model stability was explored by altering the number of features. Of the 7131 children analyzed, 524 (7.35%) were diagnosed with AR, with onset age ranging from 2 to 8 years. Optimal parameters were refined using the validation set, and 100 random divisions with repeated training ensured robust evaluation of the models on the testing set.
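The repeated split-and-evaluate protocol described above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the helper names (`split_indices`, `auroc`, `best_threshold`, `fit_toy_scorer`, `make_synthetic`, `run_repeats`) are hypothetical, the data are synthetic, the scorer is a toy stand-in for LR/SVM/RF/XGBoost, and Youden's J is assumed as the threshold rule because the abstract says only that multiple metrics were weighed on the validation set.

```python
import random
from statistics import mean, stdev


def split_indices(n, seed, ratios=(0.55, 0.15, 0.30)):
    """Shuffle range(n) and cut it into train/validation/test index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def best_threshold(labels, scores):
    """Cut-off maximizing Youden's J (assumed stand-in for the multi-metric rule)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t


def fit_toy_scorer(X, y):
    """Toy model: weight each binary feature by the outcome-rate gap it shows."""
    weights = []
    for j in range(len(X[0])):
        exposed = [yi for xi, yi in zip(X, y) if xi[j] == 1]
        unexposed = [yi for xi, yi in zip(X, y) if xi[j] == 0]
        weights.append(mean(exposed) - mean(unexposed))
    return lambda row: sum(w * v for w, v in zip(weights, row))


def make_synthetic(n, seed=0):
    """Synthetic data: two binary risk factors that each raise outcome risk."""
    rng = random.Random(seed)
    X = [[rng.randint(0, 1), rng.randint(0, 1)] for _ in range(n)]
    y = [int(rng.random() < 0.05 + 0.25 * a + 0.25 * b) for a, b in X]
    return X, y


def run_repeats(X, y, repeats=100):
    """Fit on train, pick the cut-off on validation, score AUROC on test."""
    results = []
    for seed in range(repeats):
        train, val, test = split_indices(len(X), seed)
        score = fit_toy_scorer([X[i] for i in train], [y[i] for i in train])
        cut = best_threshold([y[i] for i in val], [score(X[i]) for i in val])
        test_auc = auroc([y[i] for i in test], [score(X[i]) for i in test])
        results.append((test_auc, cut))
    return results


if __name__ == "__main__":
    X, y = make_synthetic(600)
    aucs = [a for a, _ in run_repeats(X, y, repeats=100)]
    print(f"test AUROC: {mean(aucs):.3f} (SD {stdev(aucs):.3f})")
```

The test set is touched only once per split, after the model and the cut-off have been fixed, and the per-split test AUROCs are then summarized by their mean and standard deviation over the repeats.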
Fourteen variables were incorporated into the models, including the history of allergy-related diseases during the child's first year, familial genetic factors, and early-life indoor environmental factors. On the unstratified test set, the performance (AUROC) of LR, SVM, RF, and XGBoost was 0.715 (standard deviation = 0.023), 0.723 (0.022), 0.747 (0.015), and 0.733 (0.019), respectively; each model performed stably on the stratified data, and RF performed significantly better than LR (paired-samples t-test: p < 0.001). Different techniques for evaluating feature importance showed that the top 5 variables were father or mother with AR, having older siblings, history of food allergy, and father's educational level. Using strategies such as stratification and adjusting the number of features, this study constructed a random forest model that outperforms traditional logistic regression. Designed to detect the occurrence of AR in children aged 2-8, the model incorporates parental allergic history and early-life environmental factors. The optimal cut-off value was selected through a comprehensive evaluation strategy. Additionally, we identified the top 5 crucial features that greatly influence the model's performance. This study serves as a valuable reference for implementing machine learning-based AR prediction in pediatric populations.
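The paired-samples t-test used to compare RF with LR pairs the two models' test AUROCs split by split, since both models are evaluated on the same random divisions. A minimal sketch, assuming a hypothetical helper `paired_t` and made-up AUROC values (placeholders, not the study's per-split results):

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(a, b):
    """Return (t statistic, degrees of freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    se = stdev(diffs) / sqrt(len(diffs))  # standard error of the mean difference
    return mean(diffs) / se, len(diffs) - 1


# Placeholder per-split AUROCs for two models on the same four splits.
rf_auc = [0.74, 0.75, 0.76, 0.73]
lr_auc = [0.71, 0.72, 0.72, 0.70]
t, df = paired_t(rf_auc, lr_auc)
print(f"t = {t:.2f}, df = {df}")
```

The p-value follows from the t distribution with `df` degrees of freedom; `scipy.stats.ttest_rel` computes the statistic and the p-value together when SciPy is available.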

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cfc5/11437280/026651b81cb0/41598_2024_73733_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cfc5/11437280/df5529f2350e/41598_2024_73733_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cfc5/11437280/c08d62d16e35/41598_2024_73733_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cfc5/11437280/c5b72e34860f/41598_2024_73733_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cfc5/11437280/f98304730e21/41598_2024_73733_Fig5_HTML.jpg