Helicobacter pylori (H. pylori) risk factor analysis and prevalence prediction: a machine learning-based approach.

Author information

Department of Mathematics, Colgate University, 13 Oak Dr., Hamilton, NY, USA.

College of Health Sciences, Department of Medical Laboratory Science, Addis Ababa University, Addis Ababa, Ethiopia.

Publication information

BMC Infect Dis. 2022 Jul 28;22(1):655. doi: 10.1186/s12879-022-07625-7.

Abstract

BACKGROUND

Although previous epidemiological studies have examined the potential risk factors that increase the likelihood of acquiring Helicobacter pylori infections, most of these analyses have utilized conventional statistical models, including logistic regression, and have not benefited from advanced machine learning techniques.

OBJECTIVE

We examined H. pylori infection risk factors among school children using machine learning algorithms to identify important risk factors as well as to determine whether machine learning can be used to predict H. pylori infection status.

METHODS

We applied feature selection and classification algorithms to data from a school-based cross-sectional survey in Ethiopia. The data set included 954 school children with 27 sociodemographic and lifestyle variables. We conducted five runs of tenfold cross-validation on the data. We combined the results of these runs for each combination of feature selection (e.g., Information Gain) and classification (e.g., Support Vector Machines) algorithms.
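The resampling scheme described above can be illustrated with a minimal sketch. It is not the authors' code: the data are synthetic, the feature names are hypothetical, and the classifier step is omitted so that the example runs on the Python standard library alone. It also assumes that features are scored on the training part of each fold, which the abstract does not state. The sketch generates five shuffled runs of tenfold cross-validation (50 train/test splits in total), scores each categorical feature by information gain, and counts how often each feature ranks first.

```python
import math
import random
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a categorical feature."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def repeated_kfold(n_samples, n_splits=10, n_repeats=5, seed=0):
    """Yield (train_idx, test_idx) for n_repeats independently shuffled k-fold runs."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        for fold in range(n_splits):
            test = order[fold::n_splits]  # every n_splits-th sample forms one fold
            test_set = set(test)
            train = [i for i in order if i not in test_set]
            yield train, test


# Synthetic data (assumption): one feature tied to the label, two pure-noise features.
rng = random.Random(42)
n = 200
y = [rng.random() < 0.36 for _ in range(n)]
X = {
    "informative": [label if rng.random() < 0.8 else not label for label in y],
    "noise_a": [rng.random() < 0.5 for _ in range(n)],
    "noise_b": [rng.random() < 0.5 for _ in range(n)],
}

top_counts = Counter()  # how often each feature ranks first across all folds
n_folds = 0
for train, _test in repeated_kfold(n):
    y_train = [y[i] for i in train]
    gains = {
        name: information_gain([col[i] for i in train], y_train)
        for name, col in X.items()
    }
    top_counts[max(gains, key=gains.get)] += 1
    n_folds += 1

print(n_folds, dict(top_counts))
```

Scoring features inside each training fold, rather than once on the full data set, keeps the held-out samples from influencing which features are chosen. In practice a library such as scikit-learn provides this resampling directly (for example `RepeatedStratifiedKFold`); the hand-written generator here is unstratified and is meant only to make the mechanics visible.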

RESULTS

The XGBoost classifier predicted H. pylori infection status with the highest accuracy, 77%, a 13-percentage-point improvement over the baseline of always guessing the most frequent class (64% of the samples were H. pylori negative). K-Nearest Neighbors performed worst among all classifiers. The F1-score and the area under the receiver operating characteristic curve (AUROC) showed a similar pattern. Among all features, place of residence (with urban residence increasing risk) was the most commonly identified risk factor for H. pylori infection, regardless of the feature selection method. Our machine learning algorithms also identified other important risk factors for H. pylori infection, such as electricity usage in the home, toilet type, and waste disposal location. Using a 75% cutoff for robustness, machine learning identified five of the eight significant features found by traditional multivariate logistic regression. However, with a lower robustness threshold, the machine learning approaches identified more H. pylori risk factors than multivariate logistic regression and suggested risk factors that logistic regression did not detect.
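The baseline comparison reported above is simple arithmetic, and a short sketch makes the distinction between absolute and relative improvement explicit. The label list is synthetic and only mirrors the reported class balance (64% negative); the 0.77 figure is the accuracy stated in the abstract, and the function name is hypothetical.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


# Synthetic labels (assumption) that reproduce the reported 64% / 36% split.
labels = ["negative"] * 64 + ["positive"] * 36
baseline = majority_baseline_accuracy(labels)

reported_accuracy = 0.77  # XGBoost accuracy as stated in the abstract

absolute_gain = reported_accuracy - baseline  # difference in accuracy points
relative_gain = absolute_gain / baseline      # improvement relative to the baseline

print(f"baseline={baseline:.2f}")
print(f"absolute gain={absolute_gain:.2f}")
print(f"relative gain={relative_gain:.1%}")
```

Under these numbers the gain is 13 percentage points in absolute terms, which corresponds to roughly a 20% improvement relative to the 64% baseline.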

CONCLUSION

This study provides evidence that machine learning approaches are well positioned to uncover H. pylori infection risk factors and predict H. pylori infection status. These approaches identify similar risk factors and predict infection with accuracy comparable to that of logistic regression, so they could be used as an alternative method.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/692b/9336032/257a6b40e137/12879_2022_7625_Fig1_HTML.jpg
