Beyond the STI clinic: Use of administrative claims data and machine learning to develop and validate patient-level prediction models for gonorrhea.

Author information

Argante Lorenzo, Lonnet Germain, Aris Emmanuel, Whelan Jane

Affiliations

Clinical Statistics, GSK, Siena, Italy.

Real-World Analytics, GSK, Wavre, Belgium.

Publication information

Digit Health. 2025 Apr 3;11:20552076251331895. doi: 10.1177/20552076251331895. eCollection 2025 Jan-Dec.

Abstract

BACKGROUND

Gonorrhea is a sexually transmitted infection (STI) that, if untreated, can result in debilitating complications such as pelvic inflammatory disease, pain, and infertility. In the United States, only a minority of cases are diagnosed in STI clinics. Gonorrhea is often asymptomatic and is presumed to be substantially underdiagnosed and/or undertreated.

OBJECTIVES

To generate and compare predictive machine learning (ML) models using administrative claims data to characterize young women in the general United States population who would be most likely to contract gonorrhea.

METHODS

Data were extracted from the Merative™ MarketScan Commercial and Medicaid databases containing routinely collected administrative claims data. Women aged 16-35 years with two years of continuous observation between 1 January 2017 and 31 December 2018 were included. ML classification models were constructed based on logistic regression and tree-based algorithms.
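The inclusion criteria above can be illustrated with a minimal sketch. It is not the authors' extraction code: the `Enrollee` record, its field names, and the `is_eligible` helper are hypothetical, and two assumptions are made that the abstract does not state: age is taken as of 2017, and "continuous observation" is read as a single enrollment span covering the whole study window.

```python
from dataclasses import dataclass
from datetime import date

# Study window taken from the abstract.
STUDY_START = date(2017, 1, 1)
STUDY_END = date(2018, 12, 31)


@dataclass
class Enrollee:
    """Hypothetical record; real claims data have a different layout."""
    sex: str             # "F" or "M"
    birth_year: int
    enrolled_from: date  # start of one uninterrupted enrollment span
    enrolled_to: date    # end of that span


def is_eligible(person: Enrollee, index_year: int = 2017) -> bool:
    """Apply the stated criteria: women aged 16-35, observed for the full window.

    Assumption: age is computed as index_year minus birth year.
    """
    age = index_year - person.birth_year
    covers_window = (
        person.enrolled_from <= STUDY_START and person.enrolled_to >= STUDY_END
    )
    return person.sex == "F" and 16 <= age <= 35 and covers_window


cohort = [
    Enrollee("F", 1995, date(2016, 6, 1), date(2019, 1, 31)),  # eligible
    Enrollee("F", 1995, date(2017, 3, 1), date(2019, 1, 31)),  # enrolled too late
    Enrollee("F", 1970, date(2016, 6, 1), date(2019, 1, 31)),  # outside age range
]
eligible = [p for p in cohort if is_eligible(p)]
```

In real claims data, enrollment is usually recorded as several spans per person, so gaps between spans would also have to be checked.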

RESULTS

Models constructed using tree-based algorithms such as XGBoost provided the best discrimination, but simpler ridge regression models with splines also achieved reasonable discrimination, allowing for the identification of population subsets at increased risk of gonorrhea infection. A subset of 0.1% of the population identified by the XGBoost model had a 70-fold higher risk of gonorrhea than the general population. In external validation, the models were applied to a Medicaid dataset that had not been used to develop them, and their performance was deemed acceptable.
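A statement such as "the top 0.1% had a 70-fold higher risk" is a lift: the observed risk among the highest-scored individuals divided by the risk in the whole population. The sketch below shows one way to compute such a ratio from predicted scores and observed outcomes. The function name `top_fraction_lift` and the toy data are illustrative assumptions, not the authors' code or results.

```python
def top_fraction_lift(scores, outcomes, fraction):
    """Risk in the top-scored `fraction` divided by the overall risk.

    scores   -- predicted risk per person (higher means riskier)
    outcomes -- 1 if the person had the outcome, else 0
    fraction -- share of the population to flag, e.g. 0.001 for 0.1%
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not scores or len(scores) != len(outcomes):
        raise ValueError("scores and outcomes must be non-empty and equal length")

    n = len(scores)
    k = max(1, round(n * fraction))  # flag at least one person
    # Rank people from highest to lowest predicted score.
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    top_risk = sum(outcome for _, outcome in ranked[:k]) / k
    overall_risk = sum(outcomes) / n
    if overall_risk == 0:
        raise ValueError("no outcomes observed; lift is undefined")
    return top_risk / overall_risk


# Toy data: 10 people, 2 cases, flag the top 10%.
toy_scores = [0.9, 0.8, 0.1, 0.2, 0.3, 0.05, 0.4, 0.15, 0.25, 0.35]
toy_outcomes = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
toy_lift = top_fraction_lift(toy_scores, toy_outcomes, 0.10)
```

With these toy numbers, the single flagged person is a case, so the risk in the flagged group is 1.0 against an overall risk of 0.2, giving a lift of 5.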

CONCLUSIONS

The models and methods presented here could facilitate the identification of women at high risk of contracting gonorrhea for whom targeted preventive measures may be most beneficial.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e5e4/11970062/e5323f1d610c/10.1177_20552076251331895-fig1.jpg
