Analysis of 2019 Ohio Disease Intervention Specialist Records for Syphilis Cases Using Clustering Algorithms.

Author information

Chakraborty Payal, Ning Xia, McNeill Mary, Kline David M, Shoben Abigail B, Miller William C, Norris Turner Abigail

Affiliations

Ohio Department of Health, Columbus, OH.

Division of Public Health Sciences, Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, NC.

Publication information

Sex Transm Dis. 2025 Mar 1;52(3):146-153. doi: 10.1097/OLQ.0000000000002091. Epub 2024 Oct 31.

Abstract

BACKGROUND

Developments in natural language processing and unsupervised machine learning methods (e.g., clustering) have given researchers new tools to analyze both structured and unstructured health data. We applied these methods to 2019 Ohio disease intervention specialist (DIS) syphilis records to determine whether they can uncover novel patterns of co-occurring individual characteristics, risk factors, and clinical characteristics of syphilis that have not yet been reported in the literature.

METHODS

The 2019 DIS syphilis records (n = 1996) contain both structured data (categorical and numerical variables) and unstructured notes. From the structured data, we examined case demographics, syphilis risk factors, and clinical characteristics of syphilis. For the unstructured text, we applied TF-IDF (term frequency multiplied by inverse document frequency) weighting, a common way to convert text into numerical representations. We then performed agglomerative clustering with cosine similarity using the CLUTO software.
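The TF-IDF weighting and cosine-based agglomerative clustering described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the study used CLUTO, and the clustering criterion it was run with is not given here, so average linkage (mean pairwise cosine similarity between clusters) is a stand-in. The tokenizer is a naive whitespace split, the idf is an unsmoothed log(N/df), the four `notes` strings are invented toy examples rather than DIS records, and the function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document.

    Assumption: raw term count times log(N / df); many TF-IDF variants exist.
    """
    tokenized = [doc.lower().split()  # naive whitespace tokenizer (assumed)
                 for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def agglomerative(vectors, k):
    """Average-link agglomerative clustering on cosine similarity (assumed criterion).

    Starts with each record in its own cluster and repeatedly merges the pair
    of clusters with the highest mean pairwise similarity until k remain.
    """
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best, best_pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [sim[i][j] for i in clusters[a] for j in clusters[b]]
                avg = sum(pairs) / len(pairs)
                if avg > best:
                    best, best_pair = avg, (a, b)
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]  # b > a, so index a is unaffected
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy notes (not real DIS records).
notes = [
    "partner named oral sex anonymous",
    "oral sex anonymous partner intoxicated",
    "female partner heterosexual contact",
    "female partner heterosexual exposure",
]
vecs = tfidf_vectors(notes)
print(agglomerative(vecs, 2))  # → [[0, 1], [2, 3]]
```

Because "partner" appears in every toy note, its idf and therefore its weight are zero, which is how TF-IDF down-weights terms that do not distinguish records. The naive loop is roughly cubic in the number of records, so an analysis of the full 1996 records would use an optimized tool rather than this sketch.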

RESULTS

The cluster analysis yielded 6 clusters of syphilis cases based on patterns in the structured and unstructured data. The average internal similarities were much higher than the average external similarities, indicating that the clusters were well formed. The factors underlying 3 of the clusters related to patterns of missing data; the factors underlying the other 3 were sexual behaviors and partnerships. Notably, 1 of these 3 consisted of individuals who reported oral sex with male or anonymous partners while intoxicated, and another consisted mainly of males who have sex with females.
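The well-formedness check above compares each cluster's average internal similarity with its average external similarity. The sketch below shows one plausible way to compute both, assuming cosine similarity, internal similarity as the mean over distinct within-cluster pairs, and external similarity as the mean between cluster members and all non-members; CLUTO's exact definitions may differ. The binary feature vectors and labels are hypothetical toy data, and `internal_external` is an illustrative name.

```python
import math


def cosine(u, v):
    """Cosine similarity of two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def internal_external(vectors, labels):
    """Map each cluster label to (mean internal, mean external) cosine similarity."""
    result = {}
    for c in sorted(set(labels)):
        inside = [i for i, lab in enumerate(labels) if lab == c]
        outside = [i for i, lab in enumerate(labels) if lab != c]
        # Distinct within-cluster pairs (i < j in the member list).
        in_pairs = [cosine(vectors[i], vectors[j])
                    for a, i in enumerate(inside) for j in inside[a + 1:]]
        # Every member paired with every non-member.
        out_pairs = [cosine(vectors[i], vectors[j])
                     for i in inside for j in outside]
        # Convention (assumed): a singleton cluster has internal similarity 1.0.
        isim = sum(in_pairs) / len(in_pairs) if in_pairs else 1.0
        esim = sum(out_pairs) / len(out_pairs) if out_pairs else 0.0
        result[c] = (isim, esim)
    return result


# Hypothetical indicator features for four cases in two clusters.
vectors = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [0, 1, 1, 1]]
labels = [0, 0, 1, 1]
for c, (isim, esim) in internal_external(vectors, labels).items():
    print(c, round(isim, 3), round(esim, 3))
# 0 0.816 0.371
# 1 0.816 0.371
```

In this toy case each cluster's internal similarity is more than twice its external similarity, the same qualitative pattern the Results use to call the clusters well formed.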

CONCLUSIONS

Our analysis produced clusters that were mathematically well formed but did not reveal any epidemiological information about syphilis risk factors or transmission that was not already known.
