Analysis of 2019 Ohio Disease Intervention Specialist Records for Syphilis Cases Using Clustering Algorithms.

Author information

Chakraborty Payal, Ning Xia, McNeill Mary, Kline David M, Shoben Abigail B, Miller William C, Norris Turner Abigail

Affiliations

Ohio Department of Health, Columbus, OH.

Division of Public Health Sciences, Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, NC.

Publication information

Sex Transm Dis. 2025 Mar 1;52(3):146-153. doi: 10.1097/OLQ.0000000000002091. Epub 2024 Oct 31.

Abstract

BACKGROUND

Developments in natural language processing and unsupervised machine learning methodologies (e.g., clustering) have given researchers new tools to analyze both structured and unstructured health data. We applied these methods to 2019 Ohio disease intervention specialist (DIS) syphilis records to determine whether they could uncover novel patterns in the co-occurrence of individual characteristics, risk factors, and clinical characteristics of syphilis that have not yet been reported in the literature.

METHODS

The 2019 DIS syphilis records (n = 1996) contain both structured data (categorical and numerical variables) and unstructured notes. From the structured data, we examined case demographics, syphilis risk factors, and clinical characteristics of syphilis. For the unstructured text, we applied TF-IDF (term frequency multiplied by inverse document frequency) weights, a common way to convert text into numerical representations. We then performed agglomerative clustering based on cosine similarity using the CLUTO software.
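The abstract does not give preprocessing or clustering parameters, so what follows is only a minimal pure-Python sketch of the two text-side steps named above, applied to free text alone (the study also clustered structured variables). It assumes lowercase whitespace tokenization, TF-IDF weights computed as raw term count times log(N / df), and average-linkage merging on cosine similarity. CLUTO offers several agglomerative criterion functions and the abstract does not say which one was used, so average linkage is an assumption. The function names (`tfidf_vectors`, `cosine`, `agglomerative`) and the toy notes are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]  # assumed whitespace tokenizer
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts within this document
        vectors.append({t: count * math.log(n_docs / df[t]) for t, count in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def agglomerative(vectors, k):
    """Average-linkage agglomerative clustering: merge the most similar pair until k remain."""
    sim = [[cosine(a, b) for b in vectors] for a in vectors]
    clusters = [[i] for i in range(len(vectors))]  # start with every document alone
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise similarity between members of the two clusters.
                s = sum(sim[i][j] for i in clusters[a] for j in clusters[b]) / (
                    len(clusters[a]) * len(clusters[b]))
                if best is None or s > best[0]:
                    best = (s, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # b > a, so index a stays valid
        del clusters[b]
    return clusters


notes = [  # invented toy notes, not study data
    "partner anonymous oral sex",
    "anonymous partner oral contact",
    "female partner vaginal sex",
    "female partner named",
]
print(agglomerative(tfidf_vectors(notes), k=2))
```

On these four invented notes, the two mentioning anonymous partners and oral sex are merged first and the two mentioning female partners end up together, giving `[[0, 1], [2, 3]]`. A term present in every note (here `partner`) gets an IDF of log(1) = 0 and so contributes nothing to similarity, which is the purpose of the IDF factor.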

RESULTS

The cluster analysis yielded 6 clusters of syphilis cases based on patterns in the structured and unstructured data. The average internal similarities were much higher than the average external similarities, indicating that the clusters were well formed. The factors underlying 3 of the clusters were related to patterns of missing data; those underlying the other 3 were sexual behaviors and partnerships. Notably, 1 of these 3 consisted of individuals who reported oral sex with male or anonymous partners while intoxicated, and another consisted mainly of males who have sex with females.
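One way to read the comparison of average internal and external similarities is sketched below, loosely in the spirit of the internal/external similarity summaries that CLUTO reports: for each cluster, average the cosine similarity over distinct pairs of members (internal) and over member/non-member pairs (external). The exact definitions, the singleton convention, the function name `cluster_similarity_summary`, and the toy vectors are assumptions for illustration, not the authors' code.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_similarity_summary(vectors, clusters):
    """Return one (internal, external) average-similarity pair per cluster."""
    summary = []
    for members in clusters:
        inside = set(members)
        outside = [j for j in range(len(vectors)) if j not in inside]
        # Internal: average over distinct pairs of members of this cluster.
        pairs = list(combinations(members, 2))
        internal = (sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)
                    if pairs else 1.0)  # singleton convention (assumption)
        # External: average over every (member, non-member) pair.
        cross = [cosine(vectors[i], vectors[j]) for i in members for j in outside]
        external = sum(cross) / len(cross) if cross else 0.0
        summary.append((internal, external))
    return summary


vectors = [  # hypothetical toy TF-IDF vectors
    {"oral": 1.0, "anonymous": 1.0},
    {"oral": 1.0, "anonymous": 0.5},
    {"female": 1.0},
    {"female": 1.0, "vaginal": 1.0},
]
for internal, external in cluster_similarity_summary(vectors, [[0, 1], [2, 3]]):
    print(f"internal={internal:.3f} external={external:.3f}")
```

A cluster is "well formed" in this sense when its internal average clearly exceeds its external average. As the results above show, that property alone says nothing about whether the shared features are epidemiologically meaningful: 3 of the 6 clusters were driven by patterns of missing data.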

CONCLUSIONS

Our analysis produced clusters that were well formed mathematically but did not reveal any epidemiological information about syphilis risk factors or transmission that was not already known.
