Analysis of 2019 Ohio Disease Intervention Specialist Records for Syphilis Cases Using Clustering Algorithms.

Authors

Chakraborty Payal, Ning Xia, McNeill Mary, Kline David M, Shoben Abigail B, Miller William C, Norris Turner Abigail

Affiliations

Ohio Department of Health, Columbus, OH.

Division of Public Health Sciences, Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, NC.

Publication information

Sex Transm Dis. 2025 Mar 1;52(3):146-153. doi: 10.1097/OLQ.0000000000002091. Epub 2024 Oct 31.

DOI:10.1097/OLQ.0000000000002091
PMID:39481010
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12040071/
Abstract

BACKGROUND

Developments in natural language processing and unsupervised machine learning methodologies (e.g., clustering) have given researchers new tools to analyze both structured and unstructured health data. We applied these methods to 2019 Ohio disease intervention specialist (DIS) syphilis records, to determine whether these methods can uncover novel patterns of co-occurrence of individual characteristics, risk factors, and clinical characteristics of syphilis that are not yet reported in the literature.

METHODS

The 2019 disease intervention specialist syphilis records (n = 1996) contain both structured data (categorical and numerical variables) and unstructured notes. In the structured data, we examined case demographics, syphilis risk factors, and clinical characteristics of syphilis. For the unstructured text, we applied TF-IDF (term frequency multiplied by inverse document frequency) weights, a common way to convert text into numerical representations. We performed agglomerative clustering with cosine similarity using the CLUTO software.
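The text pipeline described above (TF-IDF weighting followed by agglomerative clustering on cosine similarity) can be sketched in plain Python. This is only an illustration, not the authors' CLUTO configuration: the abstract does not state the TF-IDF variant, the linkage criterion, or how structured variables were combined with the notes, so the `tf * log(N / df)` weighting, the average-linkage rule, and the toy `notes` below are all assumptions.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def agglomerative(vectors, n_clusters):
    """Average-linkage agglomerative clustering (assumed linkage rule).

    Returns a list of clusters, each a sorted list of document indices.
    """
    clusters = [[i] for i in range(len(vectors))]
    sim = [[cosine(u, v) for v in vectors] for u in vectors]

    def linkage(a, b):
        # Mean pairwise cosine similarity between members of a and b.
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the highest average similarity.
        _, a, b = max(
            ((linkage(clusters[a], clusters[b]), a, b)
             for a in range(len(clusters))
             for b in range(a + 1, len(clusters))),
            key=lambda item: item[0],
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical free-text notes, invented purely for illustration.
notes = [
    "partner notified by phone",
    "partner notified by phone call",
    "treatment completed at clinic",
    "treatment completed at county clinic",
]
clusters = agglomerative(tfidf_vectors(notes), n_clusters=2)
print(clusters)
```

In this toy input the two notes about partner notification share no terms with the two notes about treatment, so their cosine similarity is zero and the procedure separates them into two groups.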

RESULTS

The cluster analysis yielded 6 clusters of syphilis cases based on patterns in the structured and unstructured data. The average internal similarities were much higher than the average external similarities, indicating that the clusters were well formed. The factors underlying 3 of the clusters related to patterns of missing data. The factors underlying the other 3 clusters were sexual behaviors and partnerships. Notably, 1 of the 3 consisted of individuals who reported oral sex with male or anonymous partners while intoxicated, and another consisted mainly of males who have sex with females.
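The comparison of internal and external similarity can be illustrated with a short sketch. It assumes that "internal" means the mean pairwise cosine similarity among members of a cluster and "external" means the mean cosine similarity between members and non-members. The exact definitions used by the authors' software are not given in the abstract, and the vectors and labels below are made up.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def cluster_quality(vectors, labels):
    """Return {label: (avg_internal_similarity, avg_external_similarity)}."""
    result = {}
    for label in sorted(set(labels)):
        inside = [v for v, l in zip(vectors, labels) if l == label]
        outside = [v for v, l in zip(vectors, labels) if l != label]
        # All unordered pairs of distinct members within the cluster.
        internal_pairs = [cosine(inside[i], inside[j])
                          for i in range(len(inside))
                          for j in range(i + 1, len(inside))]
        # All member / non-member pairs.
        external_pairs = [cosine(u, v) for u in inside for v in outside]
        # Fallbacks for singleton clusters or a single overall cluster.
        internal = (sum(internal_pairs) / len(internal_pairs)
                    if internal_pairs else 1.0)
        external = (sum(external_pairs) / len(external_pairs)
                    if external_pairs else 0.0)
        result[label] = (internal, external)
    return result

# Hypothetical 2-dimensional feature vectors and cluster assignments.
vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
labels = [0, 0, 1, 1]
quality = cluster_quality(vectors, labels)
for label, (internal, external) in quality.items():
    print(f"cluster {label}: internal={internal:.2f}, external={external:.2f}")
```

A cluster whose internal value is well above its external value is cohesive and well separated in this geometric sense. As the results show, that says nothing about whether the grouping is epidemiologically meaningful: a cluster can be tight simply because its members share the same missing fields.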

CONCLUSIONS

Our analysis produced clusters that were well formed mathematically but did not reveal any epidemiological information about syphilis risk factors or transmission that was not already known.
