Anik Adib Ahmed, Upama Paramita Basak, Rabbani Masud, Tian Shiyu, Park Min Sook, Ahamed Sheikh Iqbal, Luo Jake, Oh Hyunkyoung
Ubicomp Lab, Department of Computer Science, Marquette University, Milwaukee, WI, USA.
School of Information, College of Communication and Information, Florida State University, FL, USA.
Proc COMPSAC. 2024 Jul;2024:862-869. doi: 10.1109/compsac61105.2024.00119. Epub 2024 Aug 26.
This study proposes a way to use an existing medical ontology and natural language processing techniques to extract major medical concepts from the lay vocabulary of health consumers on social media and to group those concepts by the semantic types defined in the ontology. Diabetes-related discussions on Tumblr were used to test the efficiency of spaCy and the Markov-Viterbi algorithm in mapping lay medical terms to the medical concepts defined in the Unified Medical Language System (UMLS). The system discussed in this paper can analyze free text, handle word ambiguity, and extract lifestyle indicators from the daily-life discussions of people with diabetes on Tumblr. The findings can contribute to the development of health applications that track the health behavior of people living with chronic conditions such as diabetes. The approach can also assist researchers who process the lay language used by health consumers in order to better understand their health behavior.
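To make the role of Viterbi decoding in handling word ambiguity concrete, the following is a minimal sketch in Python, not the authors' implementation. The semantic-type labels (`Food`, `Finding`, `Other`), the toy probability tables, and the example posts are all hypothetical values chosen for illustration; they are not taken from the paper or from the UMLS. Tokens are given as pre-split lists, where a real pipeline would presumably obtain them from a tokenizer such as spaCy.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most likely hidden-state sequence for a token sequence."""
    if not observations:
        return []

    def lp(p):
        # Log-probability with a small floor so unseen words do not give log(0).
        return math.log(max(p, floor))

    # best[t][s] = (log-score of the best path ending in state s at step t,
    #               the previous state on that path)
    first = observations[0]
    best = [{s: (lp(start_p[s]) + lp(emit_p[s].get(first, 0.0)), None)
             for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            score, back = max((prev[r][0] + lp(trans_p[r][s]), r) for r in states)
            column[s] = (score + lp(emit_p[s].get(obs, 0.0)), back)
        best.append(column)

    # Backtrack from the best final state to recover the full path.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path

# Hypothetical semantic types and toy probabilities (illustrative only).
STATES = ["Food", "Finding", "Other"]
START_P = {"Food": 0.3, "Finding": 0.3, "Other": 0.4}
TRANS_P = {
    "Food":    {"Food": 0.2, "Finding": 0.1, "Other": 0.7},
    "Finding": {"Food": 0.1, "Finding": 0.5, "Other": 0.4},
    "Other":   {"Food": 0.4, "Finding": 0.3, "Other": 0.3},
}
EMIT_P = {
    "Food":    {"sugar": 0.4, "pizza": 0.5},
    "Finding": {"sugar": 0.4, "crashed": 0.5},
    "Other":   {"my": 0.3, "ate": 0.3, "after": 0.3},
}

def tag(tokens):
    """Tag tokens with the toy model defined above."""
    return viterbi(tokens, STATES, START_P, TRANS_P, EMIT_P)

if __name__ == "__main__":
    for post in (["my", "sugar", "crashed"], ["ate", "sugar", "after", "pizza"]):
        print(list(zip(post, tag(post))))
```

In this toy model the word "sugar" is equally likely under `Food` and `Finding`, so the decoder resolves it from the surrounding tokens: it is tagged `Finding` in "my sugar crashed" and `Food` in "ate sugar after pizza". In a real system the tables would be estimated from annotated text rather than set by hand.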