Text mining of verbal autopsy narratives to extract mortality causes and most prevalent diseases using natural language processing.

Affiliations

Department of Epidemiology and Biostatistics, School of Public Health, University of the Witwatersrand, Johannesburg, South Africa.

MRC/Wits Rural Public Health and Health Transitions Research Unit (Agincourt), Johannesburg, South Africa.

Publication information

PLoS One. 2024 Sep 19;19(9):e0308452. doi: 10.1371/journal.pone.0308452. eCollection 2024.

Abstract

Verbal autopsy (VA) narratives play a crucial role in understanding and documenting causes of death, especially in regions lacking robust medical infrastructure. In this study, we propose a comprehensive approach that uses text mining to extract mortality causes and identify prevalent diseases from VA narratives, so as to better understand the underlying health issues leading to mortality. Our methodology integrates n-gram-based language processing, Latent Dirichlet Allocation (LDA), and BERTopic, offering a multi-faceted analysis intended to enhance the accuracy and depth of information extraction. This is a retrospective study based on secondary data analysis. We used data from the Agincourt Health and Demographic Surveillance Site (HDSS), comprising 16,338 observations collected between 1993 and 2015. Our text mining steps were data acquisition, pre-processing, feature extraction, topic segmentation, and knowledge discovery. The results suggest that the HDSS population may have died from causes such as vomiting, chest/stomach pain, fever, coughing, loss of weight, low energy, and headache. Additionally, the most prevalent diseases included human immunodeficiency virus (HIV), tuberculosis (TB), diarrhoea, cancer, neurological disorders, malaria, diabetes, high blood pressure, and chronic ailments (kidney, heart, lung, liver), alongside maternal and accident-related deaths. This study is relevant in that it provides valuable insights into mortality causes and the most prevalent diseases using novel text mining approaches. These results can be integrated into the diagnosis pipeline to ease human annotation and interpretation. This, in turn, can support effective, informed intervention programmes that improve primary health care systems and chronic disease service delivery, thus increasing life expectancy.
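To make the n-gram step mentioned in the abstract more concrete, the following is a minimal sketch of bigram counting over free-text narratives. It is not the authors' implementation: the stop-word list, the helper names `tokenize` and `ngram_counts`, and the two toy narratives are all invented for illustration and are not drawn from the Agincourt HDSS data.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only (not from the paper).
STOP_WORDS = {"the", "a", "an", "and", "of", "was", "had",
              "he", "she", "for", "with", "in", "to"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if t not in STOP_WORDS]


def ngram_counts(narratives, n=2):
    """Count n-grams (as tuples of tokens) across all narratives."""
    counts = Counter()
    for text in narratives:
        tokens = tokenize(text)
        # zip over n shifted copies of the token list yields the n-grams;
        # narratives shorter than n tokens contribute nothing.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


# Invented toy narratives, NOT taken from the Agincourt HDSS data.
toy_narratives = [
    "She had chest pain and was coughing for two weeks.",
    "He had chest pain, fever and loss of weight.",
]

bigrams = ngram_counts(toy_narratives, n=2)
top_bigram = bigrams.most_common(1)[0]
print(top_bigram)
```

In a fuller pipeline, frequency tables of this kind would typically feed the feature-extraction stage before topic models such as LDA or BERTopic are fitted; those stages are omitted here because the abstract does not specify their configuration.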

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8688/11412533/1d6d559c3fda/pone.0308452.g001.jpg
