Text mining of verbal autopsy narratives to extract mortality causes and most prevalent diseases using natural language processing.

Affiliations

Department of Epidemiology and Biostatistics, School of Public Health, University of the Witwatersrand, Johannesburg, South Africa.

MRC/Wits Rural Public Health and Health Transitions Research Unit (Agincourt), Johannesburg, South Africa.

Publication information

PLoS One. 2024 Sep 19;19(9):e0308452. doi: 10.1371/journal.pone.0308452. eCollection 2024.

DOI:10.1371/journal.pone.0308452
PMID:39298425
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11412533/
Abstract

Verbal autopsy (VA) narratives play a crucial role in understanding and documenting the causes of mortality, especially in regions lacking robust medical infrastructure. In this study, we propose a comprehensive approach to extract mortality causes and identify prevalent diseases from VA narratives utilizing advanced text mining techniques, so as to better understand the underlying health issues leading to mortality. Our methodology integrates n-gram-based language processing, Latent Dirichlet Allocation (LDA), and BERTopic, offering a multi-faceted analysis to enhance the accuracy and depth of information extraction. This is a retrospective study that uses secondary data analysis. We used data from the Agincourt Health and Demographic Surveillance Site (HDSS), which had 16338 observations collected between 1993 and 2015. Our text mining steps entailed data acquisition, pre-processing, feature extraction, topic segmentation, and discovered knowledge. The results suggest that the HDSS population may have died from mortality causes such as vomiting, chest/stomach pain, fever, coughing, loss of weight, low energy, headache. Additionally, we discovered that the most prevalent diseases entailed human immunodeficiency virus (HIV), tuberculosis (TB), diarrhoea, cancer, neurological disorders, malaria, diabetes, high blood pressure, chronic ailments (kidney, heart, lung, liver), maternal and accident related deaths. This study is relevant in that it avails valuable insights regarding mortality causes and most prevalent diseases using novel text mining approaches. These results can be integrated in the diagnosis pipeline for ease of human annotation and interpretation. As such, this will help with effective informed intervention programmes that can improve primary health care systems and chronic based delivery, thus increasing life expectancy.
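
The n-gram step named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the stop-word list, the helper names (`tokenize`, `ngrams`, `top_ngrams`), and the three toy narratives are all invented for illustration and are not drawn from the Agincourt HDSS data. The sketch covers only pre-processing and n-gram frequency counting; the LDA and BERTopic stages are not shown.

```python
import re
from collections import Counter

# Hypothetical stop-word list, invented for this sketch.
STOP_WORDS = {"the", "a", "and", "of", "was", "had", "he", "she",
              "for", "with", "then"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if t not in STOP_WORDS]


def ngrams(tokens, n):
    """Return consecutive n-grams as space-joined strings.

    Stop words are removed before this step, so an n-gram may span
    a position where a stop word used to be.
    """
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(narratives, n, k):
    """Count n-grams over all narratives and return the k most frequent."""
    counts = Counter()
    for text in narratives:
        counts.update(ngrams(tokenize(text), n))
    return counts.most_common(k)


# Toy narratives, invented for illustration only.
narratives = [
    "She had chest pain and fever, then loss of weight.",
    "He was coughing and had chest pain for weeks.",
    "Fever and chest pain with vomiting.",
]

print(top_ngrams(narratives, 2, 1))  # → [('chest pain', 3)]
```

Frequent n-grams of this kind give a first view of recurring symptom phrases. A topic model would then group co-occurring terms into themes, a step this sketch deliberately leaves out.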

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8688/11412533/1d6d559c3fda/pone.0308452.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8688/11412533/18abc7b09651/pone.0308452.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8688/11412533/e676f3a97680/pone.0308452.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8688/11412533/88ccdc5cc70e/pone.0308452.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8688/11412533/6bac8349e026/pone.0308452.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8688/11412533/8b380c331bbf/pone.0308452.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8688/11412533/4d16106afa5e/pone.0308452.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8688/11412533/e8ca1ea35fb4/pone.0308452.g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8688/11412533/c18065403e0a/pone.0308452.g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8688/11412533/7676691dcf59/pone.0308452.g010.jpg
