Public Health Ontario (PHO), Toronto, ON, Canada.
Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada.
Sci Rep. 2023 May 26;13(1):8591. doi: 10.1038/s41598-023-35482-0.
Extracting key information about an infectious disease in a timely manner is critical for population health research, and the lack of procedures for mining large volumes of health data is a major impediment. The goal of this research is to use natural language processing (NLP) to extract key information (clinical factors and social determinants of health) from free text. The proposed framework comprises database construction, NLP modules for locating clinical and non-clinical (social determinants) information, and a detailed evaluation protocol for assessing the results and demonstrating the framework's effectiveness. COVID-19 case reports are used to demonstrate data construction and pandemic surveillance. The proposed approach outperforms benchmark methods in F1-score by about 1-3%. A thorough analysis reveals the presence of the disease as well as the frequency of symptoms in patients. The findings suggest that prior knowledge gained through transfer learning can be useful when studying infectious diseases with similar presentations in order to accurately predict patient outcomes.