Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio, USA.
Division of Environmental Health Sciences, College of Public Health, The Ohio State University, Columbus, Ohio, USA.
Stud Health Technol Inform. 2022 Jun 6;290:140-144. doi: 10.3233/SHTI220048.
Although Named Entity Recognition (NER) is essential for identifying critical elements of unstructured content, generic NER tools remain limited in recognizing domain-specific entities, such as those in drug use and public health. In such high-impact areas, accurately capturing relevant entities at a granular level is critical because this information influences real-world processes. However, training NER models for a specific domain without handcrafted features requires an extensive amount of labeled data, which is expensive in human effort and time. In this study, we employ distant supervision with a domain-specific ontology to reduce the need for human labor, and we train models that incorporate domain-specific (e.g., drug use) external knowledge to recognize domain-specific entities. We capture entities related to drug use and their trends in government epidemiology reports, with an improvement of 8% in F1-score.
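
As an illustration of the general idea of ontology-based distant supervision, the following minimal sketch assigns BIO tags to tokens by greedy longest-match lookup against a small term dictionary, producing weak labels that could then be used to train an NER model. It is not the authors' pipeline: the `ONTOLOGY` entries, the `DRUG` and `ROUTE` entity types, and the helper names `tokenize` and `distant_labels` are all hypothetical and chosen only for the example.

```python
import re

# Hypothetical ontology fragment (surface form -> entity type).
# These terms and types are assumptions for illustration only.
ONTOLOGY = {
    "heroin": "DRUG",
    "fentanyl": "DRUG",
    "cocaine": "DRUG",
    "crack cocaine": "DRUG",
    "injection": "ROUTE",
}


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def distant_labels(tokens, ontology, max_len=3):
    """Assign BIO tags by greedy longest-match lookup in the ontology.

    Tokens that match no ontology term are tagged "O". Matching is
    case-insensitive and considers spans of up to `max_len` tokens.
    """
    lexicon = {tuple(term.lower().split()): label
               for term, label in ontology.items()}
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so that "crack cocaine"
        # is preferred over the shorter entry "cocaine".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(tok.lower() for tok in tokens[i:i + n])
            label = lexicon.get(span)
            if label is not None:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


if __name__ == "__main__":
    sentence = "Crack cocaine and fentanyl use increased."
    tokens = tokenize(sentence)
    for token, tag in zip(tokens, distant_labels(tokens, ONTOLOGY)):
        print(f"{token}\t{tag}")
```

Labels produced this way are noisy: exact dictionary matching misses spelling variants and slang, and it cannot disambiguate terms by context, which is why such weak labels are typically used as training signal rather than as final annotations.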