Slater Karin, Bradlow William, Motti Dino Fa, Hoehndorf Robert, Ball Simon, Gkoutos Georgios V
College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK.
Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK.
Comput Biol Med. 2021 Mar;130:104216. doi: 10.1016/j.compbiomed.2021.104216. Epub 2021 Jan 16.
Negation detection is an important task in biomedical text mining. Particularly in clinical settings, it is critical to determine whether findings mentioned in text are present or absent. Rule-based negation detection algorithms are a common approach to the task, and more recent work has produced rule-based systems that utilise the rich grammatical information afforded by typed dependency graphs. However, interacting with these complex representations inevitably necessitates complex rules, which are time-consuming to develop and do not generalise well. We hypothesise that a heuristic approach to determining negation via dependency graphs could offer a powerful alternative. We describe and implement an algorithm for negation detection based on grammatical distance from a negatory construct in a typed dependency graph. To evaluate the algorithm, we develop two testing corpora composed of sentences of clinical text extracted from the MIMIC-III database and of documents related to hypertrophic cardiomyopathy (HCM) patients routinely collected at University Hospitals Birmingham NHS Foundation Trust. Gold-standard validation datasets were built by a combination of human annotation and examination of algorithm error. Finally, we compare the performance of our approach with four other rule-based algorithms on both gold-standard corpora. The presented algorithm exhibits the best performance by F-measure over the MIMIC-III dataset, and performance similar to that of the syntactic negation detection systems over the HCM dataset. It is also the fastest of the dependency-based negation systems explored in this study. Our results show that while a single heuristic approach to dependency-based negation detection does not handle certain advanced cases, it nevertheless forms a powerful and stable method, requiring minimal training and adaptation between datasets. As such, it could present a drop-in replacement or augmentation for many-rule negation approaches in clinical text-mining pipelines, particularly for cases where adaptation and rule development are not required or possible.
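The idea of deciding negation by grammatical distance from a negatory construct can be illustrated with a minimal sketch. This is not the authors' implementation: the cue list, the distance threshold of 1, the treatment of the graph as undirected, and the hand-written dependency edges are all assumptions made for illustration. In practice the edges would come from a dependency parser.

```python
from collections import deque

# Hypothetical cue list and threshold; neither value is taken from the paper.
NEGATION_CUES = {"no", "not", "without", "denies"}
MAX_DISTANCE = 1


def build_graph(edges):
    """Build an undirected adjacency map from (head, dependent) index pairs."""
    graph = {}
    for head, dep in edges:
        graph.setdefault(head, set()).add(dep)
        graph.setdefault(dep, set()).add(head)
    return graph


def distances_from(graph, sources):
    """Breadth-first search: fewest edges from any source to each reachable node."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def is_negated(tokens, edges, target, max_distance=MAX_DISTANCE):
    """Return True if the target token lies within max_distance edges of a cue."""
    cues = [i for i, tok in enumerate(tokens) if tok.lower() in NEGATION_CUES]
    if not cues:
        return False
    dist = distances_from(build_graph(edges), cues)
    return target in dist and dist[target] <= max_distance


# Hand-written parse of an example sentence (assumed, not parser output).
TOKENS = ["Patient", "denies", "chest", "pain", "but", "reports", "nausea"]
EDGES = [
    (1, 0),  # denies -> Patient (subject)
    (1, 3),  # denies -> pain (object)
    (3, 2),  # pain -> chest (compound)
    (1, 5),  # denies -> reports (conjunct)
    (5, 4),  # reports -> but (coordinator)
    (5, 6),  # reports -> nausea (object)
]

if __name__ == "__main__":
    print(is_negated(TOKENS, EDGES, 3))  # "pain"
    print(is_negated(TOKENS, EDGES, 6))  # "nausea"
```

In this sketch "pain" sits one edge from the cue "denies" and is flagged as negated, while "nausea" sits two edges away and is not. The result depends heavily on the chosen threshold, which is why it is exposed as a parameter here.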