Automated Detection of Substance-Use Status and Related Information from Clinical Text.

Affiliations

Department of Computer Science, College of Computer Science and Information Technology, King Faisal University, Al-Ahsa 31982, Saudi Arabia.

Department of Computer Science, Durham University, Upper Mountjoy Campus, Stockton Road, Durham DH1 3LE, UK.

Publication information

Sensors (Basel). 2022 Dec 8;22(24):9609. doi: 10.3390/s22249609.

Abstract

This study aims to develop and evaluate an automated system for extracting information related to patient substance use (smoking, alcohol, and drugs) from unstructured clinical text (medical discharge records). The authors propose a four-stage system for extracting substance-use status and related attributes (type, frequency, amount, quit-time, and period). The first stage uses keyword search to detect sentences related to substance use and to exclude unrelated records. The second stage applies an extension of the NegEx negation detection algorithm, developed for this work, to detect negated records. The third stage identifies the temporal status of substance use by applying windowing and chunking methods. Finally, the fourth stage uses regular expressions, syntactic patterns, and keyword search to extract the substance-use attributes. The system achieves F1-scores of up to 0.99 for identifying substance-use-related records, 0.98 for detecting negation status, and 0.94 for identifying temporal status. For attribute extraction, it achieves F1-scores of up to 0.98, 0.98, 1.00, 0.92, and 0.98 for amount, frequency, type, quit-time, and period, respectively. The system combines Natural Language Processing (NLP) and rule-based techniques and can detect substance-use status and attributes in both sentence-level and document-level data. On an unseen dataset, it outperforms the state-of-the-art substance-use identification system it was compared against, demonstrating its generalisability.
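The four stages described above can be illustrated with a toy rule-based pipeline. This is a minimal sketch under stated assumptions, not the authors' implementation: the keyword lists, negation triggers, termination terms, window size, temporal labels ("current", "past", "non-user"), and attribute regexes (`SUBSTANCE_PATTERNS`, `NEGATION_TRIGGERS`, `TERMINATORS`, `PAST_CUES`, `ATTRIBUTE_PATTERNS`) are all hypothetical. Stage 2 is a simplified NegEx-style pre-negation check rather than the paper's extended algorithm, stage 3 uses windowing only (no chunking), and stage 4 uses regular expressions only (no syntactic patterns).

```python
import re
from dataclasses import dataclass, field

# All lexicons, windows and labels below are HYPOTHETICAL illustrations,
# not the keyword lists or rules used in the paper.
SUBSTANCE_PATTERNS = {
    "smoking": r"\b(smok\w*|cigarettes?|tobacco|cigars?)\b",
    "alcohol": r"\b(alcohol|etoh|drinks?|drinking|beers?|wine)\b",
    "drugs": r"\b(drugs?|cocaine|heroin|marijuana|ivdu)\b",
}
NEGATION_TRIGGERS = r"\b(denies|denied|no|not|never|negative for)\b"
TERMINATORS = {"but", "however", "although"}  # end a negation scope, NegEx-style
PAST_CUES = r"\b(quit|former|formerly|stopped|previously|ago)\b"
WINDOW = 5  # tokens inspected on each side of the keyword
ATTRIBUTE_PATTERNS = {  # group(1) holds the extracted value
    "amount": r"\b(\d+(?:\.\d+)?\s*(?:packs?|cigarettes?|drinks?|beers?))\b",
    "frequency": r"\b(daily|per day|a day|weekly|per week|occasionally)\b",
    "quit_time": r"\bquit(?:\s+\w+)?\s+(\d+\s+(?:years?|months?)\s+ago|in\s+\d{4})\b",
    "period": r"\bfor\s+(\d+\s+(?:years?|months?))\b",
}


@dataclass
class SubstanceRecord:
    sentence: str
    type: str
    negated: bool
    temporal: str
    attributes: dict = field(default_factory=dict)


def detect_type(sentence):
    """Stage 1: keyword search; returns (type, match) or None if unrelated."""
    for sub_type, pattern in SUBSTANCE_PATTERNS.items():
        m = re.search(pattern, sentence, re.IGNORECASE)
        if m:
            return sub_type, m  # only the first substance found is reported
    return None


def is_negated(sentence, keyword_start):
    """Stage 2: simplified NegEx-style check for a trigger before the keyword."""
    tokens = re.findall(r"\w+", sentence[:keyword_start].lower())
    for i in range(len(tokens) - 1, -1, -1):
        if tokens[i] in TERMINATORS:  # scope stops at the nearest terminator
            tokens = tokens[i + 1:]
            break
    return re.search(NEGATION_TRIGGERS, " ".join(tokens[-WINDOW:])) is not None


def temporal_status(sentence, span, negated):
    """Stage 3: inspect a token window around the keyword (no chunking here)."""
    if negated:
        return "non-user"
    before = re.findall(r"\w+", sentence[:span[0]].lower())[-WINDOW:]
    after = re.findall(r"\w+", sentence[span[1]:].lower())[:WINDOW]
    return "past" if re.search(PAST_CUES, " ".join(before + after)) else "current"


def extract_attributes(sentence):
    """Stage 4: regex extraction of amount, frequency, quit-time and period."""
    found = {}
    for name, pattern in ATTRIBUTE_PATTERNS.items():
        m = re.search(pattern, sentence, re.IGNORECASE)
        if m:
            found[name] = m.group(1)
    return found


def process(text):
    """Run the four stages over each sentence of a note."""
    records = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        hit = detect_type(sentence)
        if hit is None:
            continue  # Stage 1 also excludes unrelated sentences
        sub_type, m = hit
        negated = is_negated(sentence, m.start())
        records.append(SubstanceRecord(
            sentence, sub_type, negated,
            temporal_status(sentence, m.span(), negated),
            {} if negated else extract_attributes(sentence),
        ))
    return records


if __name__ == "__main__":
    note = ("Patient smokes 1 pack per day for 20 years. He quit drinking "
            "5 years ago. Denies illicit drug use. Blood pressure is stable.")
    for r in process(note):
        print(r.type, r.negated, r.temporal, r.attributes)
    # smoking False current {'amount': '1 pack', 'frequency': 'per day', 'period': '20 years'}
    # alcohol False past {'quit_time': '5 years ago'}
    # drugs True non-user {}
```

In this sketch, a sentence such as "No fever but smokes daily." is not marked as negated, because the termination term "but" closes the scope of "No". Post-negated phrasing (a trigger after the keyword) and sentences that mention several substances are not handled.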
