Automatic segmentation of clinical texts.

Author information

Apostolova Emilia, Channin David S, Demner-Fushman Dina, Furst Jacob, Lytinen Steven, Raicu Daniela

Affiliation

College of Computing and Digital Media, DePaul University, Chicago, IL 60604, USA.

Publication information

Annu Int Conf IEEE Eng Med Biol Soc. 2009;2009:5905-8. doi: 10.1109/IEMBS.2009.5334831.

DOI:10.1109/IEMBS.2009.5334831

PMID:19965054

Abstract

Clinical narratives, such as radiology and pathology reports, are commonly available in electronic form. However, they are also commonly entered and stored as free text. Knowledge of the structure of clinical narratives is necessary for enhancing the productivity of healthcare departments and facilitating research. This study attempts to automatically segment medical reports into semantic sections. Our goal is to develop a robust and scalable medical report segmentation system requiring minimum user input for efficient retrieval and extraction of information from free-text clinical narratives. Hand-crafted rules were used to automatically identify a high-confidence training set. This automatically created training dataset was later used to develop metrics and an algorithm that determines the semantic structure of the medical reports. A word-vector cosine similarity metric combined with several heuristics was used to classify each report sentence into one of several pre-defined semantic sections. This baseline algorithm achieved 79% accuracy. A Support Vector Machine (SVM) classifier trained on additional formatting and contextual features was able to achieve 90% accuracy. Plans for future work include developing a configurable system that could accommodate various medical report formatting and content standards.
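The baseline step described in the abstract, assigning each sentence to the section whose word vector it most resembles under cosine similarity, can be illustrated with a minimal sketch. This is not the authors' implementation: the section labels, the toy training sentences, and the function names (`tokenize`, `cosine`, `build_section_vectors`, `classify`) are all hypothetical, and the additional heuristics and the SVM stage mentioned in the abstract are not reproduced here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_section_vectors(training):
    """Sum the word counts of all training sentences for each section label."""
    vectors = {}
    for label, sentence in training:
        vectors.setdefault(label, Counter()).update(tokenize(sentence))
    return vectors


def classify(sentence, section_vectors):
    """Return the section label whose word vector is most similar to the sentence."""
    vec = Counter(tokenize(sentence))
    return max(section_vectors, key=lambda label: cosine(vec, section_vectors[label]))


# Hypothetical toy training set; the paper instead derives a high-confidence
# training set automatically using hand-crafted rules.
TRAINING = [
    ("history", "History of chest pain and shortness of breath."),
    ("findings", "The lungs are clear with no focal consolidation."),
    ("findings", "The heart size is normal."),
    ("impression", "No acute cardiopulmonary disease."),
]

SECTION_VECTORS = build_section_vectors(TRAINING)

if __name__ == "__main__":
    for s in ["The lungs are clear.", "Acute disease is not seen."]:
        print(s, "->", classify(s, SECTION_VECTORS))
```

Raw word counts are used here only to keep the sketch dependency-free; a weighting scheme such as TF-IDF would be a natural substitute, though the abstract does not say which weighting the authors used.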
