Huang Yang, Lowe Henry J, Hersh William R
Stanford Medical Informatics, The Office of Information Resources and Technology, Stanford University School of Medicine, California 94305, USA.
J Am Med Inform Assoc. 2003 Nov-Dec;10(6):580-7. doi: 10.1197/jamia.M1369. Epub 2003 Aug 4.
Despite the advantages of structured data entry, much of the patient record is still stored as unstructured or semistructured narrative text, and representing the content of clinical documents remains problematic. The authors' prior work using an automated UMLS document indexing system has been encouraging but has been limited by the generally low indexing precision of such systems. In an effort to improve precision, the authors have developed a context-sensitive document indexing model that calculates the optimal subset of UMLS source vocabularies to use when indexing each document section. This pilot study was performed to evaluate the utility of this indexing approach on a set of clinical radiology reports.
A set of clinical radiology reports that had been indexed manually using UMLS concept descriptors was indexed automatically by the SAPHIRE indexing engine. Using the data generated by this process, the authors developed a system that simulated indexing of the same document set, at the document section level, using many permutations of a subset of the UMLS constituent vocabularies.
The precision and recall scores generated by simulated indexing for each permutation of two or three UMLS constituent vocabularies were determined.
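The measurement described above can be illustrated with a minimal sketch. It assumes that simulated indexing amounts to filtering the automatically assigned concepts by their source vocabulary and scoring what remains against the manual index; the abstract does not confirm this detail. The function names, the vocabulary labels (VOCAB_A and so on), and the toy data are hypothetical. Because the order of vocabularies does not affect such a filter, the sketch enumerates unordered subsets.

```python
from itertools import combinations


def precision_recall(auto, manual):
    """Return (precision, recall) of automatically assigned concepts
    against the manually assigned reference set."""
    true_pos = len(auto & manual)
    precision = true_pos / len(auto) if auto else 0.0
    recall = true_pos / len(manual) if manual else 0.0
    return precision, recall


def simulate(candidates, manual, vocabularies, sizes=(2, 3)):
    """Score every vocabulary subset of the given sizes.

    candidates:   (concept_id, source_vocabulary) pairs produced by the
                  automatic indexer for one document section
    manual:       set of concept ids assigned by manual indexing
    vocabularies: the source vocabularies to choose subsets from
    """
    results = {}
    for k in sizes:
        for subset in combinations(sorted(vocabularies), k):
            # Keep only concepts whose source vocabulary is in the subset.
            kept = {cid for cid, vocab in candidates if vocab in subset}
            results[subset] = precision_recall(kept, manual)
    return results


# Toy data, invented for illustration only (not from the study).
candidates = [
    ("C1", "VOCAB_A"),
    ("C2", "VOCAB_B"),
    ("C3", "VOCAB_C"),
    ("C4", "VOCAB_A"),
]
manual = {"C1", "C2"}

scores = simulate(candidates, manual, {"VOCAB_A", "VOCAB_B", "VOCAB_C"})
# Pick the subset with the highest precision.
best = max(scores, key=lambda subset: scores[subset][0])
```

In this toy case `best` is the subset with the highest precision; a fuller treatment would also compare recall, since the stated goal is to raise precision without a significant loss of recall.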
While there was considerable variation in precision and recall values across the different subtypes of radiology reports, the overall effect of this indexing strategy, using the best combination of two or three UMLS constituent vocabularies, was an improvement in precision without a significant impact on recall.
In this pilot study, a contextual indexing strategy improved overall precision in a set of clinical radiology reports.