Elkin P L, Ruggieri A, Bergstrom L, Bauer B A, Lee M, Ogren P V, Chute C G
Proc AMIA Symp. 2000:220-4.
Medical information is increasingly presented on the web: medical journals, guidelines, and textbooks are all accessible in web-based formats. It would be desirable to link these reference sources to the electronic medical record to provide education, to facilitate guideline implementation and use, and to support decision making. For these rich information sources to be accessed via the medical record, they will need to be indexed by a single, comparable underlying reference terminology.
We took a random sample of 100 web pages from the 6,000 web pages on the Mayo Clinic's Health Oasis web site. The web pages were divided into four datasets, each containing 25 pages. These were manually reviewed by four clinicians to identify all of the health concepts present (R1DA, R2DB, R3DC, R4DD). The web pages were simultaneously indexed using the SNOMED-RT beta release. The indexing engine has been previously described and validated. A different clinician then reviewed the indexed web pages in each dataset to determine the accuracy of the automated mappings as compared with the human-identified concepts (R4DA, R3DB, R2DC, R1DD).
This review found 13,220 health concepts. Of these, 10,383 concepts were identified by the initial human review (78.5% +/- 3.6%). The automated process correctly identified 10,083 concepts (76.3% +/- 4.0%) from within this corpus. The computer identified 2,420 concepts that were not identified by the clinicians' review but were, upon further consideration, important to include as health concepts. There was on average a 17.1% +/- 3.5% variability in the human reviewers' ability to identify the important health concepts within web page content. Concept-based indexing provided a positive predictive value (PPV) for finding a health concept of 79.3%, as compared with keyword indexing, which has a PPV of only 33.7% (p < 0.001).
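The headline percentages above follow directly from the reported counts. The short Python sketch below recomputes them as a sanity check. The function names `share` and `ppv` are illustrative and not from the paper, and `ppv` assumes the standard definition (true positives divided by all items flagged); the abstract does not report the counts behind the PPV figures, so that function is shown without data.

```python
def share(part: int, total: int) -> float:
    """Return part/total as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


def ppv(true_positives: int, false_positives: int) -> float:
    """Positive predictive value as a percentage (standard definition,
    assumed here; the abstract does not give the underlying counts)."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no items were flagged")
    return 100.0 * true_positives / flagged


# Counts reported in the abstract.
TOTAL_CONCEPTS = 13_220
HUMAN_IDENTIFIED = 10_383
AUTOMATED_CORRECT = 10_083

human_pct = round(share(HUMAN_IDENTIFIED, TOTAL_CONCEPTS), 1)
auto_pct = round(share(AUTOMATED_CORRECT, TOTAL_CONCEPTS), 1)

print(f"Initial human review: {human_pct}%")  # 78.5%
print(f"Automated indexing:   {auto_pct}%")   # 76.3%
```

Both values match the point estimates reported in the text; the +/- intervals cannot be reproduced from the abstract alone, since per-dataset counts are not given.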
SNOMED-RT is a reasonable ontology for web page indexing. Concept-based indexing provides significantly greater accuracy in identifying health concepts than keyword indexing.