Arts D G T, Cornet R, de Jonge E, de Keizer N F
Academic Medical Center, Department of Medical Informatics. P.O. Box 22700, 1100 DE Amsterdam, The Netherlands.
Methods Inf Med. 2005;44(5):616-25.
The usability of terminological systems (TSs) strongly depends on the coverage and correctness of their content. The objective of this study was to provide a literature overview of aspects related to the content of TSs and of methods for evaluating that content, and to investigate the extent to which these methods overlap or complement each other.
We reviewed the literature and composed definitions for aspects of the evaluation of TS content. Of the methods described in the literature, three were selected: 1) concept matching, in which two samples of concepts, representing a) documentation of reasons for admission in daily care practice and b) aggregation of patient groups for research, are looked up in the TS to assess its coverage; 2) formal algorithmic evaluation, in which reasoning over the formally represented content is used to detect inconsistencies; and 3) expert review, in which a random sample of concepts is checked for incorrect and incomplete terms and relations. These evaluation methods were applied in a case study of the locally developed TS DICE (Diagnoses for Intensive Care Evaluation).
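The coverage measurement behind concept matching can be illustrated with a small sketch. A toy term-to-concept index stands in for the terminological system; DICE itself is not publicly modeled here, so all concept names, IDs, and sample terms below are hypothetical.

```python
# Minimal sketch of concept matching: look up each sampled term in the TS
# and report the fraction of exact matches (the "perfect match" rate).
# All data here is illustrative, not taken from DICE.

def match_rate(sample, ts_index):
    """Fraction of sampled terms that exactly match a TS concept."""
    matched = sum(1 for term in sample if term.lower() in ts_index)
    return matched / len(sample)

# Hypothetical TS content: normalized terms mapped to concept IDs.
ts_index = {
    "septic shock": "C001",
    "acute renal failure": "C002",
    "pneumonia": "C003",
}

# Two illustrative samples, reflecting the two use cases in the study:
# admission documentation vs. aggregation of patient groups for research.
admissions = ["septic shock", "pneumonia", "head trauma", "pneumonia"]
research = ["acute renal failure", "copd exacerbation"]

print(round(match_rate(admissions, ts_index), 2))  # 0.75
print(round(match_rate(research, ts_index), 2))    # 0.5
```

In practice the study graded matches beyond a binary hit (e.g. partial matches); this sketch only computes the exact-match rate for two different samples, showing why sample composition changes the measured coverage.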
None of the applied methods covered all aspects of the content of a TS. The results of concept matching differed between the two use cases (63% vs. 52% perfect matches). Expert review revealed many more errors and omissions than formal algorithmic evaluation did.
To evaluate the content of a TS, a combination of evaluation methods is preferable. Different representative samples, reflecting the different uses of TSs, lead to different concept-matching results. Expert review appears to be very valuable but time consuming. Formal algorithmic evaluation has the potential to decrease the workload of human reviewers but detects only logical inconsistencies. Further research is required to exploit the potential of formal algorithmic evaluation.
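The kind of logical inconsistency that formal algorithmic evaluation can detect may be sketched as follows. Real systems use a description-logic classifier over the formally represented content; this toy version, with invented concept names and a hand-written disjointness axiom, merely flags concepts whose definitions place them under two classes declared disjoint.

```python
# Minimal sketch of formal algorithmic evaluation: detect concepts whose
# stated parents violate a disjointness axiom. Concept names and axioms
# are illustrative stand-ins for a DL-based representation.

# Axiom: no concept may be both an Infection and a Trauma.
disjoint_pairs = {frozenset({"Infection", "Trauma"})}

# Hypothetical concept definitions: concept -> set of asserted parents.
definitions = {
    "BacterialPneumonia": {"Infection"},
    "CrushInjury": {"Trauma"},
    "MiscodedDiagnosis": {"Infection", "Trauma"},  # logically inconsistent
}

def inconsistent_concepts(definitions, disjoint_pairs):
    """Return concepts asserted under two mutually disjoint classes."""
    flagged = []
    for concept, parents in definitions.items():
        for pair in disjoint_pairs:
            if pair <= parents:  # both disjoint classes are parents
                flagged.append(concept)
    return flagged

print(inconsistent_concepts(definitions, disjoint_pairs))
# ['MiscodedDiagnosis']
```

Note how narrow the check is: a missing term or a clinically wrong but logically consistent relation, the kind of defect expert review caught in the study, would pass this test untouched.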