Bill Robert W, Liu Ying, McInnes Bridget T, Melton Genevieve B, Pedersen Ted, Pakhomov Serguei
Institute for Health Informatics, University of Minnesota, Twin Cities, MN, USA.
AMIA Annu Symp Proc. 2012;2012:43-50. Epub 2012 Nov 3.
Automated concept similarity and relatedness measures could improve the automatic detection of clinical text describing a condition that indicates an adverse drug reaction. Detecting such conditions is also one purpose of the Standardized MedDRA Queries (SMQs) of the Medical Dictionary for Regulatory Activities (MedDRA). Because an expert panel evaluates each SMQ for its ability to detect a condition of interest, SMQs qualify as a reference standard for evaluating automated approaches. We compare similarity and relatedness measures by how well they distinguish intra-category from inter-category concept pairs drawn from SMQ data, and we plot ROC curves of each measure's sensitivity and specificity. Among single measures, an information content measure, the Resnik method, achieved the highest area under the curve; using two measures together as predictors, Resnik and Lin, obtained the highest score overall. Using SMQ data proved a productive way to evaluate automated semantic relatedness and similarity scores.
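The evaluation described in the abstract can be illustrated with a minimal sketch. It scores concept pairs with the Resnik and Lin information content measures, labels each pair as intra-category or inter-category, and computes the area under the ROC curve as the probability that an intra-category pair outscores an inter-category pair. Everything concrete here is an assumption made for illustration: the toy taxonomy `PARENT`, the concept probabilities `PROB`, the category labels `CATEGORY`, and the function names are hypothetical and are not taken from the paper, MedDRA, or the SMQ data. The sketch also does not cover the combination of two measures as predictors.

```python
import math
from itertools import combinations

# Hypothetical toy taxonomy (child -> parent); not real MedDRA content.
PARENT = {
    "root": None,
    "cardiac": "root",
    "hepatic": "root",
    "arrhythmia": "cardiac",
    "tachycardia": "cardiac",
    "jaundice": "hepatic",
    "hepatitis": "hepatic",
}

# Made-up concept probabilities, used only to derive information content.
PROB = {
    "root": 1.0, "cardiac": 0.5, "hepatic": 0.5,
    "arrhythmia": 0.2, "tachycardia": 0.3,
    "jaundice": 0.25, "hepatitis": 0.25,
}

# Hypothetical category labels standing in for SMQ membership.
CATEGORY = {"arrhythmia": "A", "tachycardia": "A",
            "jaundice": "B", "hepatitis": "B"}


def ic(concept):
    """Information content: negative log probability of the concept."""
    return math.log(1.0 / PROB[concept])


def ancestors(concept):
    """Path from the concept up to the root, including the concept itself."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def lcs(a, b):
    """Least common subsumer: the most specific shared ancestor in the tree."""
    seen = set(ancestors(b))
    return next(c for c in ancestors(a) if c in seen)


def resnik(a, b):
    """Resnik similarity: information content of the least common subsumer."""
    return ic(lcs(a, b))


def lin(a, b):
    """Lin similarity: Resnik score normalized by the concepts' own IC."""
    denom = ic(a) + ic(b)
    return 2.0 * resnik(a, b) / denom if denom else 0.0


def auc(pos, neg):
    """Rank-based AUC: P(pos score > neg score), counting ties as one half."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(measure):
    """AUC of a measure at separating intra- from inter-category pairs."""
    pos, neg = [], []
    for a, b in combinations(sorted(CATEGORY), 2):
        same = CATEGORY[a] == CATEGORY[b]
        (pos if same else neg).append(measure(a, b))
    return auc(pos, neg)
```

Calling `evaluate(resnik)` and `evaluate(lin)` gives one AUC value per measure, which is the quantity the abstract uses to rank methods. The toy data is deliberately clean, so it says nothing about how the measures compare on real SMQ pairs.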