Luo Zhihui, Johnson Stephen B, Weng Chunhua
Department of Biomedical Informatics, Columbia University, New York, NY 10032.
AMIA Annu Symp Proc. 2010 Nov 13;2010:487-91.
This paper presents a novel approach to learning semantic classes of clinical research eligibility criteria. The approach uses UMLS Semantic Types to represent semantic features and hierarchical clustering to group similar eligibility criteria. We established a gold standard with two independent raters and used it to evaluate the coverage and accuracy of the induced semantic classes. On 2,718 randomly selected eligibility criteria sentences, the inter-rater classification agreement was 85.73%. In a 10-fold cross-validation test, the average precision, recall, and F-score of a decision-tree classifier were 87.8%, 88.0%, and 87.7%, respectively. The induced classes aligned well with 16 of the 17 eligibility criteria classes defined by the BRIDGE model. We discuss the potential of this method and our future work.
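The clustering step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the distance measure, linkage rule, or feature weighting, so the Jaccard distance, average linkage, and the 0.6 cut-off below are assumptions chosen for illustration. The example criteria and their semantic-type assignments are hand-made and do not come from the study data or from a UMLS tagging tool.

```python
from itertools import combinations

# Hypothetical criteria, each tagged by hand with UMLS semantic type names.
criteria = [
    ("Diagnosis of type 2 diabetes", {"Disease or Syndrome"}),
    ("History of congestive heart failure", {"Disease or Syndrome", "Temporal Concept"}),
    ("Currently taking warfarin", {"Pharmacologic Substance"}),
    ("Insulin therapy required", {"Pharmacologic Substance", "Therapeutic or Preventive Procedure"}),
    ("Age 18 years or older", {"Age Group", "Quantitative Concept"}),
]
features = [types for _, types in criteria]


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of semantic types."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(c1, c2, feats):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(jaccard_distance(feats[i], feats[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative_clusters(feats, threshold):
    """Merge the closest pair of clusters until no pair is within threshold."""
    clusters = [[i] for i in range(len(feats))]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average-linkage distance.
        (a, b), dist = min(
            (
                ((x, y), average_linkage(clusters[x], clusters[y], feats))
                for x, y in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if dist > threshold:
            break
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    # Sort members and clusters so the result is deterministic.
    return sorted(sorted(c) for c in clusters)


clusters = agglomerative_clusters(features, threshold=0.6)
for members in clusters:
    print([criteria[i][0] for i in members])
```

With these toy inputs, the two disease-related criteria form one group, the two medication-related criteria form another, and the age criterion stays on its own. In practice, a library routine such as SciPy's hierarchical clustering would replace the hand-written merge loop, and the induced groups would then need manual review before being used as classification labels.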