Happe André, Cuggia Marc, Turlin Bruno, Le Beux Pierre
Intermede, La Basse Revachais 35580 Guignen - France.
Stud Health Technol Inform. 2008;136:815-20.
An algorithm for the automatic coding of pathology reports with a multi-axial coding system (ADICAP) is described and evaluated. It extracts "significant words" or expressions from a corpus and records the statistical relationships between them and the modalities of the different axes. Several weighting functions are evaluated. With the best settings, the correct modality appeared among the top five candidates in more than two cases out of three, except on the "organ" axis. Several possible improvements are discussed, especially for the poor results on the "organ" axis. A two-stage assembly algorithm that would complete this first step is proposed as future work.
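The general idea in the abstract (record word/modality co-occurrences on a training corpus, weight them, and propose the top five modalities for one axis) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the weighting functions, so `conditional_weight` and `specificity_weight` are hypothetical stand-ins, the tokenizer is a naive substitute for the "significant words" extraction, and the toy reports and modality labels are invented rather than real ADICAP codes.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Naive letter-only tokenizer; a stand-in for 'significant word' extraction."""
    return set(re.findall(r"[^\W\d_]+", text.lower()))


def conditional_weight(n_word_modality, n_word, n_modalities_for_word):
    """Hypothetical weighting: share of the word's reports coded with this modality."""
    return n_word_modality / n_word


def specificity_weight(n_word_modality, n_word, n_modalities_for_word):
    """Hypothetical variant: penalize words spread across many modalities."""
    return (n_word_modality / n_word) / n_modalities_for_word


class AxisCoder:
    """Word/modality association table for a single axis (assumed design)."""

    def __init__(self):
        self.cooc = defaultdict(Counter)  # word -> Counter of modalities
        self.word_count = Counter()       # word -> number of reports containing it

    def train(self, coded_reports):
        """coded_reports: iterable of (report_text, modality) pairs."""
        for text, modality in coded_reports:
            for word in tokenize(text):
                self.cooc[word][modality] += 1
                self.word_count[word] += 1

    def rank(self, text, k=5, weight=conditional_weight):
        """Return up to k candidate modalities, best first."""
        scores = Counter()
        for word in tokenize(text):
            modalities = self.cooc.get(word)
            if not modalities:
                continue  # word never seen in training
            for modality, n in modalities.items():
                scores[modality] += weight(n, self.word_count[word], len(modalities))
        return [modality for modality, _ in scores.most_common(k)]


# Invented toy corpus for a "lesion"-like axis (labels are not ADICAP codes).
training = [
    ("invasive ductal carcinoma of the breast", "carcinoma"),
    ("squamous cell carcinoma of the skin", "carcinoma"),
    ("tubular adenoma of the colon", "adenoma"),
    ("villous adenoma of the colon with dysplasia", "adenoma"),
    ("chronic inflammation of the gastric mucosa", "inflammation"),
]

coder = AxisCoder()
coder.train(training)
top = coder.rank("adenoma of the colon", k=5)
print(top)
```

Evaluating such a sketch the way the abstract describes would mean checking, for each held-out report, whether the reference modality appears in the returned list of at most five candidates, and repeating this for each weighting function and each axis.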