Bern University of Applied Sciences, Medical Informatics, Biel, Switzerland.
ID Information und Dokumentation im Gesundheitswesen GmbH & Co. KGaA, Berlin, Germany.
Artif Intell Med. 2019 Jan;93:29-42. doi: 10.1016/j.artmed.2018.10.001. Epub 2018 Oct 29.
Classification systems such as ICD-10 for diagnoses or the Swiss Operation Classification System (CHOP) for procedures in clinical treatment are essential for clinical management and information exchange. Traditionally, classification codes are assigned manually or by systems that rely on concept-based or rule-based classification methods. Such methods quickly reach their limits because handcrafted rules and the vocabulary of the underlying terminological systems have restricted coverage. Conventional machine learning approaches normally depend on selected features within a human-annotated training set. However, obtaining a well-labeled data set is laborious, and its generation is easily affected by accumulated errors caused by human factors. To overcome this, we present our processing pipeline for query matching, realized through neural networks, within the task of medical procedure classification. The pipeline is built upon convolutional neural networks (CNN) and an autoencoder with logistic regression. On the task of determining relevance between query and category text, the autoencoder-based method achieved a micro F1 score of 70.29%, while the convolution-based method reached a micro F1 score of 60.86% with high efficiency. The two algorithms are compared in experiments with different configurations and against baselines (SVM, logistic regression) with respect to their suitability for the task of automatic encoding. Advantages and limitations are discussed.
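The micro F1 scores reported above pool true positives, false positives, and false negatives over all counted labels before computing a single F1 value. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code; the function name `micro_f1`, the toy labels, and the choice to count only the "relevant" class (label 1) are assumptions made for illustration.

```python
from typing import Sequence


def micro_f1(y_true: Sequence[int], y_pred: Sequence[int],
             labels: Sequence[int]) -> float:
    """Micro-averaged F1: pool TP/FP/FN over `labels`, then compute F1 once."""
    tp = fp = fn = 0
    for label in labels:
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1  # predicted this label and it was correct
            elif pred == label and truth != label:
                fp += 1  # predicted this label but it was wrong
            elif pred != label and truth == label:
                fn += 1  # missed an instance of this label
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical query/category relevance judgments (1 = relevant, 0 = not).
gold = [1, 0, 1, 1, 0]
predicted = [1, 1, 0, 1, 0]

# Counting only the "relevant" class: TP = 2, FP = 1, FN = 1.
score = micro_f1(gold, predicted, labels=[1])
print(f"micro F1 (relevant class only): {score:.4f}")
```

Note that when every class of a single-label problem is included in `labels`, micro F1 equals plain accuracy, so which labels are counted matters when comparing reported scores.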