Biomedical Information Processing Laboratory, École de Technologie Supérieure, University of Quebec, Montreal, QC H3C 1K3, Canada.
Research Center at CHU Sainte-Justine, University of Montreal, Montreal, QC H3T 1J4, Canada.
IEEE J Transl Eng Health Med. 2023 Feb 2;11:469-478. doi: 10.1109/JTEHM.2023.3241635. eCollection 2023.
For clinical text classification on a small dataset, recent studies have confirmed that a well-tuned multilayer perceptron outperforms other generative classifiers, including deep learning ones. To increase the performance of the neural network classifier, feature selection can be applied effectively to the learned representation. However, most feature selection methods only estimate the degree of linear dependency between variables and select the best features based on univariate statistical tests. Furthermore, they ignore the sparsity of the feature space involved in the learned representation.
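To make the limitation concrete, the following is a minimal pure-Python sketch (not the paper's method) of the kind of univariate, linear feature selection the passage critiques: each feature is scored independently by its absolute Pearson correlation with the label, so interactions between features and the sparsity of the space are never considered.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def select_top_k(features, labels, k):
    """Score each feature column independently; keep the k best indices.

    This is a univariate test: every column is judged in isolation,
    which is exactly the limitation discussed above."""
    n_cols = len(features[0])
    scores = [abs(pearson([row[j] for row in features], labels))
              for j in range(n_cols)]
    return sorted(range(n_cols), key=lambda j: scores[j], reverse=True)[:k]

# Toy data: column 1 matches the label exactly, column 0 correlates,
# column 2 carries no linear signal.
X = [[1, 0, 5], [2, 0, 1], [3, 1, 4], [4, 1, 2]]
y = [0, 0, 1, 1]
print(select_top_k(X, y, 2))  # prints [1, 0]
```

A feature that is only informative jointly with another feature would be discarded by such a per-column score, which motivates the representation-level compression explored in this work.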
Our aim is, therefore, to assess an alternative approach that tackles sparsity by compressing the clinical representation feature space, and that can also handle limited French clinical notes effectively.
This study proposed an autoencoder learning algorithm to exploit sparsity reduction in the clinical note representation. The motivation was to determine how to compress sparse, high-dimensional data by reducing the dimensionality of the clinical note representation feature space. The classification performance of the classifiers was then evaluated in the trained, compressed feature space.
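As an illustration of the idea (a pure-Python toy, not the paper's implementation), the sketch below trains a small linear autoencoder by stochastic gradient descent: a sparse 6-dimensional input is encoded into a 2-dimensional code and decoded back, so a downstream classifier could operate on the compressed representation. All names, sizes, and hyperparameters here are illustrative assumptions.

```python
import random

def train_linear_autoencoder(X, hidden, lr=0.01, epochs=300, seed=0):
    """Toy linear autoencoder: encoder z = W x, decoder x_hat = V z,
    trained with per-sample gradient descent on squared reconstruction
    error. Stands in for the sparsity-reducing autoencoder described
    in the abstract."""
    rng = random.Random(seed)
    d = len(X[0])
    W = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(hidden)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(hidden)] for _ in range(d)]
    for _ in range(epochs):
        for x in X:
            z = [sum(W[h][j] * x[j] for j in range(d)) for h in range(hidden)]
            xh = [sum(V[j][h] * z[h] for h in range(hidden)) for j in range(d)]
            err = [xh[j] - x[j] for j in range(d)]
            # Backpropagate through the (old) decoder weights first.
            dz = [sum(err[j] * V[j][h] for j in range(d)) for h in range(hidden)]
            for j in range(d):
                for h in range(hidden):
                    V[j][h] -= lr * 2 * err[j] * z[h]
            for h in range(hidden):
                for j in range(d):
                    W[h][j] -= lr * 2 * dz[h] * x[j]
    return W, V

def encode(W, x):
    """Project an input into the compressed feature space."""
    return [sum(wh[j] * x[j] for j in range(len(x))) for wh in W]

# Sparse 6-dimensional vectors that actually lie on a 2-dimensional subspace.
X = [[1, 0, 0, 1, 0, 0],
     [0, 1, 0, 0, 1, 0],
     [2, 0, 0, 2, 0, 0],
     [0, 3, 0, 0, 3, 0]]
W, V = train_linear_autoencoder(X, hidden=2)
z = encode(W, X[0])
print(len(z))  # prints 2: the compressed code has 2 dimensions instead of 6
```

A classifier trained on `z` instead of the raw sparse vectors is the setup the abstract evaluates; the paper's actual architecture and training details are not reproduced here.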
The proposed approach provided overall performance gains of up to 3% for each test set evaluation. Finally, the classifier achieved 92% accuracy, 91% recall, 91% precision, and a 91% F1-score in detecting the patient's condition. Furthermore, the compression mechanism and the autoencoder prediction process were demonstrated by applying the information bottleneck theoretical framework. Clinical and Translational Impact Statement: An autoencoder learning algorithm effectively tackles the problem of sparsity in the representation feature space derived from a small clinical narrative dataset. Significantly, it can learn the best representation of the training data because of its lossless compression capacity compared to other approaches. Consequently, its downstream classification ability can be significantly improved, which cannot be achieved using deep learning models.