Decaro Cristoforo, Montanari Giovanni Battista, Bianconi Marco, Bellanca Gaetano
Department of Engineering, University of Ferrara, Ferrara, Italy.
MISTER Smart Innovation, Bologna 40129, Italy.
Healthc Technol Lett. 2021 Apr 6;8(2):37-44. doi: 10.1049/htl2.12006. eCollection 2021 Apr.
Although machine learning has been used successfully in a wide range of healthcare applications, several factors can influence the performance of a machine learning system. One major issue for a machine learning algorithm is an imbalanced dataset, which occurs when the distribution of data is not uniform and which makes accurate models harder to implement. In this paper, intelligent models are implemented to predict the hematocrit level of blood from visible spectral data. The aim of this work is to show the effects of two balancing techniques (SMOTE and SMOTE+ENN) on an imbalanced dataset of blood spectra. Four different machine learning systems are fitted with the imbalanced and the balanced datasets, and their performances are compared, showing an improvement in accuracy when balancing is used.
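To illustrate the balancing idea, the following is a minimal sketch of SMOTE-style oversampling in plain Python. It is not the authors' implementation. It assumes that hematocrit levels are treated as discrete class labels, and the function names (`smote`, `balance`), the labels "normal" and "low", and the two-feature toy data are all hypothetical. Each synthetic sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. The ENN cleaning step of SMOTE+ENN, which afterwards removes samples whose label disagrees with most of their nearest neighbours, is omitted here.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Assumes `minority` holds at least two samples whenever n_new > 0.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance(X, y, k=3, seed=0):
    """Oversample every class up to the size of the largest class."""
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = list(X), list(y)
    for label, rows in by_class.items():
        extra = smote(rows, target - len(rows), k=k, seed=seed)
        X_out.extend(extra)
        y_out.extend([label] * len(extra))
    return X_out, y_out


# Hypothetical toy data: six "normal" samples and two "low" samples,
# each with two made-up spectral features.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.0], [0.2, 0.3],
     [5.0, 5.0], [5.2, 5.1]]
y = ["normal"] * 6 + ["low"] * 2

X_bal, y_bal = balance(X, y)
print(y_bal.count("normal"), y_bal.count("low"))
```

In practice, balancing of this kind is normally applied only to the training split, so that the test set keeps the original class distribution.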