García-Ordás María Teresa, Benavides Carmen, Benítez-Andrades José Alberto, Alaiz-Moretón Héctor, García-Rodríguez Isaías
SECOMUCI Research Groups, Escuela de Ingenierías Industrial e Informática, Universidad de León, Campus de Vegazana s/n, León C.P. 24071, Spain.
SALBIS Research Group, Department of Electric, Systems and Automatics Engineering, Universidad de León, Campus of Vegazana s/n, León, León, 24071, Spain.
Comput Methods Programs Biomed. 2021 Apr;202:105968. doi: 10.1016/j.cmpb.2021.105968. Epub 2021 Feb 15.
Diabetes is a chronic disease that affects a growing number of people and causes a large number of deaths each year. Furthermore, many people living with the disease do not realize the seriousness of their condition early enough. Late diagnosis leads to numerous health problems and many deaths each year, so developing methods for early diagnosis is essential.
In this paper, a pipeline based on deep learning techniques is proposed to predict whether a person has diabetes. It comprises data augmentation using a variational autoencoder (VAE), feature augmentation using a sparse autoencoder (SAE), and a convolutional neural network (CNN) for classification. The pipeline was evaluated on the Pima Indians Diabetes Database, which records patient information such as the number of pregnancies, glucose or insulin level, blood pressure, and age.
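The data augmentation step can be illustrated with a minimal sketch of how a VAE produces synthetic records: a latent vector is sampled with the reparameterization trick and passed through the decoder. This is not the authors' implementation. The names `sample_latent`, `augment`, and `toy_decoder` are hypothetical, the latent size of 2 is an assumption, and the decoder is a fixed linear stand-in for a trained network. The only detail taken from the dataset itself is that a Pima record has 8 input features.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def augment(mu, log_var, decoder, n_samples, seed=0):
    """Draw n_samples latent vectors and decode each into a synthetic record."""
    rng = random.Random(seed)
    return [decoder(sample_latent(mu, log_var, rng)) for _ in range(n_samples)]


def toy_decoder(z):
    """Hypothetical stand-in for a trained decoder (assumption, not from the
    paper): a fixed linear map from a 2-d latent space to 8 features."""
    return [0.5 * z[0] + 0.1 * i * z[1] for i in range(8)]


# Generate five synthetic 8-feature records from a standard normal latent prior.
synthetic = augment(mu=[0.0, 0.0], log_var=[0.0, 0.0],
                    decoder=toy_decoder, n_samples=5)
```

In a real pipeline the decoder would be a trained network, and synthetic records would typically be generated for the under-represented class until the classes are balanced.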
An accuracy of 92.31% was obtained when the CNN classifier was trained jointly with the SAE for feature augmentation on a well-balanced dataset. This is an improvement of 3.17% in accuracy over the state of the art.
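One common way to train a classifier jointly with a sparse autoencoder is to minimize a single objective that sums the classification loss, the reconstruction loss, and a sparsity penalty on the hidden code. The sketch below shows that idea for one sample. It is an assumption about how such joint training might look, not the loss reported by the authors: the L1 penalty, the weights `w_rec` and `w_sparse`, and all function names are hypothetical.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Classification loss for one sample; y_prob is clipped to avoid log(0)."""
    p = min(max(y_prob, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


def mse(x, x_hat):
    """Mean squared reconstruction error between input and SAE output."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def l1_sparsity(hidden):
    """Mean absolute activation of the hidden code (assumed L1 penalty)."""
    return sum(abs(h) for h in hidden) / len(hidden)


def joint_loss(y_true, y_prob, x, x_hat, hidden, w_rec=1.0, w_sparse=1e-3):
    """Weighted sum of classifier and autoencoder terms (weights are assumptions)."""
    return (binary_cross_entropy(y_true, y_prob)
            + w_rec * mse(x, x_hat)
            + w_sparse * l1_sparsity(hidden))
```

Because all three terms share one scalar objective, gradients from the classifier flow back into the autoencoder, so the learned features are shaped by both reconstruction quality and classification accuracy.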
Using a full deep learning pipeline for data preprocessing and classification has proven very promising for diabetes detection, outperforming state-of-the-art proposals.