Soto-Murillo Manuel A, Galván-Tejada Jorge I, Galván-Tejada Carlos E, Celaya-Padilla Jose M, Luna-García Huizilopoztli, Magallanes-Quintanar Rafael, Gutiérrez-García Tania A, Gamboa-Rosales Hamurabi
Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juarez 147, Centro, Zacatecas 98000, Mexico.
Departamento de Ciencias Computacionales, Centro Universitario de Ciencias Exactas e Ingenierías, Universidad de Guadalajara, Blvd. Marcelino García Barragán 1421, Guadalajara, Jalisco 44430, Mexico.
Healthcare (Basel). 2021 Mar 12;9(3):317. doi: 10.3390/healthcare9030317.
Heart disease is the leading cause of death in Mexico and worldwide, and according to data from the World Health Organization (WHO) and the National Institute of Statistics and Geography (INEGI) it will remain so over the next decade. The objective of this work is therefore to implement, compare, and evaluate machine learning algorithms capable of classifying normal and abnormal heart sounds. Three types of sounds were analyzed: normal heart sounds, heart murmur sounds, and extrasystolic sounds, labeled as healthy (normal sounds) or unhealthy (murmur and extrasystolic sounds). From these sounds, fifty-two features were calculated to create a numerical dataset: thirty-six statistical features, eight Linear Predictive Coding (LPC) coefficients, and eight Mel-Frequency Cepstral Coefficients (MFCC). Two further datasets were derived from this one: a normalized version and a standardized version. The datasets were analyzed with six classifiers: k-Nearest Neighbors, Naive Bayes, Decision Trees, Logistic Regression, Support Vector Machine, and Artificial Neural Networks. Each classifier was evaluated with six metrics: accuracy, specificity, sensitivity, ROC curve, precision, and F1-score. The performance of all models was statistically significant, but the best-performing models for this problem were logistic regression on the standardized dataset, with a specificity of 0.7500 and a ROC curve value of 0.8405; logistic regression on the normalized dataset, with a specificity of 0.7083 and a ROC curve value of 0.8407; and Support Vector Machine with a linear kernel on the non-normalized dataset, with a specificity of 0.6842 and a ROC curve value of 0.7703. Specificity and the ROC curve are both of utmost importance in evaluating the performance of computer-assisted diagnostic systems.
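The two dataset variants and the threshold-based metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, min-max scaling to [0, 1] and z-scoring with the population standard deviation are assumed as the meanings of "normalized" and "standardized", and treating the unhealthy class as the positive label (1) is also an assumption. The labels in the usage lines are made up for illustration and are not data from the study.

```python
from statistics import mean, pstdev


def min_max_normalize(column):
    """Rescale one feature column to [0, 1] (assumed meaning of 'normalized')."""
    lo, hi = min(column), max(column)
    span = hi - lo
    # A constant column has no spread; map it to 0.0 to avoid dividing by zero.
    return [(x - lo) / span if span else 0.0 for x in column]


def standardize(column):
    """Z-score one feature column: zero mean, unit population standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma if sigma else 0.0 for x in column]


def _ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(y_true, y_pred, positive=1):
    """Compute confusion-matrix metrics for a binary classifier.

    `positive` marks the class treated as 'unhealthy' (an assumption here).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    sensitivity = _ratio(tp, tp + fn)   # true positive rate (recall)
    specificity = _ratio(tn, tn + fp)   # true negative rate
    precision = _ratio(tp, tp + fp)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Illustrative labels only (1 = unhealthy, 0 = healthy); not data from the study.
example_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
example_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
example_scores = binary_metrics(example_true, example_pred)
```

Specificity here is the fraction of healthy recordings correctly classified as healthy, which is why it is reported separately from accuracy when the classes are imbalanced. The ROC curve metric is not covered by this sketch, since it requires continuous classifier scores rather than hard predictions.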