Cömert Zafer, Şengür Abdulkadir, Budak Ümit, Kocamaz Adnan Fatih
Department of Software Engineering, Samsun University, Samsun, Turkey.
Department of Electrical and Electronics Engineering, Technology Faculty, Firat University, Elazig, Turkey.
Health Inf Sci Syst. 2019 Aug 20;7(1):17. doi: 10.1007/s13755-019-0079-z. eCollection 2019 Dec.
Cardiotocography (CTG) consists of two biophysical signals: fetal heart rate (FHR) and uterine contraction (UC). In this research area, computerized systems are commonly used to provide more objective and repeatable results.
Feature selection algorithms are of great importance to computerized systems. They reduce the dimension of the feature set and also reveal the most relevant features without losing too much information. In this paper, three filter and two wrapper feature selection methods are evaluated together with four machine learning models, namely artificial neural network (ANN), k-nearest neighbor (kNN), decision tree (DT), and support vector machine (SVM). The evaluation uses a high-dimensional feature set obtained from the open-access CTU-UHB intrapartum CTG database. The signals are divided into two classes, normal and hypoxic, according to the umbilical artery pH value measured after delivery (pH < 7.20 for the hypoxic class). A comprehensive diagnostic feature set is generated first, comprising features from the morphological, linear, nonlinear, time-frequency, and image-based time-frequency domains. Then, combinations of the feature selection algorithms and machine learning models are evaluated to identify the most effective features while achieving high classification performance.
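The labelling rule and the idea behind a filter method can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. The abstract does not name its three filter methods, so the two-class Fisher score below is an assumed stand-in for a generic filter criterion. The feature names used with it are hypothetical. A filter method scores each feature from the data alone, without training a classifier. A wrapper, by contrast, would score candidate subsets by the performance of a classifier trained on them.

```python
from statistics import mean, pvariance

# Threshold from the abstract: umbilical artery pH < 7.20 is labelled hypoxic.
PH_THRESHOLD = 7.20


def label_by_ph(ph_values):
    """Return 1 (hypoxic) for pH < 7.20, else 0 (normal)."""
    return [1 if ph < PH_THRESHOLD else 0 for ph in ph_values]


def fisher_score(column, labels):
    """Two-class Fisher score (mu1 - mu0)^2 / (var1 + var0).

    Assumed filter criterion for illustration; assumes both classes are present.
    """
    c0 = [x for x, y in zip(column, labels) if y == 0]
    c1 = [x for x, y in zip(column, labels) if y == 1]
    denom = pvariance(c0) + pvariance(c1)
    if denom == 0:
        return 0.0  # constant within both classes: treat as uninformative
    return (mean(c1) - mean(c0)) ** 2 / denom


def rank_features(X, labels, names):
    """Rank feature names by descending Fisher score (no classifier involved)."""
    scores = {
        name: fisher_score([row[j] for row in X], labels)
        for j, name in enumerate(names)
    }
    return sorted(names, key=lambda n: scores[n], reverse=True)
```

Consider four recordings with pH values 7.30, 7.25, 7.10, and 7.05, labelled `[0, 0, 1, 1]`. Suppose a hypothetical feature takes the values `[5, 6, 1, 2]` across those recordings. It scores 32.0 because it separates the classes well. A feature whose class means coincide scores 0.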
The experimental results show that better classification performance can be achieved with a lower-dimensional feature set of more relevant features than with the full high-dimensional feature set. The most informative feature subset was generated by counting how often each feature was chosen by the feature selection algorithms. The best results were obtained by the SVM model using only 12 relevant features instead of the full feature set of 30 diagnostic indices. This configuration reached a sensitivity of 77.40% and a specificity of 93.86%.
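The frequency-of-selection step and the two reported metrics can be sketched as follows. This is a sketch under assumptions, not the paper's procedure. The abstract does not say how the selection counts are cut off, so the code assumes the top-k most frequently selected features are kept, with alphabetical tie-breaking; k = 12 would match the reported subset size. It also assumes the hypoxic class is the positive class. Sensitivity is then TP / (TP + FN) and specificity is TN / (TN + FP).

```python
from collections import Counter


def consensus_subset(selections, k):
    """Keep the k features selected most often across feature selection runs.

    selections: one list of feature names per selector run (hypothetical input format).
    Ties are broken alphabetically so the result is deterministic (an assumption).
    """
    # set() ensures a feature counts at most once per run
    counts = Counter(f for subset in selections for f in set(subset))
    ranked = sorted(counts, key=lambda f: (-counts[f], f))
    return ranked[:k]


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity), treating `positive` (hypoxic) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)
```

For example, suppose four runs select `{a, b, c}`, `{a, c, d}`, `{a, b, e}`, and `{a, b}`. With k = 3, the consensus subset is `a, b, c`.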
In conclusion, evaluating multiple feature selection algorithms together led to the best results.