Complex Systems Research Group, Faculty of Engineering, The University of Sydney, Room 524, SIT Building (J12), Darlington, NSW, 2008, Australia.
Health Market Quality Research Stream, Capital Markets CRC, Level 3, 55 Harrington Street, Sydney, NSW, Australia.
BMC Med Inform Decis Mak. 2019 Dec 21;19(1):281. doi: 10.1186/s12911-019-1004-8.
Supervised machine learning algorithms have been a dominant method in the data mining field. Disease prediction using health data has recently emerged as a potential application area for these methods. This study aims to identify the key trends among different types of supervised machine learning algorithms, and their performance and usage for disease risk prediction.
In this study, extensive research efforts were made to identify studies that applied more than one supervised machine learning algorithm to the prediction of a single disease. Two databases (i.e., Scopus and PubMed) were searched using different types of search terms. In total, we selected 48 articles for the comparison among variants of supervised machine learning algorithms for disease prediction.
We found that the Support Vector Machine (SVM) algorithm was applied most frequently (in 29 studies), followed by the Naïve Bayes algorithm (in 23 studies). However, the Random Forest (RF) algorithm showed comparatively superior accuracy. Of the 17 studies where it was applied, RF showed the highest accuracy in 9 of them, i.e., 53%. This was followed by SVM, which showed the highest accuracy in 41% of the studies in which it was considered.
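The 53% figure for RF follows directly from the counts reported above (9 of 17 studies). A minimal sketch of that arithmetic is shown below; the function name `top_accuracy_share` and its parameters are hypothetical, introduced here for illustration only. The SVM share is not recomputed because the abstract gives only its percentage, not the underlying count.

```python
def top_accuracy_share(times_best: int, times_applied: int) -> float:
    """Percentage of studies in which an algorithm had the highest accuracy.

    Hypothetical helper for illustration; not part of the original study.
    """
    if times_applied <= 0:
        raise ValueError("times_applied must be positive")
    if not 0 <= times_best <= times_applied:
        raise ValueError("times_best must be between 0 and times_applied")
    return 100.0 * times_best / times_applied


# Counts taken from the abstract: RF was applied in 17 studies
# and had the highest accuracy in 9 of them.
rf_share = top_accuracy_share(times_best=9, times_applied=17)
print(f"RF had the highest accuracy in {rf_share:.1f}% of the studies that applied it")
```

Rounded to the nearest whole number, this gives the 53% reported in the abstract.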
This study provides a broad overview of the relative performance of different variants of supervised machine learning algorithms for disease prediction. This information on relative performance can help researchers select an appropriate supervised machine learning algorithm for their studies.