Centre for Global Sustainability Studies, Universiti Sains Malaysia, Minden, Malaysia.
School of Languages, Literacies, and Translation, Universiti Sains Malaysia, Minden, Malaysia.
BMC Med Inform Decis Mak. 2022 Nov 24;22(1):306. doi: 10.1186/s12911-022-02050-x.
In healthcare, big data integrated with machine learning enables health practitioners to predict the outcome of a disorder or disease more accurately. In Autistic Spectrum Disorder (ASD), it is important to screen patients so that they can undergo proper treatment as early as possible. However, predicting ASD occurrences accurately can be difficult, mainly because of human error. Data mining, if embedded into health screening practice, can help to overcome these difficulties. This study evaluates the performance of six best classifiers, taken from existing works, in analysing an ASD screening training dataset.
We applied Naive Bayes, Logistic Regression, KNN, J48, Random Forest, SVM, and Deep Neural Network algorithms to an ASD screening dataset and compared the classifiers on key metrics for predicting ASD occurrences: sensitivity, specificity, accuracy, receiver operating characteristic (ROC), area under the curve (AUC), and runtime. We also found that most previous studies classified health-related datasets while ignoring missing values, which may significantly affect the classification result and, in turn, the lives of patients. We therefore addressed missing values with an imputation method that replaces each one with the mean of the available records in the dataset.
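As a rough illustration of the two ideas described above, the sketch below shows mean imputation and the computation of sensitivity, specificity, and accuracy from binary predictions. It is not the authors' implementation: the function names `impute_mean` and `screening_metrics`, the use of `None` to mark a missing value, and the toy data are all assumptions made for this example.

```python
from statistics import mean


def impute_mean(rows):
    """Replace None entries with the mean of the observed values in the same column.

    Assumption: every column has at least one observed (non-None) numeric value.
    """
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        col_means.append(mean(observed))
    return [
        [col_means[j] if row[j] is None else row[j] for j in range(n_cols)]
        for row in rows
    ]


def screening_metrics(y_true, y_pred):
    """Compute sensitivity, specificity, and accuracy for labels coded 1 (ASD) / 0 (no ASD).

    Assumption: both classes occur in y_true, so neither denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "sensitivity": tp / (tp + fn),  # share of actual positives detected
        "specificity": tn / (tn + fp),  # share of actual negatives detected
        "accuracy": (tp + tn) / len(pairs),
    }


# Toy data (hypothetical): the second record is missing its second feature.
records = [[1, 10.0], [0, None], [1, 14.0]]
completed = impute_mean(records)

# Hypothetical true labels and classifier predictions.
metrics = screening_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

In screening, sensitivity is usually the metric to watch most closely, since a false negative means a missed case. Mean imputation is simple, but it shrinks the variance of the imputed feature, which is one reason to compare results with and without missing values.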
We found that J48 produced promising results compared with the other classifiers in both circumstances, with and without missing values. Our findings also suggest that SVM does not necessarily perform well on small and simple datasets. We hope the outcome will assist health practitioners in making accurate diagnoses of ASD in patients.