Okagbue Hilary I, Ijezie Ogochukwu A, Ugwoke Paulinus O, Adeyemi-Kayode Temitope M, Jonathan Oluranti
Department of Mathematics, Covenant University, Ota, Nigeria.
Faculty of Science and Technology, Bournemouth University, Poole, BH12 5BB, UK.
Heliyon. 2023 Aug 23;9(9):e19422. doi: 10.1016/j.heliyon.2023.e19422. eCollection 2023 Sep.
Psychotic disorder diseases (PDD), or mental illnesses, are a group of illnesses that affect the mind, impair cognitive ability, retard emotional ability and obstruct communication and relationships with others. They are characterized by delusions, hallucinations and disoriented or disordered patterns of thinking. Prognosis of PDD is unsatisfactory because of the nature of the diseases, so an adequate form of diagnosis is required to detect, manage and treat the illness. This paper applied the single-label classification (SLC) machine learning approach to mining the electronic health records of people with PDD in Nigeria, using eleven independent (demographic) variables and five PDDs as target variables. The five PDDs are Insomnia, Schizophrenia, Minimal Brain Dysfunction (MBD), which is also known as Attention-Deficit/Hyperactivity Disorder (ADHD), Vascular Dementia (VD) and Bipolar Disorder (BD). SLC was used because it makes it easier to detect PDDs that are related to each other without loss of information, which is an advantage over multi-label classification (MLC). The ReliefF algorithm was used in each experiment to determine the order of importance of the independent variables, and redundant variables were excluded from the analysis. The order of the variables in feature selection was matched with the feature importance obtained after classification, and the agreement was quantified using the Spearman rank correlation coefficient. The data were divided into 70% for training and 30% for testing. Four new performance metrics adapted from the root mean square error (RMSE) were proposed and used to measure the differences between the performance results of the 10 machine learning models, first between training and testing and second with and without feature selection. The new metrics are close to zero, which indicates that the use of feature selection and cross validation may not greatly affect the accuracy of the SLC.
When the PDDs were included as predictors for classifying the others, there was a tremendous improvement, as revealed by the four new metrics for classification accuracy (CA), precision and recall. Analysis of variance showed that the four metrics differ significantly for CA and for precision. However, there was no significant difference between CA and precision when the two were compared across the four evaluation metrics at the 0.05 significance level.
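The two quantities named in the abstract, an RMSE-style difference between paired performance results and the Spearman rank correlation between two variable orderings, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the definitions of the four proposed metrics, so `rmse_gap` is a hypothetical generic form (root mean square of paired differences across models), not the authors' formula, and all numbers are made up.

```python
import math


def rmse_gap(scores_a, scores_b):
    """Root mean square difference between two paired score lists.

    Hypothetical helper: one score per model in each list, e.g. training
    vs. testing accuracy. A value near zero means the two sets of results
    are close to each other.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")
    squared = [(a - b) ** 2 for a, b in zip(scores_a, scores_b)]
    return math.sqrt(sum(squared) / len(squared))


def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two tie-free rankings of the same items.

    Uses the closed form 1 - 6 * sum(d^2) / (n * (n^2 - 1)), which is valid
    only when neither ranking contains ties.
    """
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("rankings must have equal length of at least 2")
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Made-up accuracies for three models (illustrative only, not study data).
train_ca = [0.80, 0.75, 0.90]
test_ca = [0.78, 0.75, 0.86]
gap = rmse_gap(train_ca, test_ca)

# Made-up ranks of five variables: feature-selection order vs. the
# importance order reported by a fitted classifier.
selection_rank = [1, 2, 3, 4, 5]
importance_rank = [2, 1, 3, 5, 4]
rho = spearman_rho(selection_rank, importance_rank)

print(f"RMSE-style gap: {gap:.4f}")
print(f"Spearman rho:   {rho:.2f}")
```

A gap near zero corresponds to the abstract's reading that training and testing results (or results with and without feature selection) barely differ, while a coefficient near 1 would indicate that the feature-selection order agrees with the post-classification importance order.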