Ahosan Abid Bin, Islam Forhadul, Mohi Uddin Khandaker Mohammad, Hasan Nahid, Uddin Md Ashraf
Department of Computer Science and Engineering, Dhaka International University, Dhaka, Bangladesh.
Department of Computer Science and Engineering, Southeast University, Dhaka, Bangladesh.
Digit Health. 2025 Jun 16;11:20552076251350755. doi: 10.1177/20552076251350755. eCollection 2025 Jan-Dec.
Hepatitis B virus (HBV) is a significant global health threat, responsible for severe liver diseases such as liver failure, cirrhosis, and hepatocellular carcinoma. The burden is especially high in low-income regions, where early diagnosis and treatment are critical for mitigating its impact. This study investigates the effectiveness of various machine learning (ML) techniques in predicting patient outcomes in HBV infection.
The Chi-squared test was used for feature selection to identify the most important factors, which were then used to train and evaluate various ML models. To address class imbalance in the dataset, the Synthetic Minority Over-sampling Technique (SMOTE) was applied. SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) were used to improve the models' interpretability.
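The two preprocessing steps named above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' pipeline: the function names `chi2_score` and `smote_like`, the toy data, and the choice of the classical contingency-table form of the Chi-squared statistic are all assumptions made for illustration. In practice, libraries such as scikit-learn and imbalanced-learn provide maintained implementations of both techniques.

```python
import random
from collections import Counter


def chi2_score(feature, labels):
    """Pearson chi-squared statistic for a categorical feature vs. the class label."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    score = 0.0
    for f, f_count in feature_totals.items():
        for l, l_count in label_totals.items():
            expected = f_count * l_count / n
            score += (observed[(f, l)] - expected) ** 2 / expected
    return score


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (the core idea of SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


# Hypothetical toy data (not from the study): 1 = died, 0 = survived.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
informative = [1, 1, 1, 0, 0, 0, 0, 0]  # tracks the label exactly
noise = [1, 0, 1, 0, 1, 0, 1, 0]        # nearly unrelated to the label

informative_score = chi2_score(informative, labels)
noise_score = chi2_score(noise, labels)

# Three hypothetical minority-class points in a two-feature space.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
synthetic_points = smote_like(minority_points, n_new=4)
```

Features with a higher score are more strongly associated with the outcome and would be kept. In a real evaluation, oversampling should be applied to the training split only, so that synthetic points do not leak into the test data.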
Among individual models, Support Vector Machine (SVM) and Logistic Regression (LR) each achieved an accuracy of 92.5%. A Voting Classifier combining SVM and LR improved the overall accuracy to 95%. The results showed that elevated levels of certain risk factors, especially in older patients, substantially increase the risk of death.
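How a voting ensemble can outperform its members is easiest to see on a small example. The sketch below assumes soft voting (averaging each model's predicted probability of the positive class); the abstract does not say whether hard or soft voting was used, and with only two models soft voting avoids tied votes. The function names and all probabilities are hypothetical and do not reproduce the reported 92.5% and 95% figures.

```python
def soft_vote(prob_lists, weights=None):
    """Average per-model positive-class probabilities and threshold at 0.5."""
    weights = weights or [1.0] * len(prob_lists)
    total = sum(weights)
    averaged = [
        sum(w * p for w, p in zip(weights, probs)) / total
        for probs in zip(*prob_lists)  # one tuple of model outputs per patient
    ]
    return [1 if p >= 0.5 else 0 for p in averaged]


def accuracy(predicted, truth):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Hypothetical outputs for four patients (1 = died, 0 = survived).
truth = [1, 0, 1, 0]
svm_probs = [0.9, 0.2, 0.4, 0.1]  # wrong on the third patient
lr_probs = [0.8, 0.6, 0.9, 0.3]   # wrong on the second patient

svm_acc = accuracy(soft_vote([svm_probs]), truth)
lr_acc = accuracy(soft_vote([lr_probs]), truth)
ensemble_pred = soft_vote([svm_probs, lr_probs])
ensemble_acc = accuracy(ensemble_pred, truth)
```

In this toy case each model makes a different mistake, and the averaged probability falls on the correct side of the threshold for both patients. An ensemble helps only when its members' errors are not strongly correlated.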
These insights give healthcare professionals and policymakers valuable information for better predicting patient outcomes in HBV infection and for developing patient care strategies.