Hybrid model for precise hepatitis-C classification using improved random forest and SVM method.

Affiliations

Department of Computer Science and Engineering, Chandigarh University, Gharuan, Mohali, Punjab, 140413, India.

College of Science and Engineering, Qatar Foundation, Hamad Bin Khalifa University, Doha, Qatar.

Publication information

Sci Rep. 2023 Aug 1;13(1):12473. doi: 10.1038/s41598-023-36605-3.

DOI:10.1038/s41598-023-36605-3

PMID:37528148

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10394001/

Abstract

Hepatitis C is a viral infection, caused by the hepatitis C virus (HCV), that inflames the liver. Approximately 3.4 million cases of HCV are reported worldwide each year, and diagnosis at an early stage helps save lives. Existing HCV studies typically rely on a single machine learning (ML) prediction model, which suffers from poor accuracy, data imbalance, and overfitting. To overcome these limitations, this research proposes a Hybrid Predictive Model (HPM) based on an improved random forest (IRF) and a support vector machine (SVM). The IRF enhances the existing random forest (RF) method with a bootstrapping process that iteratively eliminates the trees' minor features to build a strong forest, which improves the performance of the HPM. The HPM uses a 'Ranker method' to rank the dataset features and applies the IRF with SVM, selecting the higher-ranked features to build the prediction model. The model's performance is measured on the online HCV dataset from UCI. Because this dataset is highly imbalanced, we applied the synthetic minority over-sampling technique (SMOTE). Two experiments were performed. The first compares data splitting methods: K-fold cross-validation and training:testing splits. The proposed method achieved an accuracy of 95.89% for k = 5 and 96.29% for k = 10; for the training:testing splits, it achieved 91.24% for 80:20 and 92.39% for 70:30, higher than the existing SVM, MARS, RF, DT, and BGLM methods. The second experiment analyzes feature selection with and without SMOTE. The proposed method achieved an accuracy of 41.541% without SMOTE and 96.82% with SMOTE-based feature selection, which is better than existing ML methods. The experimental results show the importance of feature selection for achieving higher accuracy in HCV research.
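The SMOTE step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe the authors' implementation, so everything here is an assumption made for illustration: the function names `smote` and `balance`, the default `k=5`, the two-class setting, and the toy data. The sketch shows only the core idea of SMOTE: each synthetic minority sample is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    minority: list of equal-length numeric feature vectors (minority class only).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority samples, sorted by distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # The new point lies on the segment between the base sample and the neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label, k=5, seed=0):
    """Oversample minority_label until it is as large as the remaining samples.

    Assumes a two-class problem (hypothetical helper, not from the paper).
    """
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = (len(X) - len(minority)) - len(minority)
    new_points = smote(minority, max(n_new, 0), k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)


# Toy data: six majority samples (label 0) and two minority samples (label 1).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0],
     [0.5, 0.5], [0.2, 0.8], [5.0, 5.0], [6.0, 5.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
X_bal, y_bal = balance(X, y, minority_label=1, k=1)
print(len(X_bal), y_bal.count(0), y_bal.count(1))
```

When oversampling is combined with K-fold cross-validation or a training:testing split, a common precaution is to oversample only the training portion, so that synthetic points derived from test samples do not leak into training. Whether the study followed this order is not stated in the abstract.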

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1487/10394001/6645331668b3/41598_2023_36605_Fig1_HTML.jpg
