Hybrid model for precise hepatitis-C classification using improved random forest and SVM method.

Author information

Department of Computer Science and Engineering, Chandigarh University, Gharuan, Mohali, Punjab, 140413, India.

College of Science and Engineering, Qatar Foundation, Hamad Bin Khalifa University, Doha, Qatar.

Publication information

Sci Rep. 2023 Aug 1;13(1):12473. doi: 10.1038/s41598-023-36605-3.

Abstract

Hepatitis C virus (HCV) infection causes inflammation of the liver. Approximately 3.4 million cases of HCV are reported worldwide each year, and diagnosis at an early stage helps to save lives. Existing HCV studies typically rely on a single machine learning (ML) prediction model, which encounters several issues, i.e., poor accuracy, data imbalance, and overfitting. This research proposes a Hybrid Predictive Model (HPM) based on an improved random forest (IRF) and a support vector machine (SVM) to overcome these limitations. The existing random forest (RF) method is enhanced with a bootstrapping process that iteratively eliminates the trees' minor features to build a strong forest, which improves the performance of the HPM. The HPM uses a 'Ranker method' to rank the dataset features and applies the IRF with SVM to the higher-ranked features to build the prediction model. The model's performance is measured on the online HCV dataset from UCI. Because the dataset is highly imbalanced, the synthetic minority over-sampling technique (SMOTE) is applied. Two experiments are performed. The first compares data splitting methods: k-fold cross-validation and training:testing splits. The proposed method achieved an accuracy of 95.89% for k = 5 and 96.29% for k = 10; for the training:testing splits, it achieved 91.24% for 80:20 and 92.39% for 70:30, the best among the compared SVM, MARS, RF, DT, and BGLM methods. The second experiment analyzes feature selection with and without SMOTE. The proposed method achieves an accuracy of 41.541% without SMOTE and 96.82% with SMOTE-based feature selection, which is better than existing ML methods. The experimental results demonstrate the importance of feature selection for achieving higher accuracy in HCV research.
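The abstract names SMOTE as its remedy for class imbalance but does not describe the mechanics. The following is a minimal sketch of the general SMOTE idea, not the authors' implementation: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. The function names `smote` and `balance`, the toy two-feature data, and the restriction to binary labels are all assumptions made for illustration; the sketch uses only the Python standard library.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (illustrative sketch)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Sort the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class until both classes have equal size
    (hypothetical helper; assumes exactly two class labels)."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    new_points = smote(minority, n_majority - len(minority), k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)


# Toy data (hypothetical): six majority samples, two minority samples.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.0], [0.2, 0.3],
     [1.0, 1.0], [1.2, 0.9]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = balance(X, y, minority_label=1, k=1)
print(len(X_bal), y_bal.count(0), y_bal.count(1))
```

In a cross-validation setting such as the k-fold experiments reported above, oversampling is commonly applied to the training folds only, so that synthetic points derived from test samples do not leak into training; whether the study did so is not stated in the abstract.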


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1487/10394001/6645331668b3/41598_2023_36605_Fig1_HTML.jpg
