Enhancement of hepatitis virus immunoassay outcome predictions in imbalanced routine pathology data by data balancing and feature selection before the application of support vector machines.

Author information

Present address: National Centre for Epidemiology & Population Health, Australian National University, Canberra, ACT 2601, Australia.

Pattern Recognition & Pathology, Department of Genome Sciences, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia.

Publication information

BMC Med Inform Decis Mak. 2017 Aug 14;17(1):121. doi: 10.1186/s12911-017-0522-5.

DOI:10.1186/s12911-017-0522-5

PMID:28806936

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5557531/

Abstract

BACKGROUND

Data mining techniques such as support vector machines (SVMs) have been successfully used to predict outcomes for complex problems, including for human health. Much health data is imbalanced, with many more controls than positive cases.

METHODS

The impact of three balancing methods and one feature selection method is explored, to assess the ability of SVMs to classify imbalanced diagnostic pathology data associated with the laboratory diagnosis of hepatitis B (HBV) and hepatitis C (HCV) infections. Random forests (RFs) for predictor variable selection, and data reshaping to overcome a large imbalance of negative to positive test results in relation to HBV and HCV immunoassay results, are examined. The methodology is illustrated using data from ACT Pathology (Canberra, Australia), consisting of laboratory test records from 18,625 individuals who underwent hepatitis virus testing over the decade from 1997 to 2007.
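The two downsizing approaches named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `downsize` and `multiple_downsize`, the toy `(age, ALT)` rows, and the reading of multiple downsizing as "several balanced samples, each keeping all positives with a different random subset of negatives" are assumptions made here for clarity.

```python
import random


def downsize(rows, labels, seed=0):
    """Return one balanced sample: every minority row plus an equal-sized
    random subset of majority rows (labels: 0 = negative, 1 = positive)."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = minority + rng.sample(majority, len(minority))
    rng.shuffle(keep)
    return [rows[i] for i in keep], [labels[i] for i in keep]


def multiple_downsize(rows, labels, n_sets=5, seed=0):
    """Draw several balanced samples, each with its own majority subset
    (assumed interpretation of multiple downsizing)."""
    return [downsize(rows, labels, seed=seed + k) for k in range(n_sets)]


# Hypothetical toy data: 95 negatives and 5 positives, rows are (age, ALT).
rows = [(20 + i % 60, 10 + i % 40) for i in range(100)]
labels = [1 if i % 20 == 0 else 0 for i in range(100)]

balanced_rows, balanced_labels = downsize(rows, labels)
print(sum(balanced_labels), len(balanced_labels))  # 5 10
```

Each balanced sample could then be used to train a separate classifier, with the predictions combined afterwards. How the study combined its multiple downsized sets is not stated in the abstract.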

RESULTS

Overall, the prediction of HCV test results by immunoassay was more accurate than for HBV immunoassay results associated with identical routine pathology predictor variable data. HBV and HCV negative results were vastly in excess of positive results, so three approaches to handling the negative/positive data imbalance were compared. Generating datasets by the Synthetic Minority Oversampling Technique (SMOTE) resulted in significantly more accurate prediction than single downsizing or multiple downsizing (MDS) of the dataset. For downsized data sets, applying a RF for predictor variable selection had a small effect on the performance, which varied depending on the virus. For SMOTE, a RF had a negative effect on performance. An analysis of variance of the performance across settings supports these findings. Finally, age and assay results for alanine aminotransferase (ALT), sodium for HBV and urea for HCV were found to have a significant impact upon laboratory diagnosis of HBV or HCV infection using an optimised SVM model.
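The core idea of SMOTE, creating synthetic positive cases by interpolating between existing ones, can be sketched as below. This is a simplified illustration under stated assumptions: the function name `smote`, the choice of `k`, and the toy `(age, ALT)` values are hypothetical, and the base row is picked at random rather than by iterating over every minority row as the original algorithm does.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority rows, each placed on the line segment
    between a random minority row and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by Euclidean distance to the base row.
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: math.dist(base, row))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # fraction of the way from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical positive cases, rows are (age, ALT).
minority = [(30.0, 80.0), (35.0, 95.0), (50.0, 120.0), (42.0, 70.0)]
new_rows = smote(minority, n_new=6)
print(len(new_rows))  # 6
```

Unlike downsizing, this keeps every negative record and enlarges the positive class instead. In practice a maintained implementation, such as the `SMOTE` class in the imbalanced-learn package, would normally be preferred over a hand-written one.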

CONCLUSIONS

Laboratories looking to include machine learning via SVM as part of their decision support need to be aware that the balancing method, predictor variable selection and the virus type interact to affect the laboratory diagnosis of hepatitis virus infection with routine pathology laboratory variables in different ways depending on which combination is being studied. This awareness should lead to careful use of existing machine learning methods, thus improving the quality of laboratory diagnosis.
