Stratifying risk of disease in haematuria patients using machine learning techniques to improve diagnostics.

Author information

Drożdż Anna, Duggan Brian, Ruddock Mark W, Reid Cherith N, Kurth Mary Jo, Watt Joanne, Irvine Allister, Lamont John, Fitzgerald Peter, O'Rourke Declan, Curry David, Evans Mark, Boyd Ruth, Sousa Jose

Affiliations

Personal Health Data Science Group, Sano - Centre for Computational Personalised Medicine - International Research Foundation, Krakow, Poland.

South Eastern Health and Social Care Trust, Ulster Hospital Dundonald, Belfast, United Kingdom.

Publication information

Front Oncol. 2024 May 8;14:1401071. doi: 10.3389/fonc.2024.1401071. eCollection 2024.

DOI:10.3389/fonc.2024.1401071

PMID:38779086

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11109371/

Abstract

BACKGROUND

Detailed and invasive clinical investigations are required to identify the causes of haematuria. A highly imbalanced patient population (predominantly male) and a wide range of potential causes make correctly classifying patients and identifying patient-specific biomarkers a major challenge. Studies have shown that multi-marker analysis combined with advanced analytical methods can improve diagnosis, even in imbalanced datasets. Here, we applied several machine learning algorithms to classify patients from the haematuria patient cohort (HaBio) by analysing multiple biomarkers, and to identify the most relevant ones.

MATERIALS AND METHODS

We applied several classification and feature selection methods (k-means clustering, decision trees, random forest with the LIME explainer, and the CACTUS algorithm) to stratify patients into two groups: healthy (no clear cause of haematuria) or sick (an identified cause of haematuria, e.g., bladder cancer or infection). The classification performance of the models was compared. Biomarkers that the algorithms identified as important were also analysed in relation to their involvement in pathological processes.
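The abstract does not say how k-means was configured. As a minimal, hypothetical sketch of the general technique only, the Python below z-score standardises each biomarker (k-means relies on Euclidean distance, so a biomarker with a large numeric range would otherwise dominate) and then runs Lloyd's algorithm with k = 2. The farthest-point initialisation, the function names and the toy values are assumptions for illustration; they are not HaBio data or the authors' code.

```python
import statistics
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def zscore(points: List[Sequence[float]]) -> List[Point]:
    """Standardise each feature column to mean 0 and population SD 1."""
    cols = list(zip(*points))
    means = [statistics.fmean(c) for c in cols]
    # Guard against constant columns, whose SD would be 0.
    sds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [tuple((x - m) / s for x, m, s in zip(p, means, sds)) for p in points]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: List[Point]) -> Point:
    """Component-wise mean; returns an empty tuple for an empty list."""
    return tuple(statistics.fmean(col) for col in zip(*points))


def kmeans(points: List[Point], k: int = 2, max_iter: int = 100) -> Tuple[Optional[List[int]], List[Point]]:
    """Lloyd's algorithm with deterministic farthest-point initialisation (hypothetical helper)."""
    # Assumed initialisation: start from the first point, then repeatedly add
    # the point farthest from its nearest already-chosen centroid.
    centroids: List[Point] = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(_dist2(p, c) for c in centroids)))

    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: each patient goes to the nearest centroid.
        new_labels = [min(range(k), key=lambda j: _dist2(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members;
        # an empty cluster (empty-tuple mean) keeps its previous centroid.
        centroids = [
            _mean([p for p, lab in zip(points, labels) if lab == j]) or centroids[j]
            for j in range(k)
        ]
    return labels, centroids


# Hypothetical toy values for two biomarkers (not HaBio data).
data = [(1.0, 2.0), (1.2, 1.8), (0.9, 2.1), (8.0, 9.0), (8.5, 9.5), (7.8, 8.7)]
labels, centroids = kmeans(zscore(data), k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Because k-means is unsupervised, the cluster numbers are arbitrary; the resulting clusters would still have to be compared against the clinical healthy/sick labels to judge whether the split is diagnostically meaningful.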

RESULTS

The results showed that the high imbalance in the datasets significantly affected classification by random forest and decision trees, leading to overestimation of the sick class and low model performance. The CACTUS algorithm was more robust to this imbalance. CACTUS obtained a balanced accuracy of 0.747 for both genders combined, 0.718 for females and 0.803 for males. For the whole dataset, microalbumin, male gender and total prostate-specific antigen (tPSA) emerged as the most informative biomarkers. For males, age, microalbumin, tPSA, cystatin C, bladder tumour antigen (BTA), HAD and S100A4 were the most significant biomarkers, while for females they were microalbumin, interleukin-8 (IL-8), phosphorylated extracellular signal-regulated kinase (pERK) and CXC chemokine ligand 16 (CXCL16).
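Balanced accuracy, the metric reported above, is the mean of the per-class recalls; for two classes it equals (sensitivity + specificity) / 2, so a model cannot score well by over-predicting the majority class. The short Python sketch below illustrates this with hypothetical counts (90 "sick", 10 "healthy") chosen purely for illustration and not taken from HaBio; the function names are likewise hypothetical.

```python
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Mean of per-class recall; for two classes, (sensitivity + specificity) / 2."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical, deliberately imbalanced cohort (illustrative counts only).
y_true = ["sick"] * 90 + ["healthy"] * 10
# A degenerate classifier that labels every patient "sick".
y_pred = ["sick"] * 100

print(accuracy(y_true, y_pred))           # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

The always-"sick" model looks 90% accurate, yet its balanced accuracy is only 0.5 (chance level for two classes) because its recall for the healthy class is zero. This is the kind of majority-class overestimation that balanced accuracy exposes.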

CONCLUSIONS

The CACTUS algorithm performed better than the other methods, such as decision trees and random forest. Additionally, we identified the most relevant biomarkers for specific patient groups, which could be considered in the future as novel diagnostic biomarkers. Our results have the potential to inform future research and to support new personalised diagnostic approaches tailored to the needs of individual patients.
