Stratifying risk of disease in haematuria patients using machine learning techniques to improve diagnostics.

Author information

Drożdż Anna, Duggan Brian, Ruddock Mark W, Reid Cherith N, Kurth Mary Jo, Watt Joanne, Irvine Allister, Lamont John, Fitzgerald Peter, O'Rourke Declan, Curry David, Evans Mark, Boyd Ruth, Sousa Jose

Affiliations

Personal Health Data Science Group, Sano - Centre for Computational Personalised Medicine - International Research Foundation, Krakow, Poland.

South Eastern Health and Social Care Trust, Ulster Hospital Dundonald, Belfast, United Kingdom.

Publication information

Front Oncol. 2024 May 8;14:1401071. doi: 10.3389/fonc.2024.1401071. eCollection 2024.

Abstract

BACKGROUND

Detailed and invasive clinical investigations are required to identify the causes of haematuria. A highly unbalanced patient population (predominantly male) and a wide range of potential causes make it a major challenge to classify patients correctly and to identify patient-specific biomarkers. Studies have shown that multi-marker analysis combined with advanced analytical methods can improve diagnosis, even in unbalanced datasets. Here, we applied several machine learning algorithms to classify patients from the haematuria patient cohort (HaBio) by analysing multiple biomarkers, and to identify the most relevant ones.

MATERIALS AND METHODS

We applied several classification and feature selection methods (k-means clustering, decision trees, random forest with the LIME explainer, and the CACTUS algorithm) to stratify patients into two groups: healthy (with no clear cause of haematuria) or sick (with an identified cause of haematuria, e.g. bladder cancer or infection). The classification performance of the models was compared. Biomarkers identified as important by the algorithms were also analysed in relation to their involvement in pathological processes.
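To make the healthy/sick stratification concrete, the sketch below fits a depth-1 decision tree (a "stump") on a single biomarker by choosing the threshold with the lowest weighted Gini impurity. It is only an illustration of the decision-tree idea, not the authors' pipeline: the function names, the single-feature setup, and the biomarker values are all hypothetical and do not come from the HaBio cohort.

```python
from typing import List, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of binary labels (0 = healthy, 1 = sick)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def majority(labels: List[int]) -> int:
    """Most common label in a group; ties go to 0 (healthy)."""
    return 1 if 2 * sum(labels) > len(labels) else 0


def fit_stump(values: List[float], labels: List[int]) -> Tuple[float, int, int]:
    """Return (threshold, label_if_below_or_equal, label_if_above)."""
    distinct = sorted(set(values))
    best = (distinct[0], majority(labels), majority(labels))
    best_score = float("inf")
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2.0
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_score = score
            best = (threshold, majority(left), majority(right))
    return best


def predict(value: float, stump: Tuple[float, int, int]) -> int:
    """Classify one patient with a fitted stump."""
    threshold, below_label, above_label = stump
    return below_label if value <= threshold else above_label


# Hypothetical biomarker readings and labels, invented for illustration only.
marker = [5.0, 8.0, 12.0, 15.0, 40.0, 55.0, 70.0, 90.0]
status = [0, 0, 0, 0, 1, 1, 1, 1]

stump = fit_stump(marker, status)
```

A full decision tree repeats this split recursively over many biomarkers, and a random forest averages many such trees trained on resampled data.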

RESULTS

The results showed that strong class imbalance in the datasets significantly affected classification by random forest and decision trees, leading to overestimation of the sick class and low model performance. The CACTUS algorithm was more robust to the imbalance. CACTUS obtained a balanced accuracy of 0.747 for both genders combined, 0.718 for females and 0.803 for males. For the whole dataset, microalbumin, male gender, and tPSA emerged as the most informative biomarkers in the classification process. For males, age, microalbumin, tPSA, cystatin C, BTA, HAD and S100A4 were the most significant biomarkers, while for females they were microalbumin, IL-8, pERK, and CXCL16.
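Balanced accuracy is commonly defined as the mean of the per-class recalls, which is why it is a more informative metric than plain accuracy when one class dominates. The abstract does not state how the metric was computed, so the sketch below assumes this standard definition. The label counts are invented for illustration and are not taken from the study.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def recall_for_class(y_true: Sequence[int], y_pred: Sequence[int], cls: int) -> float:
    """Fraction of true members of `cls` that were predicted as `cls`."""
    total = sum(1 for t in y_true if t == cls)
    if total == 0:
        raise ValueError(f"class {cls} does not occur in y_true")
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    return hits / total


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of per-class recalls over the classes present in y_true."""
    classes = sorted(set(y_true))
    return sum(recall_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Hypothetical imbalanced cohort: 90 sick (1) and 10 healthy (0) patients.
y_true = [1] * 90 + [0] * 10
# A degenerate model that labels every patient as sick.
y_pred = [1] * 100

plain = accuracy(y_true, y_pred)
balanced = balanced_accuracy(y_true, y_pred)
```

In this made-up case the degenerate model scores 0.9 on plain accuracy but only 0.5 on balanced accuracy, the same value as random guessing between two classes. Overestimating the sick class is penalised in exactly this way.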

CONCLUSIONS

The CACTUS algorithm demonstrated improved performance compared with other methods such as decision trees and random forest. Additionally, we identified the most relevant biomarkers for each specific patient group, which could be considered in the future as novel biomarkers for diagnosis. Our results have the potential to inform future research and to support new personalised diagnostic approaches tailored to the needs of individual patients.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3421/11109371/5327d3ac28f7/fonc-14-1401071-g001.jpg
