A new cluster-based oversampling method for improving survival prediction of hepatocellular carcinoma patients.

Author information

Santos Miriam Seoane, Abreu Pedro Henriques, García-Laencina Pedro J, Simão Adélia, Carvalho Armando

Affiliations

Centre for Informatics and Systems, University of Coimbra, Pólo II, Pinhal de Marrocos, 3030-290 Coimbra, Portugal; Department of Informatics Engineering, Faculty of Sciences and Technology, University of Coimbra, Pólo II, Pinhal de Marrocos, 3030-290 Coimbra, Portugal.

Centro Universitario de la Defensa de San Javier (University Centre of Defence at the Spanish Air Force Academy), MDE-UPCT, Calle Coronel López Peña, s/n, 30720 Santiago de la Ribera, Murcia, Spain.

Publication information

J Biomed Inform. 2015 Dec;58:49-59. doi: 10.1016/j.jbi.2015.09.012. Epub 2015 Sep 28.

Abstract

Liver cancer is the sixth most frequently diagnosed cancer, and Hepatocellular Carcinoma (HCC) in particular accounts for more than 90% of primary liver cancers. Clinicians select each patient's treatment on the basis of evidence-based medicine, which may not always apply to a specific patient, given the biological variability among individuals. Over the years, and for the particular case of Hepatocellular Carcinoma, several research studies have developed strategies for assisting clinicians in decision making, using computational methods (e.g. machine learning techniques) to extract knowledge from clinical data. However, these studies have limitations that have not yet been addressed: some do not focus entirely on Hepatocellular Carcinoma patients, others have strict application boundaries, and none considers either the heterogeneity between patients or the presence of missing data, a common drawback in healthcare contexts. In this work, a real, complex Hepatocellular Carcinoma database composed of heterogeneous clinical features is studied. We propose a new cluster-based oversampling approach, robust to small and imbalanced datasets, which accounts for the heterogeneity of patients with Hepatocellular Carcinoma. The preprocessing procedures of this work are based on data imputation using a distance metric appropriate for both heterogeneous and missing data (HEOM), and on clustering studies (K-means) to assess the underlying patient groups in the studied dataset. The final approach is applied to diminish the impact that underlying patient profiles of reduced size have on survival prediction. It is based on K-means clustering and the SMOTE algorithm, which together build a representative dataset that is then used as the training set for different machine learning procedures (logistic regression and neural networks).

The results are evaluated in terms of survival prediction and compared, using the Friedman rank test, with baseline approaches that do not consider clustering and/or oversampling. Our proposed methodology coupled with neural networks outperformed all the others, suggesting an improvement over the classical approaches currently used in Hepatocellular Carcinoma prediction models.
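
The abstract names the ingredients of the proposed approach (K-means clusters, then SMOTE to build a representative training set) but not the exact balancing rule. The block below is a minimal sketch in plain Python under one plausible reading, which is an assumption: cluster all samples with K-means, then inside each cluster use SMOTE to grow every class to the size of that cluster's largest class. The names `kmeans`, `smote` and `cluster_oversample`, the deterministic farthest-first initialisation, Euclidean distance on numeric features, and the per-cluster balancing target are all hypothetical choices for illustration, not the authors' implementation. It also does not cover the HEOM-based imputation the paper uses for heterogeneous features with missing values.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Point = List[float]


def kmeans(points: Sequence[Sequence[float]], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's K-means; returns a cluster index for each point.

    Assumption: deterministic farthest-first initialisation instead of random seeds.
    """
    centroids: List[Point] = [list(points[0])]
    while len(centroids) < k:
        # Next initial centroid: the point farthest from every centroid chosen so far.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def smote(minority: List[Point], n_new: int, k: int, rng: random.Random) -> List[Point]:
    """SMOTE: interpolate between a random minority sample and one of its
    k nearest minority-class neighbours (Euclidean distance)."""
    if n_new <= 0 or len(minority) < 2:
        return []  # nothing requested, or no neighbour to interpolate towards
    k = min(k, len(minority) - 1)
    synthetic: List[Point] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = sorted((minority[j] for j in range(len(minority)) if j != i),
                        key=lambda q: math.dist(base, q))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position on the segment base -> neighbour, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def cluster_oversample(X: Sequence[Sequence[float]], y: Sequence[int], n_clusters: int,
                       k_neighbors: int = 5, seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Hypothetical cluster-based oversampling: K-means on all samples, then SMOTE
    inside each cluster until every class matches that cluster's largest class."""
    rng = random.Random(seed)
    labels = kmeans(X, n_clusters)
    X_out = [list(x) for x in X]
    y_out = list(y)
    for c in range(n_clusters):
        by_class: Dict[int, List[Point]] = {}
        for x, cls, lab in zip(X, y, labels):
            if lab == c:
                by_class.setdefault(cls, []).append(list(x))
        if len(by_class) < 2:
            continue  # single-class (or empty) cluster: nothing to balance
        target = max(len(m) for m in by_class.values())
        for cls, members in by_class.items():
            # Neighbours are searched only within this cluster and class.
            new_points = smote(members, target - len(members), k_neighbors, rng)
            X_out.extend(new_points)
            y_out.extend([cls] * len(new_points))
    return X_out, y_out


if __name__ == "__main__":
    # Toy data: two well-separated groups, each with an under-represented class 1.
    X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0.2, 0.8], [0.8, 0.2], [0.3, 0.3],
         [10, 10], [10, 11], [11, 10], [11, 11], [10.5, 10.5], [10.2, 10.8], [10.8, 10.2], [10.3, 10.3]]
    y = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1]
    X_res, y_res = cluster_oversample(X, y, n_clusters=2)
    print(len(X_res), y_res.count(0), y_res.count(1))  # → 22 11 11
```

In this sketch, balancing happens inside each cluster rather than over the whole dataset, so synthetic samples are interpolated only between neighbours from the same patient group and a small group is grown without borrowing points from a larger one. The oversampled output would then serve as the training set for a classifier such as logistic regression or a neural network.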
