Department of Computer Engineering, Hitit University Çorum, Çorum, 19030, Türkiye.
BMC Med Inform Decis Mak. 2024 Sep 5;24(1):248. doi: 10.1186/s12911-024-02657-2.
Pancreatic ductal adenocarcinoma (PDAC) is a highly lethal cancer, largely because it is usually diagnosed at an advanced stage. The five-year survival rate after diagnosis is less than 10%; with early diagnosis, however, it can reach up to 70%. Early diagnosis of PDAC therefore allows treatment to begin sooner and improves survival. The challenge is to develop a reliable, data privacy-aware machine learning approach that can accurately diagnose pancreatic cancer from biomarkers.
The study aims to diagnose pancreatic cancer in patients while preserving the confidentiality of patient records. It also aims to guide researchers and clinicians in developing innovative methods for diagnosing pancreatic cancer.
Machine learning, a branch of artificial intelligence, can identify patterns by analyzing large datasets. The study pre-processed a dataset of urine biomarkers by filling in missing values, cleaning outliers, and selecting features. The data were encrypted with the Fernet symmetric encryption scheme to ensure confidentiality. Ten separate machine learning models were applied to predict which individuals have PDAC. The models were evaluated with performance metrics including F1 score, recall, precision, and accuracy.
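As a rough illustration of how the four reported metrics relate to one another, the sketch below computes them from a confusion matrix for a binary PDAC versus non-PDAC labelling. It is not the study's code: the function name `binary_metrics` and the ten example labels are hypothetical, and the study may well have used a library implementation and a different class setup.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for one positive class."""
    # Count each (actual is positive, predicted is positive) combination.
    counts = Counter((t == positive, p == positive) for t, p in zip(y_true, y_pred))
    tp = counts[(True, True)]    # PDAC correctly flagged
    fp = counts[(False, True)]   # non-PDAC wrongly flagged
    fn = counts[(True, False)]   # PDAC missed
    tn = counts[(False, False)]  # non-PDAC correctly cleared

    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels for illustration only (1 = PDAC, 0 = not PDAC).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a screening setting, recall is the share of true PDAC cases the model catches, while precision is the share of flagged cases that are truly PDAC, which is why both are reported alongside accuracy.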
Among the 590 clinical records analyzed, 199 (33.7%) belonged to patients with pancreatic cancer, 208 (35.3%) to patients with non-cancerous pancreatic disorders (such as benign hepatobiliary disease), and 183 (31%) to healthy individuals. The LGBM algorithm performed best, achieving an accuracy of 98.8%. The accuracy of the other algorithms ranged from 86% to 98%. A feature-importance analysis, carried out to understand which inputs the model relies on most, found that the features "plasma_CA19_9", REG1A, TFF1, and LYVE1 have high importance levels. LIME analysis was also used to examine which features matter in the model's decision-making process.
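The cohort percentages quoted above can be rechecked from the raw counts. The short sketch below does only that arithmetic; the dictionary keys are shorthand labels chosen here, not identifiers from the dataset.

```python
# Record counts as reported in the abstract; the keys are informal labels.
cohort = {
    "pancreatic cancer": 199,
    "non-cancerous pancreatic disorder": 208,
    "healthy": 183,
}

total = sum(cohort.values())

# Share of each group as a percentage of all records, to one decimal place.
shares = {group: round(100 * n / total, 1) for group, n in cohort.items()}

for group, share in shares.items():
    print(f"{group}: {cohort[group]}/{total} = {share}%")
```

The three groups are of similar size, so plain accuracy is less likely to be inflated by a single dominant class than it would be in a heavily imbalanced cohort.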
This research outlines a data privacy-aware machine learning tool for predicting PDAC. The results suggest that the approach is promising for clinical application. Future research should expand the dataset and validate the approach across diverse populations.