Angelioudaki Ioanna, Iosif Angeliki, Kourou Konstadina, Tzingounis Alexandros-Georgios, Kigka Vassiliki, Skreka Androniki-Maria, Costopoulos Myrto, Memos Nikolaos, Kataki Agapi, Konstadoulakis Manousos M, Fotiadis Dimitrios I
2nd Department of Surgery, Aretaieion Hospital, Medical School of Athens, National and Kapodistrian University of Athens, Athens, Greece.
Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, and Biomedical Research Institute, Foundation for Research & Technology - Hellas (FORTH), Ioannina, Greece.
Front Oncol. 2025 Apr 25;15:1540195. doi: 10.3389/fonc.2025.1540195. eCollection 2025.
Pancreatic cancer (PC) is a lethal disease that develops from either exocrine or endocrine cells. Efforts to support early diagnosis focus on liquid biopsy methods, especially the detection of extracellular vesicles (EVs) that cancer cells secrete into their microenvironment and that accumulate in the systemic circulation. Multiple studies have explored how EV size, surface biomarkers, and cargo determine the distinct roles EVs play in modulating recipient cells' gene expression, metabolism, and behavior, thereby affecting cancer development. This study aimed to develop a machine learning (ML)-driven pipeline that uses clinical variables and plasma EV-based features to predict the presence of pancreatic tumors of exocrine or endocrine origin, distinguishing these patients from patients with benign lesions or from age-matched non-oncological patients.
All available plasma samples (N=126) and variables were collected prior to surgery. EVs were detected and characterized by flow cytometry with immunostaining. EV data, including size and a unique set of biomarkers (CD45, CD63, and EphA2), were combined with hematological and biochemical data and analyzed under two use cases, each formulated as a 3-class classification problem for patient risk stratification. The first use case classified patients as having benign lesions, exocrine neoplasms, or endocrine neoplasms. The second use case distinguished patients with exocrine or endocrine neoplasms from non-oncological patients. Several ML methods were applied, including Logistic Regression, Random Forest, Support Vector Machines, and Extreme Gradient Boosting (XGBoost). Evaluation metrics, such as the area under the receiver operating characteristic curve (AUC-ROC), were computed, and Shapley values were used to identify the features with the greatest impact on discriminating between outcome groups.
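Shapley values attribute a model's output for a single patient to its input features by averaging each feature's marginal contribution over every subset (coalition) of the other features. The sketch below computes them exactly from that definition in plain Python, for illustration only. The abstract does not say which Shapley estimator the authors used; the baseline-substitution value function, the toy linear score `toy_score`, and the feature names `cd63_ev`, `epha2_ev`, and `albumin` are all hypothetical and do not come from the study.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, Tuple


def shapley_values(
    predict: Callable[[Dict[str, float]], float],
    x: Dict[str, float],
    baseline: Dict[str, float],
) -> Dict[str, float]:
    """Exact Shapley values for one sample.

    Assumption: a feature outside a coalition is replaced by its baseline
    value. The cost grows exponentially with the number of features, so
    this only suits a handful of features.
    """
    names = list(x)
    n = len(names)

    def value(coalition: Tuple[str, ...]) -> float:
        # Coalition members keep the sample's value; the rest use the baseline.
        z = {f: (x[f] if f in coalition else baseline[f]) for f in names}
        return predict(z)

    phi: Dict[str, float] = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            # Classic Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                total += weight * (value(s + (f,)) - value(s))
        phi[f] = total
    return phi


def toy_score(z: Dict[str, float]) -> float:
    # Hypothetical linear risk score; NOT a model from the study.
    return 0.5 * z["cd63_ev"] + 1.5 * z["epha2_ev"] - 0.2 * z["albumin"]


# Hypothetical patient and reference (baseline) feature values.
x = {"cd63_ev": 2.0, "epha2_ev": 1.0, "albumin": 4.0}
base = {"cd63_ev": 1.0, "epha2_ev": 0.0, "albumin": 4.0}
phi = shapley_values(toy_score, x, base)
```

For a linear score each Shapley value reduces to weight × (value − baseline), and the values always sum to `toy_score(x) - toy_score(base)` (the efficiency property), which makes the toy case easy to check by hand.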
Analyses identified hematological and biochemical features among the significant predictors. Models based on plasma EV subpopulations achieved high accuracy and AUC-ROC values: the Random Forest and XGBoost algorithms exceeded 0.90 accuracy, reaching 0.96 ± 0.03 in the first use case and 0.93 ± 0.04 in the second.
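For a 3-class problem, a single AUC-ROC has to be assembled from per-class curves. The abstract does not say how this was done, so the sketch below assumes one common convention: one-vs-rest AUC per class (each computed as the probability that a sample of that class outscores a sample of another class, with ties counted as 0.5), averaged without class weighting. This should agree with scikit-learn's `roc_auc_score(..., multi_class="ovr", average="macro")`. The class encoding and the probability table are hypothetical toy values, not study data.

```python
from typing import List, Sequence


def binary_auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """AUC as P(positive score > negative score), ties counted as 0.5."""
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true: Sequence[int], proba: Sequence[Sequence[float]],
                  n_classes: int = 3) -> float:
    """One-vs-rest AUC for each class, then an unweighted mean."""
    aucs: List[float] = []
    for c in range(n_classes):
        pos = [row[c] for y, row in zip(y_true, proba) if y == c]
        neg = [row[c] for y, row in zip(y_true, proba) if y != c]
        aucs.append(binary_auc(pos, neg))
    return sum(aucs) / n_classes


def accuracy(y_true: Sequence[int], proba: Sequence[Sequence[float]]) -> float:
    # Predicted class is the argmax of each row of class probabilities.
    preds = [max(range(len(row)), key=row.__getitem__) for row in proba]
    return sum(int(p == y) for p, y in zip(preds, y_true)) / len(y_true)


# Hypothetical encoding: 0 = benign/non-oncological, 1 = exocrine, 2 = endocrine.
y = [0, 0, 1, 1, 2, 2]
proba = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.5, 0.4, 0.1],  # an exocrine case misclassified as class 0
    [0.1, 0.3, 0.6],
    [0.3, 0.3, 0.4],
]
auc = macro_ovr_auc(y, proba)
acc = accuracy(y, proba)
```

The toy table shows why both metrics are reported: one misclassified sample lowers accuracy by a full 1/6, while AUC, which depends only on how scores are ranked, drops just slightly.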
By leveraging ML-driven analytical approaches and integrating diverse data types, this study achieved high accuracy, supporting patient risk estimation and the feasibility of early pancreatic cancer detection. Going beyond currently used biomarkers such as CEA or CA19.9, EV-based features offer added value by increasing diagnostic capacity.