Machine Learning Models Based on Stretched-Exponential Diffusion Weighted Imaging to Predict TROP2 Expression in Nude Mouse Breast Cancer Models.

Author information

Deng Yi, Han Chao-Gang, Deng Zi-Qin, Yang Shou-Yi, Wu Zhuo-Han, Liu Jia-Li, Ma Jia-Ming

Affiliation

Department of Radiology, Shaoguan Maternal and Child Health Hospital, 512000 Shaoguan, Guangdong, China.

Publication information

Discov Med. 2025 Mar;37(194):496-502. doi: 10.24976/Discov.Med.202537194.41.

DOI:10.24976/Discov.Med.202537194.41

PMID:40116097

Abstract

BACKGROUND

Trophoblast cell surface antigen 2 (TROP2) is a promising target for various cancers, including breast cancer. The development of noninvasive techniques for assessing TROP2 expression in tumors holds considerable importance. This study aims to explore the efficacy of machine learning models based on multi-b-value diffusion-weighted imaging (DWI) using the stretched-exponential model (SEM) for predicting TROP2 expression in breast cancer in nude mouse models.

MATERIALS AND METHODS

Thirty-two nude mouse breast cancer models were subjected to 1.5T magnetic resonance imaging (MRI). Using the freely available software package FireVoxel, we extracted the distributed diffusion coefficient (DDC) and water molecule diffusion heterogeneity index (α) values from SEM, along with histogram parameters of DDC and α maps. TROP2 expression was identified by immunohistochemical staining, with integrated optical density (IOD) quantifying the expression levels. Mice were categorized into high and low TROP2 expression groups based on the median IOD. Key imaging parameters were selected to establish three machine learning models: extreme gradient boosting (XGBoost) classifier, logistic regression, and adaptive boosting (AdaBoost) classifier. We compared the models using the area under the curve (AUC) of the receiver operating characteristic (ROC) on a validation set to determine the superior model. The dataset was split into a training set (28 cases) and a test set (4 cases). The selected model was trained to optimize its performance. We evaluated the models' predictive accuracy in estimating TROP2 expression using AUC, calibration curve, and decision curve analysis (DCA).
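For readers unfamiliar with the stretched-exponential model, the two parameters named above come from the signal equation S(b) = S0 · exp(−(b · DDC)^α). The following is a minimal sketch of how DDC and α could be estimated for a single voxel, assuming a known S0 and a simple log-log linear fit. It does not reproduce the fitting procedure of the study's software package; the b-values, the signal values, and the function names `sem_signal` and `fit_sem` are hypothetical and chosen only for illustration.

```python
import math


def sem_signal(b, s0, ddc, alpha):
    """Stretched-exponential signal: S(b) = S0 * exp(-(b * DDC) ** alpha)."""
    return s0 * math.exp(-((b * ddc) ** alpha))


def fit_sem(b_values, signals, s0):
    """Estimate (DDC, alpha) by ordinary least squares on the linearized form

        ln(-ln(S / S0)) = alpha * ln(b) + alpha * ln(DDC)

    Assumes b > 0 and 0 < S < S0 for every point (illustrative only).
    """
    xs = [math.log(b) for b in b_values]
    ys = [math.log(-math.log(s / s0)) for s in signals]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx                    # slope of the fitted line
    intercept = mean_y - alpha * mean_x  # equals alpha * ln(DDC)
    ddc = math.exp(intercept / alpha)
    return ddc, alpha


# Hypothetical b-values (s/mm^2) and noise-free synthetic signals.
b_values = [200, 400, 800, 1200, 2000]
true_ddc, true_alpha = 1.0e-3, 0.8
signals = [sem_signal(b, 1000.0, true_ddc, true_alpha) for b in b_values]

ddc, alpha = fit_sem(b_values, signals, s0=1000.0)
print(f"DDC = {ddc:.6f} mm^2/s, alpha = {alpha:.3f}")
```

On real, noisy data a nonlinear least-squares fit is usually preferred over this linearization, because the double logarithm amplifies noise at low b-values.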

RESULTS

Thirty-eight imaging parameters, including DDC, α value, and 36 histogram parameters, were extracted per sample. Using these, we identified eight key imaging parameters for constructing the machine learning models. The validation set AUC values for the XGBoost, logistic regression, and AdaBoost models were 0.828, 0.639, and 0.728, respectively, with XGBoost demonstrating superior prediction performance. In the training set, XGBoost achieved an AUC of 1, sensitivity of 0.911, specificity of 1, and accuracy of 0.954; each of these values was 1 in the test set. Cross-validation yielded an AUC of 0.689, sensitivity of 0.567, specificity of 0.567, and accuracy of 0.580. The calibration curve's Brier score was 0.044, indicating proximity to the ideal curve. DCA indicated favorable net benefits within a risk threshold range of 20% to 90%.
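The metrics reported above (AUC, sensitivity, specificity, accuracy, Brier score, and the net benefit used in DCA) can each be computed from true labels and predicted probabilities. The sketch below shows one standard definition of each, using a small hypothetical set of labels and probabilities that is not taken from the study. The function names and the 0.5 classification threshold are assumptions made for illustration.

```python
def roc_auc(y_true, probs):
    """AUC as the fraction of (positive, negative) pairs ranked correctly;
    ties count as half a correct pair."""
    pos = [p for y, p in zip(y_true, probs) if y == 1]
    neg = [p for y, p in zip(y_true, probs) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


def confusion_counts(y_true, probs, threshold):
    """Return (TP, FP, TN, FN) when predicting positive if prob >= threshold."""
    tp = sum(1 for y, p in zip(y_true, probs) if y == 1 and p >= threshold)
    fp = sum(1 for y, p in zip(y_true, probs) if y == 0 and p >= threshold)
    tn = sum(1 for y, p in zip(y_true, probs) if y == 0 and p < threshold)
    fn = sum(1 for y, p in zip(y_true, probs) if y == 1 and p < threshold)
    return tp, fp, tn, fn


def classification_metrics(y_true, probs, threshold=0.5):
    """Return (sensitivity, specificity, accuracy) at the given threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, probs, threshold)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(y_true)


def net_benefit(y_true, probs, threshold):
    """Decision-curve net benefit: TP/n - FP/n * pt / (1 - pt)."""
    tp, fp, _, _ = confusion_counts(y_true, probs, threshold)
    n = len(y_true)
    return tp / n - fp / n * threshold / (1.0 - threshold)


# Hypothetical labels (1 = high TROP2 expression) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]

auc = roc_auc(y_true, probs)
brier = brier_score(y_true, probs)
sensitivity, specificity, accuracy = classification_metrics(y_true, probs)
nb = net_benefit(y_true, probs, threshold=0.5)
print(auc, brier, sensitivity, specificity, accuracy, nb)
```

A Brier score near 0 means the predicted probabilities sit close to the observed outcomes. A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing it with the treat-all and treat-none strategies.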

CONCLUSIONS

Machine learning models based on SEM show promise for predicting TROP2 expression in breast cancer in nude mouse models. Among the models, XGBoost demonstrated outstanding performance, suggesting its potential for clinical applications.
