Boutilier Justin J, Craig Tim, Sharpe Michael B, Chan Timothy C Y
Department of Mechanical and Industrial Engineering, University of Toronto, 5 King's College Road, Toronto, Ontario M5S 3G8, Canada.
Radiation Medicine Program, UHN Princess Margaret Cancer Centre, 610 University Avenue, Toronto, Ontario M5T 2M9, Canada and Department of Radiation Oncology, University of Toronto, 148-150 College Street, Toronto, Ontario M5S 3S2, Canada.
Med Phys. 2016 Mar;43(3):1212-21. doi: 10.1118/1.4941363.
To determine how training set size affects the accuracy of knowledge-based treatment planning (KBP) models.
The authors selected four models from three classes of KBP approaches, corresponding to three distinct quantities that KBP models may predict: dose-volume histogram (DVH) points, DVH curves, and objective function weights. DVH point prediction uses the best plan from a database of similar clinical plans; DVH curve prediction employs principal component analysis and multiple linear regression; and objective function weight prediction uses either logistic regression or K-nearest neighbors. The authors trained each KBP model using training sets of sizes n = 10, 20, 30, 50, 75, 100, 150, and 200. The authors set aside 100 randomly selected patients from their cohort of 315 prostate cancer patients from Princess Margaret Cancer Centre to serve as a validation set for all experiments. For each value of n, the authors randomly selected 100 different training sets with replacement from the remaining 215 patients. For each value of n and each KBP approach, every one of the 100 training sets was used to train a model. To evaluate the models, the authors predicted the KBP endpoints for each of the 100 patients in the validation set. To estimate the minimum required sample size, the authors used statistical testing to determine whether the median error for each sample size from 10 to 150 was equal to the median error for the maximum sample size of 200.
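The sampling design described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: patients are stood in for by integer indices, the function name `build_design` and the seed are hypothetical, and "with replacement" is read here as meaning that a patient may appear in several training sets while each individual set contains distinct patients. The abstract does not settle that last point.

```python
import random

N_PATIENTS = 315      # cohort size stated in the abstract
N_VALIDATION = 100    # patients held out for validation
SIZES = [10, 20, 30, 50, 75, 100, 150, 200]
N_REPEATS = 100       # training sets drawn per sample size


def build_design(seed=0):
    """Return (validation, pool, training_sets) as lists of patient indices.

    Assumption: each training set holds distinct patients, but the same
    patient may be reused across different training sets.
    """
    rng = random.Random(seed)
    patients = list(range(N_PATIENTS))

    # Hold out one fixed validation set that is shared by all experiments.
    validation = rng.sample(patients, N_VALIDATION)
    held_out = set(validation)
    pool = [p for p in patients if p not in held_out]

    # For every sample size n, draw N_REPEATS training sets from the pool.
    training_sets = {
        n: [rng.sample(pool, n) for _ in range(N_REPEATS)]
        for n in SIZES
    }
    return validation, pool, training_sets
```

Each entry of `training_sets[n]` would then be used to fit one model per KBP approach, and every fitted model would be scored on the same 100 validation patients.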
The minimum required sample size differed between models. The DVH point prediction method predicts two dose metrics for the bladder and two for the rectum. The authors found that more than 200 samples were required to achieve consistent model predictions for all four metrics. For DVH curve prediction, the authors found that at least 75 samples were needed to accurately predict the bladder DVH, while only 20 samples were needed to predict the rectum DVH. Finally, for objective function weight prediction, at least 10 samples were needed to train the logistic regression model, while at least 150 samples were required to train the K-nearest neighbors model.
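One way to turn the comparison of median errors into a minimum sample size is sketched below. The abstract does not name the statistical test, the significance level, or the exact decision rule, so everything here is an assumption made for illustration: a two-sided permutation test on the difference in medians, `alpha = 0.05`, and a rule that walks down from the largest size and stops at the first size whose median error differs from that of the largest. The function names are hypothetical.

```python
import random
import statistics


def median_permutation_pvalue(a, b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in medians.

    Assumption: this test stands in for the unspecified "statistical
    testing" mentioned in the abstract.
    """
    rng = random.Random(seed)
    observed = abs(statistics.median(a) - statistics.median(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.median(pooled[:len(a)])
                   - statistics.median(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def minimum_sample_size(errors_by_n, alpha=0.05):
    """Smallest n whose errors, and those of every larger n, are not
    significantly different from the errors at the largest n.

    Returns None when even the second-largest size differs.
    """
    sizes = sorted(errors_by_n)
    largest = sizes[-1]
    minimum = None
    for n in reversed(sizes[:-1]):
        p = median_permutation_pvalue(errors_by_n[n], errors_by_n[largest])
        if p < alpha:
            break
        minimum = n
    return minimum
```

Here `errors_by_n` maps each training set size to a list of validation errors, for example one error per trained model. A return value of `None` means no smaller size matched the largest one under this rule.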
In conclusion, the minimum sample size needed to accurately train KBP models for prostate cancer depends on the specific model and the endpoint to be predicted. The authors' results may provide a lower bound for more complicated tumor sites.