Hu Jiuyun, Fatyga Mirek, Liu Wei, Schild Steven E, Wong William W, Vora Sujay A, Li Jing
School of Computing & Augmented Intelligence, Arizona State University, Tempe, AZ, USA.
Department of Radiation Oncology, Mayo Clinic Arizona, Phoenix, AZ, USA.
IISE Trans Healthc Syst Eng. 2024;14(2):130-140. doi: 10.1080/24725579.2023.2227199. Epub 2023 Jul 7.
Radiation therapy (RT) is a frontline approach to treating cancer. Although the radiation dose is targeted at the tumor, some dose inevitably spills into nearby normal organs and causes complications, a phenomenon known as radiotherapy toxicity. To predict toxicity outcomes, statistical models known as Normal Tissue Complication Probability (NTCP) models can be built from dosimetric variables that describe the dose received by the normal organ at risk (OAR). Because the dosimetric variables are high-dimensional and clinical sample sizes are limited, statistical models with variable selection techniques are viable choices. However, existing variable selection techniques are purely data-driven and do not integrate medical domain knowledge into the model formulation. We propose a knowledge-constrained generalized linear model (KC-GLM). KC-GLM includes a new mathematical formulation that translates three pieces of domain knowledge into non-negativity, monotonicity, and adjacent similarity constraints on the model coefficients. We further propose an equivalent transformation of the KC-GLM formulation, which makes it possible to solve for the model coefficients using existing optimization solvers. Furthermore, we compare KC-GLM with several well-known variable selection techniques in a simulation study and on two real datasets, one of prostate cancer and one of lung cancer. These experiments show that KC-GLM selects variables with better interpretability, avoids producing counter-intuitive and misleading results, and has better prediction accuracy.
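The abstract names the three constraints but does not give the formulation, so the following is only an illustrative sketch of how such constraints could be handled, not the authors' method. It assumes a logistic link, coefficients that are non-negative and non-decreasing across dose levels ordered by column, and an L1 penalty on adjacent coefficient differences as a stand-in for "adjacent similarity". Writing each coefficient as a cumulative sum of non-negative increments turns all three requirements into simple non-negativity on the increments, which a plain projected-gradient loop can handle. The function name `fit_kc_logistic`, its parameters, and the synthetic data are hypothetical.

```python
import math
import random


def fit_kc_logistic(X, y, lam=0.05, step=0.05, n_iter=300):
    """Logistic model with non-negative, non-decreasing coefficients.

    Hypothetical helper for illustration; not the solver used in the paper.
    """
    n, p = len(X), len(X[0])
    # Reparameterize beta_j = delta_0 + ... + delta_j with every delta_k >= 0.
    # Then x_i . beta == z_i . delta, where z_i[k] = sum of x_i[j] for j >= k.
    Z = []
    for row in X:
        suffix, total = [0.0] * p, 0.0
        for j in range(p - 1, -1, -1):
            total += row[j]
            suffix[j] = total
        Z.append(suffix)

    delta, intercept = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad, grad0 = [0.0] * p, 0.0
        for zi, yi in zip(Z, y):
            eta = intercept + sum(z * d for z, d in zip(zi, delta))
            eta = max(-30.0, min(30.0, eta))  # guard against overflow in exp
            resid = 1.0 / (1.0 + math.exp(-eta)) - yi
            grad0 += resid / n
            for k in range(p):
                grad[k] += resid * zi[k] / n
        intercept -= step * grad0
        for k in range(p):
            # For k >= 1, delta_k = beta_k - beta_{k-1} >= 0, so an L1 penalty
            # on adjacent differences reduces to the linear term lam * delta_k.
            penalty = lam if k > 0 else 0.0
            # Gradient step, then projection onto delta_k >= 0.
            delta[k] = max(delta[k] - step * (grad[k] + penalty), 0.0)

    # Map the increments back to the constrained coefficients.
    beta, running = [], 0.0
    for d in delta:
        running += d
        beta.append(running)
    return intercept, beta


# Synthetic data (assumed for illustration): 4 ordered dosimetric variables.
rng = random.Random(0)
true_beta = [0.0, 0.5, 1.0, 1.0]
X, y = [], []
for _ in range(150):
    row = [rng.random() for _ in true_beta]
    eta = -1.5 + sum(b * x for b, x in zip(true_beta, row))
    X.append(row)
    y.append(1.0 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0.0)

intercept, beta = fit_kc_logistic(X, y)
print("intercept:", intercept)
print("beta:", beta)
```

Because the coefficients are rebuilt as cumulative sums of non-negative increments, the fitted `beta` satisfies the sign and ordering constraints by construction, whatever the data. In practice the same transformed problem could be handed to an off-the-shelf bound-constrained solver instead of the hand-written loop.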