Probabilistic Mixture Models Improve Calibration of Panel-derived Tumor Mutational Burden in the Context of both Tumor-normal and Tumor-only Sequencing.
Affiliations
Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, Maryland.
The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, Maryland.
Publication information
Cancer Res Commun. 2023 Mar 28;3(3):501-509. doi: 10.1158/2767-9764.CRC-22-0339. eCollection 2023 Mar.
BACKGROUND
Tumor mutational burden (TMB) has been investigated as a biomarker for immune checkpoint blockade (ICB) therapy. Increasingly, TMB is estimated with gene panel-based assays rather than whole-exome sequencing, and because different gene panels cover overlapping but distinct genomic coordinates, comparisons across panels are difficult. Previous studies have suggested that each panel be standardized and calibrated to exome-derived TMB to ensure comparability. As TMB cutoffs are increasingly developed from panel-based assays, there is a need to understand how to properly estimate exomic TMB values from different panel-based assays.
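Part of the cross-panel problem is visible in the arithmetic alone. TMB is usually reported as mutations per megabase, so the same raw count maps to different values on panels of different sizes, and each panel then needs its own mapping to exome-derived TMB. The sketch below shows that normalization and the simplest calibration, an ordinary least-squares line, which corresponds to the linear-regression baseline mentioned in this abstract. The function names, panel sizes, and data are hypothetical and illustrative, not taken from the paper.

```python
# Hypothetical helpers: names, panel sizes, and data are illustrative only.
from statistics import mean
from typing import Sequence, Tuple


def tmb_per_mb(mutation_count: int, covered_bases: int) -> float:
    """Mutations per megabase of sequenced territory."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    return mutation_count / (covered_bases / 1_000_000)


def fit_linear_calibration(
    panel_tmb: Sequence[float], exome_tmb: Sequence[float]
) -> Tuple[float, float]:
    """Ordinary least squares for exome_tmb ~ intercept + slope * panel_tmb.

    One straight line that implicitly assumes a constant error variance,
    so it cannot express nonlinearity or heteroscedasticity.
    """
    if len(panel_tmb) != len(exome_tmb) or len(panel_tmb) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(panel_tmb), mean(exome_tmb)
    sxx = sum((x - x_bar) ** 2 for x in panel_tmb)
    if sxx == 0:
        raise ValueError("panel_tmb values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(panel_tmb, exome_tmb))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# The same 10 mutations on two hypothetical panels of different size:
print(tmb_per_mb(10, 1_000_000))  # 10.0
print(tmb_per_mb(10, 2_500_000))  # 4.0
```

A per-panel line like this is easy to fit, but every panel needs its own coefficients, and the single implied error variance is the limitation that the mixture approach is meant to address.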
DESIGN
We propose calibrating panel-derived TMB to exomic TMB with probabilistic mixture models, which allow for nonlinear relationships and heteroscedastic error. We examined various inputs, including nonsynonymous, synonymous, and hotspot mutation counts, as well as genetic ancestry. Using The Cancer Genome Atlas (TCGA) cohort, we generated a tumor-only version of the panel-restricted data by reintroducing private germline variants.
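One way to obtain both properties named above is a mixture of regressions: the predictive distribution for exomic TMB is a weighted sum of Gaussian components, each with its own line and its own variance, so the overall spread can change with the input. The sketch below fits such a model with expectation-maximization (EM) on a single input (for example, log panel TMB) in pure Python. It is an illustration under stated assumptions, not the authors' model: it uses one input instead of the several mutation types and ancestry examined in the paper, and its mixing weights are constant rather than input-dependent. All names are hypothetical.

```python
# Hypothetical sketch: a K-component mixture of linear regressions fit by EM.
# Not the paper's implementation; one input, constant mixing weights.
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Component:
    weight: float     # mixing proportion pi_k
    intercept: float  # a_k
    slope: float      # b_k
    var: float        # sigma_k^2, a separate variance per component


def _log_normal(y: float, mu: float, var: float) -> float:
    return -0.5 * (math.log(2 * math.pi * var) + (y - mu) ** 2 / var)


def _weighted_fit(x, y, w, var_floor):
    """Weighted least squares for one component (the M-step update)."""
    sw = sum(w)
    xb = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yb = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xb) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xb) * (yi - yb) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx if sxx > 1e-12 else 0.0
    intercept = yb - slope * xb
    sse = sum(wi * (yi - intercept - slope * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    return intercept, slope, max(sse / sw, var_floor)


def fit_mixture_regression(
    x: Sequence[float], y: Sequence[float], k: int = 2,
    n_iter: int = 200, tol: float = 1e-8, var_floor: float = 1e-6,
) -> Tuple[List[Component], List[float]]:
    n = len(x)
    if n < 2 * k:
        raise ValueError("need at least two points per component")
    # Deterministic start: rank points by OLS residual, split into k groups.
    a0, b0, _ = _weighted_fit(x, y, [1.0] * n, var_floor)
    order = sorted(range(n), key=lambda i: y[i] - a0 - b0 * x[i])
    resp = [[0.0] * k for _ in range(n)]
    for rank, i in enumerate(order):
        resp[i][rank * k // n] = 1.0
    history: List[float] = []
    for _ in range(n_iter):
        # M-step: mixing weight, line, and variance for each component.
        comps = []
        for j in range(k):
            w = [r[j] for r in resp]
            a, b, v = _weighted_fit(x, y, [wi + 1e-12 for wi in w], var_floor)
            comps.append(Component(max(sum(w) / n, 1e-300), a, b, v))
        # E-step: responsibilities and log-likelihood via log-sum-exp.
        ll = 0.0
        for i in range(n):
            logs = [math.log(c.weight)
                    + _log_normal(y[i], c.intercept + c.slope * x[i], c.var)
                    for c in comps]
            m = max(logs)
            lse = m + math.log(sum(math.exp(lp - m) for lp in logs))
            ll += lse
            resp[i] = [math.exp(lp - lse) for lp in logs]
        history.append(ll)
        if len(history) > 1 and abs(history[-1] - history[-2]) < tol:
            break
    return comps, history


def predictive_moments(comps: Sequence[Component], x_new: float) -> Tuple[float, float]:
    """Mean and variance of the mixture prediction at x_new (law of total variance)."""
    means = [c.intercept + c.slope * x_new for c in comps]
    mu = sum(c.weight * m for c, m in zip(comps, means))
    var = sum(c.weight * (c.var + (m - mu) ** 2) for c, m in zip(comps, means))
    return mu, var
```

Evaluating `predictive_moments` at several inputs shows the predictive variance changing with `x` whenever the component slopes or variances differ, which a single least-squares line with one residual variance cannot represent.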
RESULTS
The proposed probabilistic mixture models captured the distributions of both tumor-normal and tumor-only data more accurately than linear regression. Applying a model trained on tumor-normal data to tumor-only input resulted in biased TMB predictions. Including synonymous mutations improved regression metrics for both data types, but a model able to dynamically weight the various input mutation types ultimately performed best. Including genetic ancestry improved model performance only for tumor-only data, in which private germline variants are observed.
SIGNIFICANCE
A probabilistic mixture model captures the nonlinearity and heteroscedasticity of the data better than linear regression does. Tumor-only panel data are needed to properly calibrate tumor-only panels to exomic TMB. Leveraging the uncertainty around the point estimates from these models better informs TMB-based cohort stratification.
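Stratifying with uncertainty can be illustrated by asking for the probability that a sample's exomic TMB lies above a cutoff, rather than comparing a point estimate to it. The sketch assumes the calibration model returns a Gaussian mixture on the same scale as the cutoff, given as (weight, mean, SD) tuples; if a model works on log TMB, the cutoff must be log-transformed first. The 10 mutations/Mb cutoff, the 0.9 confidence level, and the three labels are illustrative choices, not values from the paper.

```python
# Hypothetical sketch: cutoff, confidence level, and labels are illustrative.
import math
from typing import Sequence, Tuple

Mixture = Sequence[Tuple[float, float, float]]  # (weight, mean, sd) per component


def normal_sf(x: float, mu: float, sd: float) -> float:
    """P(X >= x) for X ~ Normal(mu, sd**2)."""
    return 0.5 * math.erfc((x - mu) / (sd * math.sqrt(2.0)))


def prob_above_cutoff(mixture: Mixture, cutoff: float) -> float:
    """Mixture survival probability; weights are renormalized to sum to 1."""
    total = sum(w for w, _, _ in mixture)
    return sum(w * normal_sf(cutoff, mu, sd) for w, mu, sd in mixture) / total


def stratify(mixture: Mixture, cutoff: float = 10.0,
             confidence: float = 0.9) -> Tuple[str, float]:
    """Call high/low only when the predictive probability is decisive."""
    p = prob_above_cutoff(mixture, cutoff)
    if p >= confidence:
        return "TMB-high", p
    if p <= 1.0 - confidence:
        return "TMB-low", p
    return "indeterminate", p


# Two samples with the same point estimate (11 mut/Mb), different spread:
print(stratify([(1.0, 11.0, 0.5)])[0])  # TMB-high
print(stratify([(1.0, 11.0, 4.0)])[0])  # indeterminate
```

Both samples sit above the cutoff by point estimate alone, but only the one with a narrow predictive distribution is confidently called TMB-high; the other is flagged rather than silently assigned to a stratum.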