School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China.
Shaanxi Engineering Research Center of Medical and Health Big Data, Xi'an Jiaotong University, Xi'an 710049, China.
Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae159.
In cancer genomics, variant calling has advanced, but traditional evaluations based on mean accuracy are inadequate for biomarkers such as tumor mutation burden (TMB), which varies significantly across samples and affects both immunotherapy patient selection and the setting of TMB thresholds. In this study, we introduce TMBstable, a method that uses a meta-learning framework to dynamically select the optimal variant calling strategy for specific genomic regions, in contrast to traditional callers that apply a single strategy across the whole sample. The method first segments the sample into windows and extracts meta-features for clustering. A pre-trained meta-model then selects a suitable algorithm for each cluster, which addresses strategy-sample mismatches, reduces performance fluctuations and keeps performance consistent across samples. We evaluated TMBstable on both simulated and real non-small cell lung cancer and nasopharyngeal carcinoma samples and compared it with other advanced callers. The assessment involved 300 simulated and 106 real tumor samples and focused on stability measures such as the variance and coefficient of variation of the false positive rate, false negative rate, precision and recall. Benchmark results showed that TMBstable was the most stable, with the lowest variance and coefficient of variation across these performance metrics, highlighting its effectiveness in analyzing counting-based biomarkers. TMBstable is available at https://github.com/hello-json/TMBstable for academic use only.
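The per-region strategy selection described above (windows, meta-features, clustering, then one algorithm per cluster from a meta-model) can be sketched as follows. This is a minimal illustration under stated assumptions, not the TMBstable implementation: the fixed window size, the two meta-features (mean read depth and GC fraction), plain k-means as the clustering step, and the hypothetical `toy_meta_model` rule with its placeholder caller names `caller_A` and `caller_B` all stand in for the paper's actual meta-features, clustering method, pre-trained meta-model and candidate callers.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical meta-features per window: (mean read depth, GC fraction).
Feature = Tuple[float, ...]


@dataclass
class Window:
    start: int  # 0-based start position
    end: int  # exclusive end position
    features: Feature


def segment_windows(depth: Sequence[float], seq: str, size: int) -> List[Window]:
    """Split a sample into fixed-size windows and compute simple meta-features."""
    windows = []
    for start in range(0, len(seq), size):
        end = min(start + size, len(seq))
        chunk = seq[start:end].upper()
        gc = sum(base in "GC" for base in chunk) / len(chunk)
        windows.append(Window(start, end, (mean(depth[start:end]), gc)))
    return windows


def sq_dist(a: Feature, b: Feature) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Feature], k: int, iters: int = 20) -> List[int]:
    """Plain Lloyd's k-means (features are not rescaled in this sketch)."""
    k = min(k, len(points))
    # Deterministic init: k evenly spaced points.
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(mean(dim) for dim in zip(*members))
    return labels


def select_strategies(
    windows: List[Window], k: int, meta_model: Callable[[Feature], str]
) -> List[Tuple[int, int, str]]:
    """Cluster windows, ask the meta-model for one caller per cluster, map back."""
    labels = kmeans([w.features for w in windows], k)
    choice: Dict[int, str] = {}
    for c in set(labels):
        members = [w.features for w, lab in zip(windows, labels) if lab == c]
        centroid = tuple(mean(dim) for dim in zip(*members))
        choice[c] = meta_model(centroid)
    return [(w.start, w.end, choice[lab]) for w, lab in zip(windows, labels)]


def toy_meta_model(centroid: Feature) -> str:
    """Hypothetical stand-in for a pre-trained meta-model: a depth threshold."""
    mean_depth = centroid[0]
    return "caller_A" if mean_depth >= 50 else "caller_B"


# Toy sample: 40 bp with deep coverage in the first half, shallow in the second.
seq = "ACGT" * 10
depth = [100.0] * 20 + [20.0] * 20
plan = select_strategies(segment_windows(depth, seq, size=10), k=2, meta_model=toy_meta_model)
print(plan)
```

In the toy run, the two deep-coverage windows fall into one cluster and are assigned `caller_A`, while the two shallow windows form the other cluster and get `caller_B`. The key design point is that the algorithm choice is made per cluster rather than once per sample.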
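The stability measures used in the assessment, the variance and coefficient of variation (CV) of per-sample error rates, can be computed as below. This sketch rests on several assumptions: standard confusion-matrix definitions of the four rates, a hypothetical per-sample `Counts` record that includes a true-negative count (how true negatives are defined for variant calling is itself an assumption here), population rather than sample variance, CV taken as standard deviation divided by mean, and non-zero denominators. The paper's exact conventions may differ.

```python
from dataclasses import dataclass
from statistics import mean, pstdev, pvariance
from typing import Dict, List


@dataclass
class Counts:
    """Hypothetical per-sample confusion counts for one caller."""
    tp: int
    fp: int
    fn: int
    tn: int  # assumed: assessed positions with neither a true nor a called variant


def rates(c: Counts) -> Dict[str, float]:
    """Per-sample error rates (assumes no denominator is zero)."""
    return {
        "FPR": c.fp / (c.fp + c.tn),
        "FNR": c.fn / (c.fn + c.tp),
        "precision": c.tp / (c.tp + c.fp),
        "recall": c.tp / (c.tp + c.fn),
    }


def stability(samples: List[Counts]) -> Dict[str, Dict[str, float]]:
    """Variance and CV of each rate across samples; lower means more stable."""
    per_sample = [rates(c) for c in samples]
    summary: Dict[str, Dict[str, float]] = {}
    for metric in per_sample[0]:
        values = [r[metric] for r in per_sample]
        mu = mean(values)
        summary[metric] = {
            "variance": pvariance(values),
            "cv": pstdev(values) / mu if mu else float("nan"),
        }
    return summary


# Two toy samples for a single caller.
samples = [Counts(tp=90, fp=10, fn=10, tn=9890), Counts(tp=80, fp=20, fn=20, tn=9880)]
s = stability(samples)
print(s["precision"], s["recall"])
```

A caller whose precision swings from 0.9 on one sample to 0.8 on another shows non-zero variance and CV even though its mean precision looks acceptable. That per-sample spread is what a mean-accuracy evaluation hides and what the stability comparison in the abstract targets.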