The application of compressed sensing on tumor mutation burden calculation from overlapped pooling sequencing data.

Author information

Cui Yue, Qiao Yi, An Rongming, Pan Xuan, Tu Jing

Affiliations

State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing, 210096, China.

Monash University-Southeast University Joint Research Institute, Suzhou, 215123, China.

Publication information

BMC Bioinformatics. 2025 May 20;26(1):129. doi: 10.1186/s12859-025-06148-7.

DOI:10.1186/s12859-025-06148-7

PMID:40394464

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12090583/

Abstract

BACKGROUND

Tumor mutation burden (TMB) is commonly defined as the number of non-synonymous somatic single-nucleotide variants (SNVs) per megabase within the gene region identified through whole-exome sequencing or targeted sequencing of a tumor sample. Statistical evidence shows that TMB is related to the capacity for neoantigen production, and it is used to predict the efficacy of immunotherapy for various cancer types. However, screening patients for TMB is challenging because preparing large numbers of parallel sequencing libraries requires extensive labor and financial resources.
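The definition above reduces to a simple ratio. The following is a minimal sketch, not the authors' pipeline; the function name `tumor_mutation_burden` and the sample values (350 variants over a 35 Mb target region) are hypothetical and chosen only for illustration.

```python
def tumor_mutation_burden(n_nonsyn_somatic_snvs: int, covered_bases: int) -> float:
    """Return TMB as non-synonymous somatic SNVs per megabase of covered region."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    megabases = covered_bases / 1_000_000
    return n_nonsyn_somatic_snvs / megabases


# Hypothetical sample: 350 qualifying SNVs called over a 35 Mb target region.
tmb = tumor_mutation_burden(350, 35_000_000)
print(f"TMB = {tmb:.1f} mutations/Mb")
```

In practice the numerator depends on the variant-calling and filtering rules in use, which the abstract does not specify.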

RESULTS

In this study, we employed compressed sensing (CS) to calculate TMB from overlapped pooling sequencing data, aiming to reduce sequencing cost by minimizing the number of libraries built. Over 90% of SNPs could still be detected, without significant loss of mutation information, even when data were pooled from ten different samples. On this basis, the orthogonal matching pursuit (OMP) and basis pursuit (BP) algorithms were used to reconstruct TMB from the pooled sequencing data, and their performance was evaluated. The BP algorithm performed consistently well in all cases, although it required longer computation time. The OMP algorithm proved suitable for cases where the original matrix was sparse, but its overall performance was low. While maintaining accurate TMB calculation, the number of sequencing runs could be reduced to 0.6 times the total number of samples, a 40% reduction in sequencing cost.
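To make the pooling idea concrete, here is a toy sketch of reconstruction with a generic OMP routine. It is not the authors' implementation. The pooling design (`pools_of_sample`), the signal `x_true` (one nonzero entry, standing in for a sparse per-sample quantity at a single locus), and the noise-free measurement model `y = A @ x` are all assumptions made for illustration. Ten samples are measured through six pools, matching the 0.6 ratio reported above. BP is not shown because it requires a linear-programming solver.

```python
import numpy as np


def omp(A, y, max_nonzero, tol=1e-8):
    """Generic orthogonal matching pursuit: greedy sparse solve of A @ x ~= y.

    Assumes every column of A is nonzero (each sample enters at least one pool).
    """
    x = np.zeros(A.shape[1])
    support = []
    residual = y.astype(float)
    col_norms = np.linalg.norm(A, axis=0)
    for _ in range(max_nonzero):
        if np.linalg.norm(residual) <= tol:
            break
        # Select the column most correlated with the current residual.
        scores = np.abs(A.T @ residual) / col_norms
        if support:
            scores[support] = -np.inf  # never pick the same column twice
        support.append(int(np.argmax(scores)))
        # Re-fit all selected coefficients jointly by least squares.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        x[:] = 0.0
        x[support] = coef
        residual = y - A @ x
    return x


# Hypothetical overlapped design: each of 10 samples goes into 3 of 6 pools.
pools_of_sample = [(0, 1, 2), (0, 3, 4), (1, 3, 5), (2, 4, 5), (0, 1, 5),
                   (2, 3, 4), (0, 2, 3), (1, 4, 5), (0, 4, 5), (1, 2, 3)]
n_pools = 6
A = np.zeros((n_pools, len(pools_of_sample)))
for sample, pools in enumerate(pools_of_sample):
    A[list(pools), sample] = 1.0

# Toy sparse signal: only sample 3 carries a nonzero value.
x_true = np.zeros(len(pools_of_sample))
x_true[3] = 12.0

y = A @ x_true                      # six pooled measurements instead of ten
x_hat = omp(A, y, max_nonzero=3)

print("pools / samples =", n_pools / len(pools_of_sample))
print("recovered signal:", np.round(x_hat, 3))
```

With a single nonzero entry and distinct binary columns, the first greedy step is guaranteed to select the correct sample. Denser signals are not guaranteed to be recovered by this sketch, which is consistent with the abstract's remark that OMP suits sparse cases.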

CONCLUSIONS

We calculated TMB from overlapped pooling sequencing data using a compressed sensing strategy to reduce sequencing cost. Our findings confirm that SNP calling from the pooled sequencing data of ten samples is feasible. We also assessed the reconstruction efficiency of the BP and OMP models.
