QIMR Berghofer Medical Research Institute, Brisbane, Australia.
Faculty of Medicine, The University of Queensland, Brisbane, Australia.
BMC Genomics. 2024 Sep 30;25(1):898. doi: 10.1186/s12864-024-10792-3.
Lung cancer is a heterogeneous disease and the primary cause of cancer-related mortality worldwide. Somatic mutations, including large structural variants, are important biomarkers in lung cancer for selecting targeted therapy. Genomic studies in lung cancer have been conducted using short-read sequencing. Emerging long-read sequencing technologies are a promising alternative for studying somatic structural variants; however, there is currently no consensus on how to process the data and call somatic events. In this study, we performed whole-genome sequencing of lung cancer and matched non-tumour samples using long- and short-read sequencing to comprehensively benchmark three sequence aligners and seven structural variant callers, comprising generic callers (SVIM, Sniffles2, DELLY in generic mode and cuteSV) and somatic callers (Severus, SAVANA, nanomonsv and DELLY in somatic mode).
Different combinations of aligners and variant callers influenced somatic structural variant detection. The choice of caller had a significant influence on somatic structural variant detection in terms of variant type, size, sensitivity, and accuracy. The performance of each variant caller was assessed by comparing its calls with the somatic structural variants identified by short-read sequencing. More events were detected with long-read sequencing than with short-read sequencing. The mean recall of somatic variant events identified by long-read sequencing was higher for the somatic callers (72%) than for the generic callers (53%). Among the somatic callers, when using the minimap2 aligner, SAVANA and Severus achieved the highest recall at 79.5% and 79.25%, respectively, followed by nanomonsv with a recall of 72.5%.
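To illustrate how recall against a short-read reference set can be computed, the following is a minimal sketch and not the pipeline used in this study. The `SV` record, the `matches` rule (same chromosome, same variant type, breakpoints within 500 bp) and the toy coordinates are all assumptions introduced for illustration; the abstract does not state the matching criteria that were applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical minimal structural variant record (illustration only)."""
    chrom: str
    pos: int      # breakpoint position
    svtype: str   # e.g. "DEL", "INS", "INV", "DUP"


def matches(truth: SV, call: SV, tol: int = 500) -> bool:
    # Assumed matching rule: same chromosome, same type,
    # breakpoints no more than `tol` bp apart.
    return (
        truth.chrom == call.chrom
        and truth.svtype == call.svtype
        and abs(truth.pos - call.pos) <= tol
    )


def recall(truth_set: list[SV], calls: list[SV], tol: int = 500) -> float:
    """Fraction of reference events recovered by at least one call."""
    if not truth_set:
        raise ValueError("reference set is empty")
    found = sum(any(matches(t, c, tol) for c in calls) for t in truth_set)
    return found / len(truth_set)


# Toy data: made-up coordinates, not results from the study.
truth = [
    SV("chr1", 10_000, "DEL"),
    SV("chr2", 50_000, "INV"),
    SV("chr3", 70_000, "DUP"),
    SV("chr5", 1_000, "INS"),
]
calls = [
    SV("chr1", 10_120, "DEL"),   # within 500 bp of the first reference event
    SV("chr2", 50_010, "INV"),   # within 500 bp of the second reference event
    SV("chr3", 90_000, "DUP"),   # same type, but too far away to match
    SV("chr7", 5, "DEL"),        # extra call with no reference counterpart
]

example_recall = recall(truth, calls)
print(f"recall = {example_recall:.2f}")  # prints "recall = 0.50"
```

Calls with no counterpart in the reference set, such as the `chr7` event here, do not lower recall, which is consistent with long-read sequencing detecting more events than the short-read reference while recall stays below 100%.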
Long-read sequencing can identify somatic structural variants in clinical samples. The longer reads have the potential to improve our understanding of cancer development and inform personalized cancer treatment.