Paulin Luis F, Fan Jeremy, O'Neill Kieran, Pleasance Erin, Porter Vanessa L, Jones Steven J M, Sedlazeck Fritz J
Human Genome Sequencing Center Baylor College of Medicine, Houston, Texas 77030, USA.
Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, British Columbia V5Z 1L3, Canada.
Genome Res. 2025 Apr 14;35(4):621-631. doi: 10.1101/gr.279352.124.
The complexities of cancer genomes are becoming more easily interpreted due to advancements in sequencing technologies and improved bioinformatic analysis. Structural variants (SVs) represent an important subset of somatic events in tumors. While the detection of SVs has been markedly improved by the development of long-read sequencing, somatic variant identification and annotation remain challenging. We hypothesized that the use of a completed human reference genome (CHM13-T2T) would improve somatic SV calling. Our findings in a tumor-normal matched benchmark sample and three patient samples show that CHM13-T2T improves SV detection accuracy compared to GRCh38, with a notable reduction in false-positive calls, and thus supports improved prioritization. We also overcame the lack of annotation resources for CHM13-T2T by lifting over CHM13-T2T-aligned reads to the GRCh38 genome, thereby combining improved alignment with advanced annotations. In this process, we assessed the current SV benchmark set for COLO829/COLO829BL across four replicates sequenced at different centers with different long-read technologies. We discovered instability of this cell line across these replicates; 346 SVs (1.13%) were only discoverable in a single replicate. We identified 54 somatic SVs that appear to be stable, as they are consistently present across the four replicates. As such, we propose this consensus set as an updated benchmark for somatic SV calling and include both GRCh38 and CHM13-T2T coordinates in our benchmark. Our work demonstrates new approaches to optimize somatic SV detection in cancer, with potential benefits for other genetic diseases.