Elrick Hillary, Sauer Carolin M, Espejo Valle-Inclan Jose, Trevers Katherine, Tanguy Melanie, Zumalave Sonia, De Noon Solange, Muyas Francesc, Cascão Rita, Afonso Angela, Rust Alistair G, Amary Fernanda, Tirabosco Roberto, Giess Adam, Freeman Timothy, Sosinsky Alona, Piculell Katherine, Miller David T, Faria Claudia C, Elgar Greg, Flanagan Adrienne M, Cortes-Ciriano Isidro
European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK.
Department of Histopathology, Royal National Orthopaedic Hospital, Stanmore, UK.
Nat Methods. 2025 May 28. doi: 10.1038/s41592-025-02708-0.
Accurate detection of somatic structural variants (SVs) and somatic copy number aberrations (SCNAs) is critical to study the mutational processes underpinning cancer evolution. Here we describe SAVANA, an algorithm designed to detect somatic SVs and SCNAs at single-haplotype resolution and estimate tumor purity and ploidy using long-read sequencing data with or without a germline control sample. We also establish best practices for benchmarking SV detection algorithms across the entire genome in a data-driven manner using replication and read-backed phasing analysis. Through the analysis of matched Illumina and nanopore whole-genome sequencing data for 99 human tumor-normal pairs, we show that SAVANA has significantly higher sensitivity and 13- and 82-times-higher specificity than the second- and third-best-performing algorithms, respectively. Moreover, SVs reported by SAVANA are highly consistent with those detected using short-read sequencing. In summary, SAVANA enables the application of long-read sequencing to detect SVs and SCNAs reliably.