The benefit of a complete reference genome for cancer structural variant analysis.

Author information

Paulin Luis F, Fan Jeremy, O'Neill Kieran, Pleasance Erin, Porter Vanessa L, Jones Steven J M, Sedlazeck Fritz J

Affiliations

Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA.

Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada.

Publication information

medRxiv. 2024 Mar 18:2024.03.15.24304369. doi: 10.1101/2024.03.15.24304369.

Abstract

The complexities of cancer genomes are becoming easier to interpret thanks to advances in sequencing technologies and improved bioinformatic analysis. Structural variants (SVs) represent an important subset of somatic events in tumors. While detection of SVs has been markedly improved by the development of long-read sequencing, somatic variant identification and annotation remain challenging. We hypothesized that use of a complete human reference genome (CHM13-T2T) would improve somatic SV calling. Our findings in a tumour/normal matched benchmark sample and two patient samples show that CHM13-T2T improves SV detection and prioritization accuracy compared to GRCh38, with a notable reduction in false positive calls. We also overcame the lack of annotation resources for CHM13-T2T by lifting over CHM13-T2T-aligned reads to the GRCh38 genome, thereby combining improved alignment with advanced annotations. In this process, we assessed the current SV benchmark set for COLO829/COLO829BL across four replicates sequenced at different centers with different long-read technologies. We discovered instability of this cell line across these replicates; 346 SVs (1.13%) were only discoverable in a single replicate. We identify 49 somatic SVs, which appear to be stable as they are consistently present across the four replicates. As such, we propose this consensus set as an updated benchmark for somatic SV calling and include both GRCh38 and CHM13-T2T coordinates in our benchmark. The benchmark is available at doi: 10.5281/zenodo.10819636. Our work demonstrates new approaches to optimize somatic SV prioritization in cancer, with potential improvements in other genetic diseases.
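The replicate comparison described in the abstract (SVs seen in all four replicates versus SVs seen in only one) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `SV` record, the `same_event` matching rule, the 500 bp tolerance, and the toy calls are all assumptions made purely for illustration.

```python
from typing import NamedTuple


class SV(NamedTuple):
    """Hypothetical, simplified SV call: chromosome, start position, type."""
    chrom: str
    pos: int
    svtype: str


def same_event(a: SV, b: SV, tol: int = 500) -> bool:
    # Assumed matching rule: same chromosome and type, start within `tol` bp.
    return a.chrom == b.chrom and a.svtype == b.svtype and abs(a.pos - b.pos) <= tol


def support(sv: SV, replicates: list[list[SV]], tol: int = 500) -> int:
    # Number of replicates (including the call's own) with a matching call.
    return sum(any(same_event(sv, other, tol) for other in rep) for rep in replicates)


def consensus_and_private(replicates: list[list[SV]], tol: int = 500):
    n = len(replicates)
    # Consensus: calls from the first replicate that are matched in every replicate.
    consensus = [sv for sv in replicates[0] if support(sv, replicates, tol) == n]
    # Private: calls matched only within their own replicate.
    private = [sv for rep in replicates for sv in rep
               if support(sv, replicates, tol) == 1]
    return consensus, private


# Toy data for four replicates (invented values, not from the study).
replicates = [
    [SV("chr1", 1000, "DEL"), SV("chr2", 5000, "INS")],
    [SV("chr1", 1040, "DEL")],
    [SV("chr1", 980, "DEL"), SV("chr3", 700, "DUP")],
    [SV("chr1", 1010, "DEL")],
]
consensus, private = consensus_and_private(replicates)
print("consensus:", consensus)
print("private:", private)
```

In this toy input, the chr1 deletion is the only call supported by all four replicates, while the chr2 insertion and chr3 duplication each appear in a single replicate. A real comparison would also need to account for SV length, breakpoint uncertainty, and genotype, which this sketch ignores.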

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b919/10984048/e8d17eea1d54/nihpp-2024.03.15.24304369v1-f0001.jpg
