Somatic and Germline Variant Calling from Next-Generation Sequencing Data.

Affiliation

Center for Applied Bioinformatics, St Jude Children's Research Hospital, Memphis, TN, USA.

Publication information

Adv Exp Med Biol. 2022;1361:37-54. doi: 10.1007/978-3-030-91836-1_3.

Abstract

Re-sequencing of the human genome by next-generation sequencing (NGS) has been widely applied to discover pathogenic genetic variants and/or causative genes accounting for various types of diseases, including cancers. Advances in NGS have made it possible to sequence a patient's entire genome and identify disease-associated variants within a reasonable timeframe and at a reasonable cost. Variant identification relies at its core on accurate variant calling and annotation. Numerous algorithms have been developed to elucidate the repertoire of somatic and germline variants. Each algorithm has its own distinct strengths, weaknesses, and limitations, owing to differences in the statistical modeling approach adopted and the read information utilized. Accurate variant calling remains challenging because of sequencing artifacts and read misalignments. These factors can lead to discordant variant calling results and even to misinterpretation of findings. For somatic variant detection, multiple factors, including chromosomal abnormalities, tumor heterogeneity, tumor-normal cross-contamination, unbalanced tumor/normal sample coverage, and variants with low allele frequencies, add further layers of complexity to accurate variant identification. Given these discordances and difficulties, ensemble approaches that harmonize information from different algorithms have emerged to improve variant calling performance. In this chapter, we first introduce the general scheme of variant calling algorithms and the potential challenges at distinct stages. We next review the existing workflows for variant calling and annotation, and finally explore the strategies deployed by different callers as well as their strengths and caveats. Overall, NGS-based variant identification, applied with careful consideration of these issues, allows reliable detection of pathogenic variants and selection of candidate variants for precision medicine.
