Section Genomics of Neurodegenerative Diseases and Aging, Department of Clinical Genetics, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands;
Department of Neurology, Alzheimer Center Amsterdam, Amsterdam Neuroscience, Vrije Universiteit Amsterdam, Amsterdam UMC, 1081HV Amsterdam, The Netherlands.
Genome Res. 2024 Nov 20;34(11):1942-1953. doi: 10.1101/gr.279351.124.
Tandem repeats (TRs) play important roles in genomic variation and disease risk in humans. Long-read sequencing allows for the accurate characterization of TRs; however, the underlying bioinformatics remains challenging. We present otter and TREAT: otter is a fast targeted local assembler, cross-compatible across different sequencing platforms. It is integrated in TREAT, an end-to-end workflow for TR characterization, visualization, and analysis across multiple genomes. In a comparison with existing tools based on long-read sequencing data from both Oxford Nanopore Technologies (ONT, Simplex and Duplex) and Pacific Biosciences (PacBio, Sequel II and Revio), otter and TREAT achieve state-of-the-art genotyping and motif characterization accuracy. Applied to clinically relevant TRs, TREAT/otter significantly identify individuals with pathogenic TR expansions. When applied to a case-control setting, we replicate previously reported associations of TRs with Alzheimer's disease, including those near or within ( = 2.63 × 10), ( = 6.5 × 10), and ( = 0.04) genes. Finally, we use TREAT/otter to systematically evaluate potential biases when genotyping TRs using diverse ONT and PacBio long-read sequencing data sets. We show that, in rare cases (0.06%), long-read sequencing suffers from coverage drops in TRs, including the disease-associated TRs in and genes. Such coverage drops can lead to TR misgenotyping, hampering the accurate characterization of TR alleles. Taken together, our tools can accurately genotype TRs across different sequencing technologies and with minimal requirements, allowing end-to-end analysis and comparisons of TRs in human genomes, with broad applications in research and clinical fields.