Chiu Readman, Rajan-Babu Indhu-Shree, Friedman Jan M, Birol Inanc
Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC V5Z 4S6, Canada.
Department of Medical Genetics, University of British Columbia, Vancouver, BC V5Z 4H4, Canada.
medRxiv. 2024 Jun 20:2024.06.19.24309173. doi: 10.1101/2024.06.19.24309173.
With the increasing availability of long-read sequencing data, high-quality human genome assemblies, and software for fully characterizing tandem repeats, genome-wide genotyping of tandem repeat loci on a population scale is becoming more feasible. Such efforts not only expand our knowledge of the tandem repeat landscape in the human genome but also enhance our ability to differentiate pathogenic tandem repeat mutations from benign polymorphisms. To this end, we analyzed 272 genomes assembled from datasets produced by three public initiatives that employed different long-read sequencing technologies. Here, we report a catalog of over 18 million tandem repeat loci, many of which were previously unannotated. Some of these loci are highly polymorphic, and many reside within coding sequences.