Department of Human Genetics, Yokohama City University Graduate School of Medicine, Fukuura 3-9, Kanazawa-ku, Yokohama, 236-0004, Japan.
Department of Genomic Function and Diversity, Medical Research Institute, Tokyo Medical and Dental University, M&D Tower 24F, 1-5-45 Yushima, Bunkyo-ku, Tokyo, 113-8510, Japan.
BMC Med Genomics. 2021 Jan 7;14(1):17. doi: 10.1186/s12920-020-00853-3.
Tandem repeats are highly mutable and contribute to the development of human disease through a variety of mechanisms. It is difficult to predict which tandem repeats may cause disease. One hypothesis is that highly variable tandem repeats are a source of genetic disease, because disease-causing repeats are polymorphic in healthy individuals. However, it is not clear whether disease-causing repeats are more polymorphic than other repeats.
We performed a genome-wide survey of the millions of human tandem repeats using publicly available long-read genome sequencing data from 21 humans. We measured tandem repeat copy number changes using tandem-genotypes. Length variation of known disease-associated repeats was compared with that of other repeat loci.
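The comparison described above can be illustrated with a minimal sketch. It assumes that copy number changes relative to the reference have already been collected per locus and pooled across individuals. The locus names, the values, the set of flagged loci, and the choice of standard deviation as the variability measure are all hypothetical and are not taken from the study or from the output format of tandem-genotypes.

```python
from statistics import median, pstdev

# Hypothetical copy-number changes relative to the reference,
# pooled across alleles and individuals. Not real data.
copy_number_changes = {
    "locus_A": [0, 0, 1, -1, 3, 5, -2, 0],
    "locus_B": [0, 0, 0, 0, 0, 1, 0, 0],
    "locus_C": [2, -3, 4, 0, 6, -1, 1, 0],
}

# Hypothetical labels marking loci treated as disease-associated.
disease_loci = {"locus_A", "locus_C"}


def variability(changes):
    """Population standard deviation of copy-number changes at one locus.

    This is one possible polymorphism score, chosen here for illustration.
    """
    return pstdev(changes)


def compare_groups(changes_by_locus, flagged):
    """Return (median score of flagged loci, median score of all other loci)."""
    flagged_scores = [
        variability(v) for k, v in changes_by_locus.items() if k in flagged
    ]
    other_scores = [
        variability(v) for k, v in changes_by_locus.items() if k not in flagged
    ]
    return median(flagged_scores), median(other_scores)


flagged_median, other_median = compare_groups(copy_number_changes, disease_loci)
print(f"flagged loci: {flagged_median:.2f}, other loci: {other_median:.2f}")
```

With real data, a rank-based test over the two groups of scores would be a more appropriate comparison than a pair of medians.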
We found that known Mendelian disease-causing or disease-associated repeats, especially CAG and 5'UTR GGC repeats, are relatively long and polymorphic in the general population. We also show that repeat lengths of two disease-causing tandem repeats, in ATXN3 and GLS, are correlated with nearby GWAS SNP genotypes.
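A correlation between repeat length and a nearby SNP genotype can be sketched as follows. This is only an illustration: it assumes the genotype is coded as an alternate-allele dosage (0, 1 or 2) and that repeat length is summarized as one number per individual. The data are hypothetical, and the Pearson coefficient is used here for simplicity, not because the study is known to have used it.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-individual values. Not real data.
snp_dosage = [0, 0, 1, 1, 1, 2, 2, 0, 1, 2]            # alternate-allele count
repeat_length = [14, 15, 19, 21, 18, 27, 25, 13, 20, 28]  # summed copy number

r = pearson(snp_dosage, repeat_length)
print(f"Pearson r between SNP dosage and repeat length: {r:.3f}")
```

With only 21 individuals, any such coefficient would carry wide uncertainty, so a sketch like this is better read as a screening step than as evidence of association.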
We provide a catalog of polymorphic tandem repeats across a variety of repeat unit lengths and sequences, derived from long-read sequencing data. This method, especially if used in genome-wide association studies, may identify new candidate pathogenic or biologically important tandem repeats in human genomes.