Profiling the genome-wide landscape of tandem repeat expansions.

Affiliations

Department of Electrical and Computer Engineering, University of California San Diego, 9500 Gilman Drive, MC 0639, La Jolla, CA 92093, USA.

Department of Medicine, University of California San Diego, 9500 Gilman Drive, MC 0639, La Jolla, CA 92093, USA.

Publication information

Nucleic Acids Res. 2019 Sep 5;47(15):e90. doi: 10.1093/nar/gkz501.

Abstract

Tandem repeat (TR) expansions have been implicated in dozens of genetic diseases, including Huntington's Disease, Fragile X Syndrome, and hereditary ataxias. Furthermore, TRs have recently been implicated in a range of complex traits, including gene expression and cancer risk. While the human genome harbors hundreds of thousands of TRs, analysis of TR expansions has been mainly limited to known pathogenic loci. A major challenge is that expanded repeats are beyond the read length of most next-generation sequencing (NGS) datasets and are not profiled by existing genome-wide tools. We present GangSTR, a novel algorithm for genome-wide genotyping of both short and expanded TRs. GangSTR extracts information from paired-end reads into a unified model to estimate maximum likelihood TR lengths. We validate GangSTR on real and simulated data and show that GangSTR outperforms alternative methods in both accuracy and speed. We apply GangSTR to a deeply sequenced trio to profile the landscape of TR expansions in a healthy family and validate novel expansions using orthogonal technologies. Our analysis reveals that healthy individuals harbor dozens of long TR alleles not captured by current genome-wide methods. GangSTR will likely enable discovery of novel disease-associated variants not currently accessible from NGS.
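The abstract says only that GangSTR combines paired-end read information into a unified model and estimates TR lengths by maximum likelihood. As a rough illustration of what "maximum likelihood TR length" means, the sketch below genotypes a diploid locus from per-read repeat counts using a hypothetical stutter-error model. It is not GangSTR's model. It only uses reads whose repeat count can be read off directly, which is exactly the limitation the abstract describes for expanded alleles longer than the read length. The function names, the stutter parameters `p_stutter` and `q`, and the candidate range are all illustrative assumptions.

```python
import math
from itertools import combinations_with_replacement


def read_likelihood(observed: int, allele: int,
                    p_stutter: float = 0.1, q: float = 0.5) -> float:
    """P(observed repeat count | true allele) under a hypothetical stutter model.

    With probability 1 - p_stutter the read reports the true length; otherwise
    the error is +/-k repeat units (k >= 1, either sign equally likely) with a
    geometric weight (1 - q) * q**(k - 1), so probabilities sum to 1.
    """
    k = abs(observed - allele)
    if k == 0:
        return 1.0 - p_stutter
    return p_stutter * 0.5 * (1.0 - q) * q ** (k - 1)


def genotype_log_likelihood(reads, a, b):
    """Log-likelihood of diploid genotype (a, b).

    Each read is assumed to come from allele a or allele b with probability 1/2.
    """
    return sum(
        math.log(0.5 * read_likelihood(r, a) + 0.5 * read_likelihood(r, b))
        for r in reads
    )


def ml_genotype(reads, candidates):
    """Return the unordered allele pair (a <= b) maximizing the likelihood.

    `reads` must be a list (it is iterated once per candidate pair).
    """
    return max(
        combinations_with_replacement(sorted(candidates), 2),
        key=lambda ab: genotype_log_likelihood(reads, *ab),
    )


if __name__ == "__main__":
    # Toy heterozygous locus: reads supporting 10 and 14 repeat units,
    # plus one stutter read at 11.
    reads = [10] * 6 + [14] * 6 + [11]
    print(ml_genotype(reads, range(5, 20)))  # prints (10, 14)
```

An exhaustive search over allele pairs is used here only because it is easy to verify. With many candidate lengths, a real tool would need a smarter search and, per the abstract, additional evidence from paired-end reads to size alleles that no single read spans.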

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bdc7/6735967/4622e6d4fcec/gkz501fig1.jpg