Department of Human Genetics, Yokohama City University Graduate School of Medicine, Fukuura 3-9, Kanazawa-ku, Yokohama, 236-0004, Japan.
Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-3-26 Aomi, Koto-ku, Tokyo, 135-0064, Japan.
Genome Biol. 2019 Mar 19;20(1):58. doi: 10.1186/s13059-019-1667-6.
Tandemly repeated DNA is highly mutable and causes at least 31 diseases, but it is hard to detect pathogenic repeat expansions genome-wide. Here, we report robust detection of human repeat expansions from careful alignments of long but error-prone (PacBio and nanopore) reads to a reference genome. Our method is robust to systematic sequencing errors, inexact repeats with fuzzy boundaries, and low sequencing coverage. By comparing to healthy controls, we prioritize pathogenic expansions within the top 10 out of 700,000 tandem repeats in whole genome sequencing data. This may help to elucidate the many genetic diseases whose causes remain unknown.
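The abstract's idea of prioritizing expansions by comparison to healthy controls can be illustrated with a toy ranking. This is a minimal sketch under stated assumptions, not the authors' scoring method: the function names (`expansion_score`, `rank_repeats`), the input layout (per-locus lists of per-read copy-number changes relative to the reference), and the scoring rule are all hypothetical and chosen only to show the general shape of a patient-versus-control ranking.

```python
from statistics import median


def expansion_score(patient_changes, control_changes):
    """Toy score: how far the patient's longer allele exceeds all control reads.

    Both arguments are lists of per-read copy-number changes relative to the
    reference (hypothetical input format, not taken from the source).
    """
    if not patient_changes:
        return float("-inf")
    ordered = sorted(patient_changes)
    # Crude estimate of the longer allele: median of the upper half of reads,
    # which tolerates some noise from error-prone long reads.
    upper_half = ordered[len(ordered) // 2:]
    patient_long_allele = median(upper_half)
    # Largest change seen in any control read (0 if the locus has no control data).
    control_max = max(control_changes, default=0)
    return patient_long_allele - control_max


def rank_repeats(patient, controls, top=10):
    """Return the `top` loci with the highest toy expansion score."""
    scores = {
        locus: expansion_score(changes, controls.get(locus, []))
        for locus, changes in patient.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top]


if __name__ == "__main__":
    # Made-up illustrative numbers, not real data.
    patient = {
        "locusA": [0, 1, 55, 60, 58],  # one allele strongly expanded
        "locusB": [0, 0, 1, 2],
        "locusC": [10, 12, 11],        # long, but controls are equally long
    }
    controls = {
        "locusA": [0, 1, 2],
        "locusB": [0, 1, 3],
        "locusC": [9, 12, 14],
    }
    print(rank_repeats(patient, controls, top=2))
```

In this sketch a locus that is long in both patient and controls (like `locusC`) scores low, which is the point of comparing against controls: common polymorphic expansions drop out of the top of the list.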