Liu Qian, Zhang Peng, Wang Depeng, Gu Weihong, Wang Kai
Institute for Genomic Medicine, Columbia University, New York, NY, 10032, USA.
Nextomics Biosciences, Wuhan, Hubei, 430000, China.
Genome Med. 2017 Jul 18;9(1):65. doi: 10.1186/s13073-017-0456-7.
Microsatellite expansion, such as trinucleotide repeat expansion (TRE), is known to cause a number of genetic diseases. Sanger sequencing and next-generation short-read sequencing are unable to interrogate TRE reliably. We developed a novel algorithm called RepeatHMM to estimate repeat counts from long-read sequencing data. Evaluation on simulated data, real amplicon sequencing data on two repeat expansion disorders, and whole-genome sequencing data generated by PacBio and Oxford Nanopore technologies showed superior performance over competing approaches. We conclude that long-read sequencing coupled with RepeatHMM can estimate repeat counts on microsatellites and can interrogate the "unsequenceable" genomic trinucleotide repeat disorders.
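To make the task of "estimating repeat counts from long reads" concrete, the following is a minimal sketch of a naive baseline. It is not RepeatHMM and does not reproduce the authors' method. It assumes exact motif matching and a median across reads, and the function names `longest_repeat_run` and `estimate_repeat_count`, the default motif `CAG`, and the toy reads are all hypothetical choices made for illustration.

```python
import re
from statistics import median


def longest_repeat_run(read: str, motif: str = "CAG") -> int:
    """Return the number of motif copies in the longest exact tandem run.

    Hypothetical helper for illustration: exact matching only, so a single
    sequencing error splits a run in two.
    """
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(read.upper())]
    return max(runs, default=0)


def estimate_repeat_count(reads, motif: str = "CAG"):
    """Naive per-locus estimate: median of per-read longest runs (assumption)."""
    counts = [longest_repeat_run(r, motif) for r in reads]
    return median(counts) if counts else 0


# Toy reads (made up): two clean reads and one with a G->A error inside the repeat.
reads = [
    "TT" + "CAG" * 5 + "AA",
    "CAG" * 5,
    "CAG" * 3 + "CAA" + "CAG" * 4,
]
per_read = [longest_repeat_run(r) for r in reads]
estimate = estimate_repeat_count(reads)
```

In the third toy read a single substitution breaks the repeat tract, so exact matching reports a shorter run than the read actually spans. Long reads carry many such errors, which is one reason an error-tolerant model is attractive for this problem.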