Tang Haixu, Nzabarushimana Etienne
School of Informatics and Computing, Indiana University, 150 S. Woodlawn Avenue, Bloomington, 47405, IN, USA.
BMC Bioinformatics. 2017 Oct 3;18(Suppl 11):398. doi: 10.1186/s12859-017-1800-z.
Short tandem repeats (STRs) are found in many prokaryotic and eukaryotic genomes and are commonly used as genetic markers, in particular for identity and parentage testing in DNA forensics. The unstable expansion of some STRs is associated with various genetic disorders (e.g., Huntington's disease), and is therefore used in genetic testing to screen individuals at high risk. Traditional STR analyses are based on PCR amplification of STR loci followed by gel electrophoresis. With the availability of massive whole-genome sequencing data, it has become practical to mine STR profiles in silico from genome sequences. Software tools such as lobSTR and STR-FM have been developed to meet this demand; however, they are built on whole-genome read mapping tools and thus may not be sensitive enough.
In this paper, we present STRScan, a standalone software tool that uses a greedy algorithm for targeted STR profiling in next-generation sequencing (NGS) data. STRScan was tested on whole-genome sequencing data from the Venter genome sequencing project and the 1000 Genomes Project. The results showed that STRScan can profile 20% more STRs in the target set than lobSTR, recovering STRs that lobSTR misses.
STRScan is particularly useful for NGS-based targeted STR profiling, e.g., in genetic and human identity testing. STRScan is available as open-source software at http://darwin.informatics.indiana.edu/str/.
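To make the idea of targeted, greedy STR profiling concrete, the following is a minimal sketch and not STRScan's actual algorithm, which the abstract does not describe in detail. It assumes a target STR is given as a repeat motif with known left and right flanking sequences, and it uses exact matching only; the function name `count_repeats` and all of its parameters are hypothetical.

```python
from typing import Optional


def count_repeats(read: str, left_flank: str, right_flank: str, motif: str) -> Optional[int]:
    """Count exact tandem copies of `motif` between two flanks in a read.

    Returns None if the read does not span the whole locus.
    Illustrative only: exact matching, no tolerance for sequencing errors.
    """
    start = read.find(left_flank)
    if start == -1:
        return None  # left flank absent: read does not cover the locus

    pos = start + len(left_flank)
    copies = 0
    # Greedy step: consume motif copies for as long as the next one matches.
    while read.startswith(motif, pos):
        copies += 1
        pos += len(motif)

    # The repeat tract must be followed immediately by the right flank.
    if not read.startswith(right_flank, pos):
        return None
    return copies


read = "ACGTTT" + "GATA" * 5 + "CCAGT"
print(count_repeats(read, "GTTT", "CCAG", "GATA"))  # prints 5
```

Because the sketch inspects each read directly against the target locus, it needs no whole-genome read mapping step. A practical tool would additionally have to tolerate mismatches and indels within the flanks and the repeat tract.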