Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA.
Nucleic Acids Res. 2013 Jan 7;41(1):e22. doi: 10.1093/nar/gks881. Epub 2012 Oct 2.
Microsatellites (MSs) are DNA regions consisting of repeated short motif(s). MSs are linked to several diseases and have important biomedical applications. Thus, researchers have developed several computational tools to detect MSs. However, the currently available tools require adjusting many parameters, or depend on a list of motifs or on a library of known MSs. Therefore, two laboratories analyzing the same sequence with the same computational tool may obtain different results because of the user-adjustable parameters. Recent studies have indicated the need for a standard computational tool for detecting MSs. To this end, we applied machine-learning algorithms to develop a tool called MsDetector. The system is based on a hidden Markov model and a general linear model. Users do not need to optimize the parameters of MsDetector, and neither a list of motifs nor a library of known MSs is required. MsDetector is memory- and time-efficient. We applied MsDetector to several species. MsDetector located the majority of MSs found by other widely used tools and also identified novel MSs. Furthermore, the system has a very low false-positive rate, yielding a precision of up to 99%. MsDetector is expected to produce consistent results across studies analyzing the same sequence.
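To illustrate why user-adjustable parameters can make two analyses of the same sequence disagree, here is a minimal Python sketch of a naive perfect-tandem-repeat scanner. It is NOT MsDetector, which the abstract describes as built on a hidden Markov model and a general linear model. The function name find_perfect_repeats and its thresholds max_motif and min_copies are hypothetical, chosen only for this illustration.

```python
def find_perfect_repeats(seq, max_motif=6, min_copies=3):
    """Return (start, end, motif) tuples for perfect tandem repeats in seq.

    Hypothetical, naive scanner for illustration only (not MsDetector).
    max_motif and min_copies are user-set thresholds: changing them changes
    which regions are reported. Coordinates are 0-based, end-exclusive.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    i = 0
    while i < n:
        best = None  # (end, motif) of the longest qualifying repeat starting at i
        for k in range(1, max_motif + 1):
            motif = seq[i:i + k]
            if len(motif) < k:  # not enough sequence left for this motif length
                break
            end = i + k
            # Extend while the next k bases exactly copy the motif.
            while seq[end:end + k] == motif:
                end += k
            copies = (end - i) // k
            # Keep the longest repeat; on ties the shorter motif (found first) wins.
            if copies >= min_copies and (best is None or end > best[0]):
                best = (end, motif)
        if best:
            hits.append((i, best[0], best[1]))
            i = best[0]  # resume scanning after the reported repeat
        else:
            i += 1
    return hits
```

For the same sequence, the two settings report different microsatellites. find_perfect_repeats("GCACACACAGTTTTTG", min_copies=3) returns [(1, 9, 'CA'), (10, 15, 'T')]. With min_copies=5, the (CA)4 run is dropped and only [(10, 15, 'T')] is reported.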