Department of Biology; Texas A&M University; College Station, TX 77843.
G3 (Bethesda). 2019 Oct 7;9(10):3101-3104. doi: 10.1534/g3.119.400335.
Microsatellites are repetitive DNA sequences usually found in non-coding regions of the genome. Their quantification and analysis have applications in fields ranging from population genetics to evolutionary biology. As genome assemblies become commonplace, the need for software that facilitates these analyses has never been greater. R packages that can analyze genomic data are especially valuable, because R is one of the most popular software environments among biologists. We created an R package, micRocounter, to quantify microsatellites. We have optimized the package for speed, accessibility, and portability, making the automated analysis of large genomic data sets feasible. Computationally intensive algorithms were implemented in C++ to increase speed. Tests using benchmark datasets show a 200-fold improvement in speed over existing software, and a moderately sized genome of 500 Mb can be processed in under 50 s. Results are output as an R object, increasing accessibility and flexibility for practitioners.
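The abstract does not describe micRocounter's detection algorithm. As a rough illustration of what quantifying microsatellites involves, here is a minimal Python sketch that scans a DNA string for perfect tandem repeats of 1-6 bp motifs and tallies them by motif length. The function names, the `Repeat` record, and the per-motif-length minimum copy numbers are hypothetical assumptions chosen for illustration; they are not micRocounter's API or defaults, and the package itself implements its intensive steps in C++ and returns results as an R object.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical minimum number of adjacent copies required per motif length
# (1-6 bp). Illustrative values only, not micRocounter's actual thresholds.
MIN_COPIES = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
VALID_BASES = set("ACGT")


@dataclass
class Repeat:
    start: int   # 0-based start of the repeat tract
    motif: str   # repeat unit
    copies: int  # number of complete, adjacent copies of the motif


def find_microsatellites(seq, min_copies=MIN_COPIES):
    """Report perfect tandem repeats, trying shorter motifs first at each position."""
    seq = seq.upper()
    n = len(seq)
    hits = []
    i = 0
    while i < n:
        found = None
        for k in sorted(min_copies):
            motif = seq[i:i + k]
            # Skip motifs that run off the end or contain N / other symbols.
            if len(motif) < k or not set(motif) <= VALID_BASES:
                continue
            copies = 1
            j = i + k
            while seq[j:j + k] == motif:  # extend while the next unit matches
                copies += 1
                j += k
            if copies >= min_copies[k]:
                found = Repeat(i, motif, copies)
                break
        if found:
            hits.append(found)
            # Resume scanning just past the reported tract.
            i = found.start + len(found.motif) * found.copies
        else:
            i += 1
    return hits


def count_by_motif_length(hits):
    """Tally reported repeats by motif length (1 = mono-, 2 = di-nucleotide, ...)."""
    return Counter(len(h.motif) for h in hits)
```

For example, `find_microsatellites("GG" + "AT" * 8 + "CC" + "A" * 12 + "G")` reports an 8-copy `AT` tract starting at position 2 and a 12-copy `A` tract starting at position 20. Testing shorter motifs first means a tract such as `ATATAT...` is reported as a dinucleotide repeat rather than as a repeat of `ATAT`. This naive scan is far simpler, and slower, than what the abstract's benchmark figures imply.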