StrAuto: automation and parallelization of STRUCTURE analysis.

Authors

Chhatre Vikram E, Emerson Kevin J

Affiliations

Department of Plant Biology, University of Vermont, Burlington, Vermont, USA.

Current Address: Wyoming INBRE Bioinformatics Core, Department of Molecular Biology, University of Wyoming, Laramie, Wyoming, USA.

Publication information

BMC Bioinformatics. 2017 Mar 24;18(1):192. doi: 10.1186/s12859-017-1593-0.

Abstract

BACKGROUND

Population structure inference using the software STRUCTURE has become an integral part of population genetic studies covering a broad spectrum of taxa, including humans. The ever-expanding size of genetic data sets poses computational challenges for this analysis. Although at least one tool currently implements parallel computing to reduce the computational burden of this analysis, it does not fully automate the replicate STRUCTURE runs required for downstream inference of the optimal K. There is a pressing need for a tool that can deploy population structure analysis on high-performance computing clusters.

RESULTS

We present an updated version of the popular Python program StrAuto to streamline population structure analysis using parallel computing. StrAuto implements a pipeline that combines STRUCTURE analysis with the Evanno ΔK analysis and visualization of results using STRUCTURE HARVESTER. Using benchmarking tests, we demonstrate that StrAuto significantly reduces the computational time needed to perform iterative STRUCTURE analysis by distributing runs over two or more processors.
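As an illustration of the Evanno ΔK step named above, the following is a minimal sketch and not code from StrAuto or STRUCTURE HARVESTER. It assumes that the replicate log-likelihood values Ln P(D) have already been collected per K, and that ΔK is computed as the absolute second difference of the mean Ln P(D), divided by the standard deviation of Ln P(D) at K. The function name `evanno_delta_k` and the input layout are hypothetical.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Return {K: delta K} from a dict mapping K to replicate Ln P(D) values.

    Assumption: delta K = |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)), where L is
    the mean Ln P(D) across replicates. K values without both neighbours,
    or with zero spread across replicates, are skipped.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # delta K is undefined at the smallest and largest K
        if len(lnp_by_k[k]) < 2:
            continue  # a standard deviation needs at least two replicates
        spread = stdev(lnp_by_k[k])
        if spread == 0:
            continue  # avoid division by zero
        second_difference = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        delta[k] = second_difference / spread
    return delta


# Made-up replicate values for illustration only, not results from the paper.
example = {1: [-100.0, -102.0], 2: [-80.0, -82.0], 3: [-78.0, -80.0]}
delta_k = evanno_delta_k(example)
best_k = max(delta_k, key=delta_k.get)
```

The K with the largest ΔK is usually taken as the most likely number of clusters, which is why several replicate runs per K are needed before this step can be applied.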

CONCLUSION

StrAuto is the first tool to integrate STRUCTURE analysis with post-processing using a pipeline approach in addition to implementing parallel computation, a setup ideal for deployment on computing clusters. StrAuto is distributed under the GNU GPL (General Public License) and is available for download from http://strauto.popgen.org.
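The idea of distributing replicate runs over several processors can be sketched as follows. This is a minimal illustration under stated assumptions, not StrAuto's actual implementation: the helper names (`build_jobs`, `structure_command`, `run_one`, `run_all`), the parameter file names, and the output paths are hypothetical, and the STRUCTURE command-line flags shown are assumed and should be checked against the STRUCTURE documentation. Threads are used only to launch and wait on external processes, so each STRUCTURE run would still occupy its own processor.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_jobs(max_k, replicates):
    """List every (K, replicate) pair that needs its own STRUCTURE run."""
    return [(k, rep)
            for k in range(1, max_k + 1)
            for rep in range(1, replicates + 1)]


def structure_command(k, rep):
    """Build one command line (assumed flags, hypothetical file names)."""
    return ["structure", "-K", str(k),
            "-m", "mainparams", "-e", "extraparams",
            "-o", f"results/k{k}_run{rep}"]


def run_one(job):
    """Run a single replicate as an external process; return its exit code."""
    k, rep = job
    completed = subprocess.run(structure_command(k, rep),
                               capture_output=True, text=True)
    return k, rep, completed.returncode


def run_all(jobs, workers=2, runner=run_one):
    """Keep up to `workers` runs going at once; results keep the job order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(runner, jobs))


# Dry run with a stand-in runner, so nothing is executed on this machine.
jobs = build_jobs(max_k=3, replicates=2)
dry_results = run_all(jobs, workers=2, runner=lambda job: (job[0], job[1], 0))
```

Calling `run_all(jobs, workers=8)` with the default runner would launch the real runs, provided a `structure` executable and the parameter files exist. Because the replicates are independent of each other, the wall-clock time should shrink roughly in proportion to the number of workers until the available processors are saturated.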
