Álvarez-Jarreta Jorge, Ruiz-Pesini Eduardo
Depto. de Informática e Ingeniería de Sistemas (DIIS), Universidad de Zaragoza, María de Luna 1, Zaragoza, 50018, Spain.
Instituto de Investigación en Ingeniería de Aragón (I3A), Universidad de Zaragoza, Mariano Esquillor s/n, Zaragoza, 50018, Spain.
BMC Bioinformatics. 2016 Oct 28;17(1):436. doi: 10.1186/s12859-016-1303-3.
Molecular evolution studies involve many hard computational problems that are solved, in most cases, with heuristic algorithms that provide a nearly optimal solution. As a result, diverse software tools exist for the different stages of a molecular evolution workflow.
We present MEvoLib, the first molecular evolution library for Python, which provides a framework for working with the different tools and methods involved in the common tasks of molecular evolution workflows. In contrast with existing bioinformatics libraries, MEvoLib focuses on the stages of a molecular evolution study, wrapping each set of tools that share a common purpose in a single high-level interface with fast access to their most frequent parameterizations. Gene clustering from partial or complete sequences has been improved with a new method that integrates accessible external information (e.g. GenBank's features data). Moreover, MEvoLib adjusts the fetching process from NCBI databases to optimize download bandwidth usage. In addition, it has been implemented using parallelization techniques to cope with large-scale scenarios.
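The idea of using feature annotations to split sequences into genes can be illustrated with a minimal sketch. It is not MEvoLib's API or its actual method: the function `partition_by_gene` and the dictionary layout of `records` are hypothetical stand-ins for parsed GenBank entries, and the sketch assumes each feature carries a gene name and 0-based, end-exclusive coordinates.

```python
from collections import defaultdict


def partition_by_gene(records):
    """Group sequence fragments by the gene named in each feature.

    `records` is a hypothetical, simplified stand-in for parsed GenBank
    entries: each record has an 'id', a 'seq' string, and a list of
    'features' with a 'gene' name and 0-based, end-exclusive
    'start'/'end' positions.
    """
    genes = defaultdict(dict)
    for rec in records:
        for feat in rec["features"]:
            # Slice out the annotated region and file it under its gene.
            fragment = rec["seq"][feat["start"]:feat["end"]]
            genes[feat["gene"]][rec["id"]] = fragment
    return dict(genes)


# Toy input: one sequence annotated with two genes, one with a single gene.
records = [
    {"id": "seq1", "seq": "ATGCCCTAAATGGGGTAG",
     "features": [{"gene": "ND1", "start": 0, "end": 9},
                  {"gene": "COX1", "start": 9, "end": 18}]},
    {"id": "seq2", "seq": "ATGCCATAA",
     "features": [{"gene": "ND1", "start": 0, "end": 9}]},
]
partition = partition_by_gene(records)
```

Here `partition` maps each gene name to the fragments found for it, so partial sequences such as `seq2` contribute only to the genes they actually cover.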
MEvoLib is the first library for Python designed to facilitate molecular evolution research for both expert and novice users. It offers a single interface for each common task, covering several tools with their most used parameterizations. It also includes a method that takes advantage of biological knowledge to improve the gene partition of sequence datasets. Additionally, its implementation incorporates parallelization techniques to reduce computational costs when handling very large input datasets.
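As a rough illustration of how batching and parallel execution can help with large inputs, the sketch below splits a dataset into batches and processes them with a worker pool from the standard library. It does not reproduce MEvoLib's implementation: `batches`, `gc_content`, `process_batch` and `parallel_gc` are hypothetical names, GC content is only a placeholder for a real per-sequence task, and a thread pool is used for brevity (a process pool would be the usual choice for CPU-bound work).

```python
from concurrent.futures import ThreadPoolExecutor


def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def gc_content(seq):
    """Fraction of G and C bases in `seq` (placeholder per-sequence task)."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def process_batch(batch):
    """Process one batch of sequences and return one value per sequence."""
    return [gc_content(seq) for seq in batch]


def parallel_gc(seqs, batch_size=2, workers=4):
    """Run `process_batch` over all batches with a pool of workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in the order the batches were given.
        results = pool.map(process_batch, batches(seqs, batch_size))
        return [value for chunk in results for value in chunk]


values = parallel_gc(["GGCC", "ATAT", "ATGC", "GCGA", "AAAA"])
```

Because the results come back in input order, `values` lines up one-to-one with the input sequences regardless of which worker finished first.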