Andrade Jorge, Berglund Lisa, Uhlén Mathias, Odeberg Jacob
Department of Biotechnology, Royal Institute of Technology (KTH), Stockholm, Sweden.
In Silico Biol. 2006;6(6):495-504.
For several applications and algorithms used in applied bioinformatics, a bottleneck in computational time may arise when they are scaled up to analyse large datasets and databases. Recoding, algorithm modification, or sacrifices in sensitivity and accuracy may be necessary to accommodate the limited computational capacity of single workstations. Grid computing offers an alternative model for solving massive computational problems by parallel execution of existing algorithms and software implementations. We present the implementation of a Grid-aware model for solving computationally intensive bioinformatic analyses, exemplified by a blastp sliding window algorithm for whole proteome sequence similarity analysis, and evaluate its performance in comparison with a local cluster and a single workstation. Our strategy involves temporary installation of the BLAST executable and databases on remote nodes at submission. This accommodates dynamic Grid environments because it avoids the need for predefined runtime environments (preinstalled software and databases at specific Grid nodes). Importantly, the implementation is generic: the BLAST executable can be replaced by other software tools to carry out other analyses suitable for parallelisation. This model should be of general interest in applied bioinformatics. Scripts and procedures are freely available from the authors.
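As a rough illustration of the sliding window idea and of how the resulting work could be divided for parallel execution, the following Python sketch cuts a protein sequence into fixed-size fragments and splits them into batches, one per job. It is not the authors' code: the function names `sliding_windows` and `partition`, the window size, the step, and the number of jobs are all assumptions made for this example, and the sketch stops short of invoking BLAST or submitting anything to a Grid.

```python
from typing import Iterator, List, Tuple


def sliding_windows(seq: str, window: int, step: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, fragment) pairs of fixed-size windows over a sequence.

    Hypothetical helper: the window size and step are illustrative
    parameters, not values taken from the paper.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if len(seq) <= window:
        # A sequence no longer than one window is returned whole.
        yield 0, seq
        return
    # Note: trailing residues are left uncovered when (len(seq) - window)
    # is not a multiple of step; step=1 always covers the full sequence.
    for start in range(0, len(seq) - window + 1, step):
        yield start, seq[start:start + window]


def partition(items: List, n_jobs: int) -> List[List]:
    """Split items into n_jobs contiguous batches of nearly equal size.

    Hypothetical helper: each batch stands for the input of one
    independent job that could run on a separate node.
    """
    if n_jobs <= 0:
        raise ValueError("n_jobs must be positive")
    size, extra = divmod(len(items), n_jobs)
    batches = []
    pos = 0
    for i in range(n_jobs):
        # The first `extra` batches take one additional item each.
        end = pos + size + (1 if i < extra else 0)
        batches.append(items[pos:end])
        pos = end
    return batches


if __name__ == "__main__":
    # Toy sequence and parameters chosen only for demonstration.
    fragments = list(sliding_windows("MKTAYIAKQR", window=5, step=1))
    for job_id, batch in enumerate(partition(fragments, n_jobs=3)):
        print(job_id, batch)
```

Because each batch is independent of the others, any tool that accepts a set of query fragments could process the batches concurrently, which is the property the generic model relies on.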