Pitman Derek J, Schenkelberg Christian D, Huang Yao-Ming, Teets Frank D, DiTursi Daniel, Bystroff Christopher
Department of Biology, Rensselaer Polytechnic Institute, Troy, NY 12180; Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94143; Department of Computer Science and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, NY 12180, USA.
Bioinformatics. 2014 Apr 15;30(8):1138-1145. doi: 10.1093/bioinformatics/btt735. Epub 2013 Dec 25.
Accuracy in protein design requires a fine-grained rotamer search, multiple backbone conformations, and a detailed energy function, which together impose a heavy burden on runtime and memory. A design task may be split into manageable pieces, both in three-dimensional space and in the rotamer search space, to produce small, fast jobs that are easily distributed. However, these jobs must overlap, which raises the problem of resolving conflicting solutions in the overlap regions.
Piecemeal design, in which the design space is split into overlapping regions and rotamer search spaces, accelerates the design process whether jobs are run in series or in parallel. Splitting also made it possible to run large jobs that would not otherwise fit in memory. Accepting the consensus amino acid selection in conflict regions led to non-optimal choices. Instead, conflicts were resolved using a second pass, in which the split regions were recombined and designed as one, producing results that were closer to optimal with a minimal increase in runtime over the consensus strategy. Splitting the search space at the rotamer level instead of at the amino acid level further improved efficiency by reducing the search space in the second pass.
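The two-pass idea described above can be illustrated with a minimal Python sketch. This is not the authors' software: every name here (`energy`, `best_assignment`, `overlapping_windows`, `piecemeal`) is hypothetical, a one-dimensional chain of positions stands in for overlapping three-dimensional regions, a toy self-plus-neighbour energy stands in for a detailed energy function, and exhaustive enumeration stands in for a real rotamer search.

```python
from itertools import product

def energy(assignment, self_e, pair_e):
    """Toy energy of a {position: rotamer} assignment (assumed model)."""
    positions = sorted(assignment)
    total = sum(self_e[p][assignment[p]] for p in positions)
    for a, b in zip(positions, positions[1:]):
        if b == a + 1:  # only chain neighbours interact in this toy model
            total += pair_e.get((assignment[a], assignment[b]), 0.0)
    return total

def best_assignment(choices, self_e, pair_e):
    """Exhaustively search the given per-position rotamer choices."""
    positions = sorted(choices)
    best = min(
        product(*(choices[p] for p in positions)),
        key=lambda combo: energy(dict(zip(positions, combo)), self_e, pair_e),
    )
    return dict(zip(positions, best))

def overlapping_windows(n, size, overlap):
    """Split positions 0..n-1 into windows that share `overlap` positions."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    windows, start = [], 0
    while True:
        end = min(start + size, n)
        windows.append(range(start, end))
        if end == n:
            return windows
        start += size - overlap

def piecemeal(rotamers, self_e, pair_e, size=3, overlap=1):
    """First pass: design each window alone. Second pass: recombine."""
    n = len(rotamers)
    kept = {p: set() for p in range(n)}
    for window in overlapping_windows(n, size, overlap):
        solution = best_assignment({p: rotamers[p] for p in window}, self_e, pair_e)
        for p, rot in solution.items():
            kept[p].add(rot)  # overlap positions may collect conflicting picks
    # Second pass: design all positions as one job, but only over the
    # rotamers that some first-pass job selected (rotamer-level reduction).
    return best_assignment({p: sorted(kept[p]) for p in range(n)}, self_e, pair_e)
```

In this sketch, a position outside any overlap keeps a single rotamer after the first pass, so the second pass only has real choices to make where jobs disagreed. Because the second pass searches a reduced space, its result can be no better than the unrestricted optimum under the same toy energy; how close it comes in practice is a question for the paper's results, not for this example.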
Programs for splitting protein design expressions are available at www.bioinfo.rpi.edu/tools/piecemeal.html. Contact: bystrc@rpi.edu. Supplementary information: Supplementary data are available at Bioinformatics online.