Yim Wen-Wai, Chien Shu, Kusumoto Yasuyuki, Date Susumu, Haga Jason
Dept. of Bioengineering, University of California, San Diego, La Jolla, CA, USA.
Stud Health Technol Inform. 2010;159:181-90.
Large-scale in-silico screening is a necessary part of drug discovery, and Grid computing is one answer to this demand. A disadvantage of Grid computing is the heterogeneity of the computational environments that make up a Grid. In our study, we found that for the molecular docking simulation program DOCK, different clusters within a Grid organization can yield inconsistent results. Because DOCK in-silico virtual screening (VS) is currently used to help select chemical compounds to test in in-vitro experiments, such differences have little effect on the validity of using virtual screening before subsequent steps in the drug discovery process. However, it is difficult to predict whether the accumulation of these discrepancies over sequentially repeated VS experiments will significantly alter the results if VS is used as the primary means of identifying potential drugs. Moreover, such discrepancies may be unacceptable for other applications requiring more stringent thresholds. This highlights the need for a more complete solution that provides the best scientific accuracy when executing an application across Grids. One possible solution to platform heterogeneity in DOCK performance explored in our study was the use of virtual machines as a layer of abstraction. This study investigated the feasibility and practicality of using virtual machines and recent cloud computing technologies in a biological research application. We examined the differences and variations in DOCK VS variables across a Grid environment composed of different clusters, with and without virtualization. The uniform computing environment provided by virtual machines eliminated the inconsistent DOCK VS results caused by heterogeneous clusters; however, the execution time for the DOCK VS increased.
In our experiments, virtualization increased execution time by an average of 41% and 2% on the two clusters tested, although the absolute increases in execution time were minimal. Despite this overhead, virtual clusters are an ideal solution for Grid heterogeneity. With further development of virtual cluster technology in Grid environments, the problem of platform heterogeneity may be eliminated through virtualization, which would allow wider use of VS and benefit all Grid applications in general.