Jayaraj P B, Ajay Mathias K, Nufail M, Gopakumar G, Jaleel U C A
Department of Computer Science and Engineering, National Institute of Technology Calicut, NITC Campus, Calicut, Kerala 673601 India.
Center for Cheminformatics, Open Source Pharma, No. 22, World Trade Centre, Malleswaram, Bengaluru, Karnataka 560055 India.
J Cheminform. 2016 Mar 1;8:12. doi: 10.1186/s13321-016-0124-8. eCollection 2016.
In-silico methods are an integral part of the modern drug discovery paradigm. Virtual screening, an in-silico method, is used to refine data models and reduce the chemical space on which wet lab experiments need to be performed. Virtual screening of a ligand data model requires large-scale computation, making it a highly time-consuming task. This process can be sped up by implementing parallelized algorithms on a Graphics Processing Unit (GPU).
Random Forest is a robust classification algorithm that can be employed in virtual screening. This paper proposes and evaluates a ligand-based virtual screening tool (GPURFSCREEN) that uses random forests on GPU systems. The tool produces optimized results with a lower execution time for large bioassay data sets. The quality of the results produced by the tool on a GPU is the same as that obtained in a regular serial environment.
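To illustrate why random forest prediction lends itself to parallel screening, the following is a minimal sketch, not the GPURFSCREEN implementation. It assumes each tree is stored as a flat array of nodes (a layout often chosen for GPU memory) and that each molecule is a vector of numeric descriptors. The node layout, the function names, and the toy descriptors are all hypothetical. Each molecule is classified independently by majority vote, which is the property that would allow one GPU thread per molecule.

```python
from collections import Counter

# Assumed node layout: (feature_index, threshold, left_child, right_child, label).
# A leaf is marked by feature_index == LEAF and carries the class label.
LEAF = -1

def predict_tree(tree, x):
    """Walk one flattened tree from the root (index 0) down to a leaf."""
    node = 0
    while tree[node][0] != LEAF:
        feature, threshold, left, right, _ = tree[node]
        node = left if x[feature] <= threshold else right
    return tree[node][4]

def predict_forest(forest, x):
    """Majority vote over all trees; ties go to the label counted first."""
    votes = Counter(predict_tree(tree, x) for tree in forest)
    return votes.most_common(1)[0][0]

def screen(forest, molecules):
    """Classify every molecule independently of the others."""
    return [predict_forest(forest, m) for m in molecules]

# Hypothetical toy forest of three stumps over two made-up descriptors.
forest = [
    [(0, 300.0, 1, 2, None), (LEAF, 0.0, -1, -1, 1), (LEAF, 0.0, -1, -1, 0)],
    [(1, 2.5, 1, 2, None), (LEAF, 0.0, -1, -1, 1), (LEAF, 0.0, -1, -1, 0)],
    [(0, 450.0, 1, 2, None), (LEAF, 0.0, -1, -1, 1), (LEAF, 0.0, -1, -1, 0)],
]
molecules = [[250.0, 1.0], [500.0, 4.0], [400.0, 3.0]]
labels = screen(forest, molecules)  # 1 = predicted active, 0 = predicted inactive
```

Because `screen` never shares state between molecules, the loop body could in principle be mapped to independent parallel workers without changing the predicted labels, which is consistent with the claim that GPU and serial runs give results of the same quality.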
Considering the magnitude of the data to be screened, parallelized virtual screening achieves a significantly lower running time at high throughput. The proposed parallel tool outperforms its serial counterpart, successfully screening billions of molecules in the training and prediction phases.