Large-scale virtual screening on public cloud resources with Apache Spark.

Authors

Capuccini Marco, Ahmed Laeeq, Schaal Wesley, Laure Erwin, Spjuth Ola

Affiliations

Department of Information Technology, Uppsala University, Box 337, 75105 Uppsala, Sweden.

Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 75124 Uppsala, Sweden.

Publication information

J Cheminform. 2017 Mar 6;9:15. doi: 10.1186/s13321-017-0204-4. eCollection 2017.

DOI:10.1186/s13321-017-0204-4
PMID:28316653
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5339264/
Abstract

BACKGROUND

Structure-based virtual screening is an in silico method to screen a target receptor against a virtual molecular library. Applying docking-based screening to large molecular libraries can be computationally expensive; however, it is a trivially parallelizable task. Most of the available parallel implementations are based on the Message Passing Interface (MPI) and rely on low-failure-rate hardware and fast network connections. Google's MapReduce revolutionized large-scale analysis, enabling the processing of massive datasets on commodity hardware and cloud resources, and providing transparent scalability and fault tolerance at the software level. Open source implementations of MapReduce include Apache Hadoop and the more recent Apache Spark.
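
To illustrate why docking-based screening fits the MapReduce model, the following is a minimal sketch in plain Python, not the authors' Spark implementation. Each molecule is scored independently (the map step), and partial results are merged into a running top-k list (the reduce step). The function `mock_dock`, its toy scoring rule, and the convention that lower scores are better are all assumptions made for illustration; a real pipeline would call an actual docking program instead.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import heapq


def mock_dock(smiles):
    """Hypothetical stand-in for a docking call.

    Returns (molecule, score). The score is a toy deterministic
    function of the string and has no chemical meaning.
    """
    score = -float(sum(ord(c) for c in smiles) % 100) / 10.0
    return smiles, score


def merge_top_k(a, b, k=2):
    """Reduce step: keep the k lowest-scoring hits of two partial lists."""
    return heapq.nsmallest(k, a + b, key=lambda hit: hit[1])


def screen(library, k=2, workers=4):
    """Map every molecule to a score in parallel, then reduce to the top k."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map: each molecule is scored independently of all others.
        partials = [[hit] for hit in pool.map(mock_dock, library)]
    # Reduce: the merge is associative, so partial lists can be combined
    # in any grouping, which is what lets a framework distribute it.
    return reduce(lambda a, b: merge_top_k(a, b, k), partials, [])


if __name__ == "__main__":
    hits = screen(["CCO", "c1ccccc1", "CC(=O)O", "N"], k=2)
    for molecule, score in hits:
        print(molecule, score)
```

Because no molecule's score depends on any other, a failed or slow worker only requires re-running its own share of the library, which is the property that software-level fault tolerance relies on.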

RESULTS

We developed a method to run existing docking-based screening software on distributed cloud resources, utilizing the MapReduce approach. We benchmarked our method, which is implemented in Apache Spark, by docking a publicly available target receptor against approximately 2.2 M compounds. The performance experiments show good parallel efficiency (87%) when running in a public cloud environment.
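
The abstract does not state how the parallel efficiency figure was computed. A common definition, assumed here, is the measured speedup relative to a reference run divided by the ideal speedup for the added workers. The sketch below implements that definition; the function name, parameter names, and timings in the usage example are invented for illustration and are not taken from the paper.

```python
def parallel_efficiency(t_ref, n_ref, t_n, n):
    """Parallel efficiency under the assumed definition.

    t_ref: wall-clock time of the reference run on n_ref workers
    t_n:   wall-clock time of the scaled run on n workers
    """
    if min(t_ref, n_ref, t_n, n) <= 0:
        raise ValueError("times and worker counts must be positive")
    speedup = t_ref / t_n   # how much faster the scaled run was
    ideal = n / n_ref       # speedup expected under perfect scaling
    return speedup / ideal


if __name__ == "__main__":
    # Invented timings: 120 min on 2 workers, 40 min on 8 workers.
    print(parallel_efficiency(120, 2, 40, 8))
```

An efficiency of 1.0 means the run time shrinks in exact proportion to the added workers; values below 1.0 reflect overheads such as scheduling, data transfer, and uneven partition sizes.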

CONCLUSION

Our method enables parallel structure-based virtual screening on public cloud resources or commodity computer clusters. The degree of scalability that we achieve allows users to try our method on relatively small libraries first and then scale to larger libraries. Our implementation is named Spark-VS and is freely available as open source from GitHub (https://github.com/mcapuccini/spark-vs). Graphical abstract.

Figures

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/15c4/5339264/d3ef8ad92ada/13321_2017_204_Figa_HTML.jpg
Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/15c4/5339264/758a2eba27a8/13321_2017_204_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/15c4/5339264/b29bc28de83b/13321_2017_204_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/15c4/5339264/499a035a5075/13321_2017_204_Fig3_HTML.jpg
