DCMS: A data analytics and management system for molecular simulation.

Authors

Kumar Anand, Grupcev Vladimir, Berrada Meryem, Fogarty Joseph C, Tu Yi-Cheng, Zhu Xingquan, Pandit Sagar A, Xia Yuni

Affiliations

Department of Computer Science and Engineering, University of South Florida, 4202 E. Fowler Ave., ENB118, Tampa, 33620 Florida USA.

Department of Physics, University of South Florida, 4202 E. Fowler Ave., PHY114, Tampa, 33620 Florida USA.

Publication information

J Big Data. 2015;2(1):9. doi: 10.1186/s40537-014-0009-5. Epub 2014 Nov 26.

DOI: 10.1186/s40537-014-0009-5
PMID: 26069879
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4456345/
Abstract

Molecular simulation (MS) is a powerful tool for studying the physical and chemical features of large systems and has been applied in many scientific and engineering domains. During a simulation, experiments generate data on a very large number of atoms, whose spatial and temporal relationships are then observed for scientific analysis. The sheer data volume and the intensive interactions among atoms pose significant challenges for data access, management, and analysis. Existing MS software systems fall short in storing and handling MS data, mainly because they lack a platform that supports applications involving intensive data access and analytical processing. In this paper, we present the database-centric molecular simulation (DCMS) system that our team has developed over the past few years. The main idea behind DCMS is to store MS data in a relational database management system (DBMS) to take advantage of the declarative query interface (e.g., SQL), data access methods, query processing, and optimization mechanisms of modern DBMSs. A unique challenge is handling analytical queries, which are often compute-intensive. To address it, we developed novel indexing and query processing strategies (including algorithms that run on modern co-processors) as integrated components of the DBMS. As a result, researchers can upload and analyze their data using efficient functions implemented inside the DBMS. Index structures are generated to store analysis results that may interest other users, so that the results are readily available without repeating the analysis. We have developed a prototype of DCMS based on PostgreSQL, and experiments with real MS data and workloads show that DCMS significantly outperforms existing MS software systems. We have also used it as a platform to test other data management issues, such as security and compression.
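
The database-centric idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not taken from the paper: the table names atom_position and analysis_cache, the column layout, the toy data, and the choice of center of mass as the analysis. Python's built-in sqlite3 module stands in for PostgreSQL so that the example runs without a server, and the plain cache table is only a simple stand-in for the index structures that DCMS uses to keep analysis results.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical per-atom trajectory table."""
    conn = sqlite3.connect(":memory:")
    # One row per atom per simulation frame (hypothetical schema).
    conn.execute(
        """
        CREATE TABLE atom_position (
            frame   INTEGER NOT NULL,
            atom_id INTEGER NOT NULL,
            mass    REAL NOT NULL,
            x REAL NOT NULL,
            y REAL NOT NULL,
            z REAL NOT NULL,
            PRIMARY KEY (frame, atom_id)
        )
        """
    )
    # Stored analysis results, so a repeated request does not recompute them.
    conn.execute(
        """
        CREATE TABLE analysis_cache (
            frame INTEGER PRIMARY KEY,
            cx REAL, cy REAL, cz REAL
        )
        """
    )
    # Toy data: two atoms observed in two frames.
    rows = [
        (0, 1, 1.0, 0.0, 0.0, 0.0),
        (0, 2, 3.0, 4.0, 0.0, 0.0),
        (1, 1, 1.0, 0.0, 2.0, 0.0),
        (1, 2, 3.0, 4.0, 2.0, 0.0),
    ]
    conn.executemany("INSERT INTO atom_position VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def center_of_mass(conn, frame):
    """Return the center of mass of one frame, reusing a stored result if present."""
    cached = conn.execute(
        "SELECT cx, cy, cz FROM analysis_cache WHERE frame = ?", (frame,)
    ).fetchone()
    if cached is not None:
        return cached
    # The analysis is a declarative query; the DBMS decides how to execute it.
    result = conn.execute(
        """
        SELECT SUM(mass * x) / SUM(mass),
               SUM(mass * y) / SUM(mass),
               SUM(mass * z) / SUM(mass)
        FROM atom_position
        WHERE frame = ?
        """,
        (frame,),
    ).fetchone()
    conn.execute("INSERT INTO analysis_cache VALUES (?, ?, ?, ?)", (frame, *result))
    return result


if __name__ == "__main__":
    db = build_demo_db()
    print(center_of_mass(db, 0))  # computed by the SQL query
    print(center_of_mass(db, 0))  # served from analysis_cache
```

The point of the sketch is the division of labor: the analysis is expressed as a query over stored trajectory data rather than as a custom file-parsing program, and a result computed once can be read back by later requests.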

