Chard Kyle, Dart Eli, Foster Ian, Shifflett David, Tuecke Steven, Williams Jason
University of Chicago, Chicago, IL, United States of America.
Argonne National Laboratory, Lemont, IL, United States of America.
PeerJ Comput Sci. 2018 Jan 15;4:e144. doi: 10.7717/peerj-cs.144. eCollection 2018.
We describe best practices for providing convenient, high-speed, secure access to large data via research data portals. We capture these best practices in a new design pattern, the Modern Research Data Portal, that disaggregates the traditional monolithic web-based data portal to achieve orders-of-magnitude increases in data transfer performance, support new deployment architectures that decouple control logic from data storage, and reduce development and operations costs. We introduce the design pattern; explain how it leverages high-performance data enclaves and cloud-based data management services; review representative examples at research laboratories and universities, including both experimental facilities and supercomputer sites; describe how to leverage Python APIs for authentication, authorization, data transfer, and data sharing; and use coding examples to demonstrate how these APIs can be used to implement a range of research data portal capabilities. Sample code at a companion web site, https://docs.globus.org/mrdp, provides application skeletons that readers can adapt to realize their own research data portals.
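To make the central idea of the abstract concrete, the following is a minimal sketch of a portal whose control logic (catalog lookup and authorization check) is decoupled from data storage and data movement. It is an illustration only, not the authors' implementation and not the Globus API: the names `TransferService`, `Portal`, `request_download`, and the endpoint and path strings are all hypothetical, and the in-memory service merely stands in for an external managed transfer service.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass
class TransferTask:
    """Record of one requested transfer (hypothetical structure)."""
    task_id: str
    source_endpoint: str
    source_path: str
    dest_endpoint: str
    dest_path: str
    status: str = "ACTIVE"


class TransferService:
    """Hypothetical stand-in for an external, managed transfer service.

    In the pattern described in the abstract, this role is played by a
    cloud-based data management service, not by the portal itself.
    """

    def __init__(self) -> None:
        self._tasks: Dict[str, TransferTask] = {}
        self._ids = itertools.count(1)

    def submit(self, source_endpoint: str, source_path: str,
               dest_endpoint: str, dest_path: str) -> str:
        # Record the request and return an identifier the caller can poll.
        task_id = f"task-{next(self._ids)}"
        self._tasks[task_id] = TransferTask(
            task_id, source_endpoint, source_path, dest_endpoint, dest_path
        )
        return task_id

    def get(self, task_id: str) -> TransferTask:
        return self._tasks[task_id]


class Portal:
    """Control logic only: no data bytes pass through this class."""

    def __init__(self,
                 catalog: Dict[str, Tuple[str, str]],
                 permissions: Dict[str, Set[str]],
                 transfer: TransferService) -> None:
        self.catalog = catalog          # dataset name -> (endpoint, path)
        self.permissions = permissions  # dataset name -> allowed users
        self.transfer = transfer

    def request_download(self, user: str, dataset: str,
                         dest_endpoint: str, dest_path: str) -> str:
        if dataset not in self.catalog:
            raise KeyError(f"unknown dataset: {dataset}")
        if user not in self.permissions.get(dataset, set()):
            raise PermissionError(f"{user} may not read {dataset}")
        source_endpoint, source_path = self.catalog[dataset]
        # Hand the data movement off to the external service.
        return self.transfer.submit(
            source_endpoint, source_path, dest_endpoint, dest_path
        )


if __name__ == "__main__":
    service = TransferService()
    portal = Portal(
        catalog={"climate-run-1": ("storage-endpoint", "/data/run1/")},
        permissions={"climate-run-1": {"alice"}},
        transfer=service,
    )
    task_id = portal.request_download(
        "alice", "climate-run-1", "alice-laptop", "/home/alice/run1/"
    )
    print(task_id, service.get(task_id).status)
```

In this sketch the portal only decides who may move which dataset and then delegates the transfer, which is the separation the design pattern relies on: the web server never sits on the data path.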