Vahid Jalili, Enis Afgan, James Taylor, Jeremy Goecks
Department of Biomedical Engineering, Oregon Health and Science University, Portland, OR, USA.
Department of Biology, Johns Hopkins University, Baltimore, MD, USA.
Bioinformatics. 2020 Jan 1;36(1):1-9. doi: 10.1093/bioinformatics/btz472.
Large biomedical datasets, such as those from genomics and imaging, are increasingly being stored on commercial and institutional cloud computing platforms. This is because cloud-scale computing resources, from robust backup to high-speed data transfer to scalable compute and storage, are needed to make these large datasets usable. However, one challenge for large-scale biomedical data on the cloud is providing secure access, especially when datasets are distributed across platforms. While there are open Web protocols for secure authentication and authorization, these protocols are not in wide use in bioinformatics and are difficult to use even for technologically sophisticated users.
We have developed a generic and extensible approach for securely accessing biomedical datasets distributed across cloud computing platforms. Our approach combines OpenID Connect and OAuth2, best-practice Web protocols for authentication and authorization, together with Galaxy (https://galaxyproject.org), a web-based computational workbench used by thousands of scientists across the world. With our enhanced version of Galaxy, users can access and analyze data distributed across multiple cloud computing providers without any special knowledge of access/authorization protocols. Our approach does not require users to share permanent credentials (e.g. username, password, API key), instead relying on automatically generated temporary tokens that refresh as needed. Our approach is generalizable to most identity providers and cloud computing platforms. To the best of our knowledge, Galaxy is the only computational workbench where users can access biomedical datasets across multiple cloud computing platforms using best-practice Web security approaches and thereby minimize risks of unauthorized data access and credential use.
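The abstract does not show the token-handling code itself. The following is a minimal Python sketch of the general "temporary tokens that refresh as needed" pattern, not Galaxy's or cloudauthz's actual implementation. `CredentialCache`, `TemporaryCredential`, `refresh_request_body`, the `fetch` callable and the 60-second refresh margin are hypothetical names and values chosen for illustration. Only the refresh-token form fields (`grant_type=refresh_token`, `refresh_token`, and `client_id` for a public client) come from the OAuth 2.0 specification (RFC 6749, Section 6).

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional
from urllib.parse import urlencode


def refresh_request_body(refresh_token: str, client_id: str) -> str:
    """Form-encoded body for an OAuth 2.0 refresh-token grant (RFC 6749, Section 6).

    A public client identifies itself with client_id; a confidential client
    would instead authenticate to the token endpoint (e.g. HTTP Basic).
    """
    return urlencode({
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,
    })


@dataclass
class TemporaryCredential:
    value: str          # short-lived token or cloud credential (hypothetical shape)
    expires_at: float   # expiry as a Unix timestamp


class CredentialCache:
    """Returns a short-lived credential, fetching a new one shortly before expiry.

    `fetch` is a hypothetical stand-in for the provider-specific exchange,
    e.g. trading an OpenID Connect token for temporary cloud credentials.
    The user's permanent credentials are never stored here.
    """

    def __init__(self, fetch: Callable[[], TemporaryCredential],
                 margin_seconds: float = 60.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch
        self._margin = margin_seconds
        self._clock = clock
        self._current: Optional[TemporaryCredential] = None

    def get(self) -> str:
        now = self._clock()
        # Refresh when nothing is cached yet or expiry is within the margin.
        if self._current is None or now >= self._current.expires_at - self._margin:
            self._current = self._fetch()
        return self._current.value
```

With a credential valid for 900 seconds and a 60-second margin, a call at t=1600 reuses the cached credential (it expires at t=1900), while a call at t=1900 falls inside the margin and triggers a new fetch. Injecting `clock` keeps the refresh logic testable without waiting on real time.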
Freely available for academic and commercial use under the open-source Academic Free License (https://opensource.org/licenses/AFL-3.0) from the following GitHub repositories: https://github.com/galaxyproject/galaxy and https://github.com/galaxyproject/cloudauthz.