Oh Sehyun, Gravel-Pucillo Kai, Ramos Marcel, Schatz Michael C, Davis Sean, Carey Vincent, Morgan Martin, Waldron Levi
Institute for Implementation Science in Population Health, City University of New York School of Public Health, New York, New York, USA.
Department of Epidemiology and Biostatistics, City University of New York School of Public Health, New York, New York, USA.
F1000Res. 2024 Oct 21;13:1257. doi: 10.12688/f1000research.155449.1. eCollection 2024.
Advancements in sequencing technologies and the development of new data collection methods produce large volumes of biological data. The Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL) provides a cloud-based platform for democratizing access to large-scale genomics data and analysis tools. However, utilizing the full capabilities of AnVIL can be challenging for researchers without extensive bioinformatics expertise, especially when executing complex workflows. We present the AnVILWorkflow R package, which enables the convenient execution of bioinformatics workflows hosted on AnVIL directly from an R environment. AnVILWorkflow simplifies the setup of the cloud computing environment, input data formatting, workflow submission, and retrieval of results through intuitive functions. We demonstrate the utility of AnVILWorkflow for three use cases: bulk RNA-seq analysis, metagenomics analysis, and digital pathology image processing. The key features of AnVILWorkflow include user-friendly browsing of available data and workflows, seamless integration of R and non-R tools within a reproducible analysis pipeline, and access to scalable computing resources without direct management overhead. AnVILWorkflow lowers the barrier to utilizing AnVIL's resources, especially for exploratory analyses or bulk processing with established workflows. This empowers a broader community of researchers to leverage the latest genomics tools and datasets using familiar R syntax. The package is distributed through the Bioconductor project (https://bioconductor.org/packages/AnVILWorkflow), and the source code is available through GitHub (https://github.com/shbrief/AnVILWorkflow).