Institute for Surgical Pathology, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg, Breisacher Straße 115a, D-79106 Freiburg, Germany.
Faculty of Biology, Albert-Ludwigs-University Freiburg, Schänzlestraße 1, D-79104 Freiburg, Germany.
Gigascience. 2022 Feb 15;11. doi: 10.1093/gigascience/giac005.
Data-independent acquisition (DIA) has become an important approach in global, mass spectrometric proteomic studies because it provides in-depth insights into the molecular variety of biological systems. However, DIA data analysis remains challenging owing to the high complexity of the data and the large data volumes and sample sizes, which require specialized software and extensive computing infrastructure. Most available open-source DIA software requires basic programming skills and covers only a fraction of a complete DIA data analysis. Consequently, DIA data analysis often requires multiple software tools that must be compatible with one another, which severely limits usability and reproducibility.
To overcome this hurdle, we have integrated a suite of open-source DIA tools in the Galaxy framework for reproducible and version-controlled data processing. The DIA suite includes OpenSwath, PyProphet, diapysef, and swath2stats. We have compiled functional Galaxy pipelines for DIA processing, which provide a web-based graphical user interface to these pre-installed and pre-configured tools and allow them to be run on the freely accessible, powerful computational resources of the Galaxy framework. This approach also enables seamless sharing of workflows with their full configuration, in addition to sharing raw data and results. We demonstrate the usability of an all-in-one DIA pipeline in Galaxy by analyzing a spike-in case study dataset. Additionally, extensive training material is provided to further increase access for the proteomics community.
The integration of an open-source DIA analysis suite in the web-based and user-friendly Galaxy framework, in combination with extensive training material, empowers a broad community of researchers to perform reproducible and transparent DIA data analysis.