Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA.
Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH, USA.
BMC Bioinformatics. 2022 May 28;23(1):197. doi: 10.1186/s12859-022-04727-6.
Computational methods for the initial screening and prediction of peptides with desired functions have proven to be effective alternatives to the lengthy and expensive biochemical experiments traditionally used in peptide research, saving time and effort. However, for many researchers, limited expertise in using programming libraries, limited access to computational resources, and the absence of flexible pipelines are major hurdles to adopting these methods.
To address these barriers, we implemented the Peptide Design and Analysis Under Galaxy (PDAUG) package, a Galaxy-based, Python-powered collection of tools, workflows, and datasets for rapid in silico peptide library analysis. In contrast to existing options such as standard programming libraries or rigid single-function web-based tools, PDAUG offers an integrated GUI-based toolset that provides the flexibility to build and distribute reproducible pipelines and workflows without programming expertise. Finally, we demonstrate the usability of PDAUG by predicting anticancer properties of peptides using four different feature sets and assessing the suitability of various machine learning algorithms.
PDAUG offers tools for peptide library generation, data visualization, peptide sequence retrieval from built-in and public databases, peptide feature calculation, and machine learning (ML) modeling. Additionally, the toolset enables researchers to combine PDAUG with hundreds of compatible existing Galaxy tools, supporting a wide range of analytic strategies.
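As an illustration of what "peptide feature calculation" can mean in practice, the sketch below computes amino acid composition, one commonly used sequence descriptor. It is not taken from PDAUG and does not reflect its interface or the four feature sets used in the study; the function name `amino_acid_composition` and the example sequences are hypothetical, chosen only to show how a peptide can be turned into a fixed-length numeric vector suitable for ML modeling.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every peptide
# maps to a feature vector with the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in a peptide.

    Hypothetical helper for illustration only. Non-standard residue
    codes are counted in the length but get no feature of their own,
    so the fractions can sum to less than 1 for such sequences.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("peptide sequence must not be empty")
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


# Made-up example sequences, not drawn from any dataset in the paper.
peptides = ["GLFDIVKKVV", "KWKLFKKIGA"]
features = [amino_acid_composition(p) for p in peptides]
```

Each row of `features` is a 20-element vector that could be passed, together with class labels, to a standard classifier. Richer descriptor sets follow the same pattern of mapping a sequence to numbers.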