Pegasys: software for executing and integrating analyses of biological sequences.

Author information

Shah Sohrab P, He David Y M, Sawkins Jessica N, Druce Jeffrey C, Quon Gerald, Lett Drew, Zheng Grace X Y, Xu Tao, Ouellette B F Francis

Affiliation

UBC Bioinformatics Centre, University of British Columbia, Vancouver, British Columbia, Canada.

Publication information

BMC Bioinformatics. 2004 Apr 19;5:40. doi: 10.1186/1471-2105-5-40.

Abstract

BACKGROUND

We present Pegasys, a flexible, modular and customizable software system that facilitates the execution of heterogeneous biological sequence analysis tools and the integration of their data.

RESULTS

The Pegasys system includes numerous tools for pair-wise and multiple sequence alignment, ab initio gene prediction, RNA gene detection, and masking repetitive sequences in genomic DNA, as well as filters for database formatting and for processing raw output from various analysis tools. We introduce a novel data structure for creating workflows of sequence analyses and a unified data model to store their results. The software allows users to dynamically create analysis workflows at run-time by manipulating a graphical user interface. All analyses that do not depend serially on one another are executed in parallel on a compute cluster for efficient data generation. The uniform data model and backend relational database management system of Pegasys allow the results of heterogeneous programs included in the workflow to be integrated and exported into General Feature Format (GFF) for further analyses in GFF-dependent tools, or into GAME XML for import into the Apollo genome editor. The modularity of the design allows new tools to be added to the system with little programmer overhead. The database application programming interface allows programmatic access to the data stored in the backend through SQL queries.
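
The idea of running all analyses that are not serially dependent in parallel can be illustrated with a small sketch. It is not taken from the Pegasys source code: the workflow contents, the analysis names and the function `parallel_batches` are hypothetical, and Python's standard `graphlib` module stands in for whatever scheduling structure Pegasys actually uses. The sketch treats a workflow as a dependency graph and groups analyses into batches whose members could be dispatched to a cluster at the same time.

```python
from graphlib import TopologicalSorter


def parallel_batches(dependencies):
    """Group analyses into batches that could run concurrently.

    `dependencies` maps each analysis name to the set of analyses it must
    wait for. Every analysis in a batch has all of its prerequisites in
    earlier batches, so the members of one batch are independent.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the workflow has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished to unlock dependents
    return batches


# Hypothetical workflow (names are illustrative, not Pegasys identifiers).
workflow = {
    "repeat_masking": set(),
    "trna_scan": set(),
    "gene_prediction": {"repeat_masking"},
    "blast_search": {"repeat_masking"},
    "gff_export": {"gene_prediction", "blast_search", "trna_scan"},
}

batches = parallel_batches(workflow)
for number, batch in enumerate(batches, start=1):
    print(number, batch)
```

In this toy workflow the two analyses with no prerequisites form the first batch, the two that need masked sequence form the second, and the export step runs last because it depends on every other result.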

CONCLUSIONS

The Pegasys system enables biologists and bioinformaticians to create and manage sequence analysis workflows. The software is released under the Open Source GNU General Public License. All source code and documentation are available for download at http://bioinformatics.ubc.ca/pegasys/.
