Sainsbury Laboratory, University of Cambridge, Cambridge, UK.
Bioinformatics. 2018 Jul 1;34(13):2295-2296. doi: 10.1093/bioinformatics/bty088.
The secretome denotes the collection of secreted proteins exported from the cell. The functional roles of secreted proteins include the maintenance and remodelling of the extracellular matrix as well as signalling between host and non-host cells. These features make secretomes rich reservoirs of biomarkers for disease classification and host-pathogen interaction studies. Common biomarkers are extracellular proteins secreted via classical pathways, which can be predicted from sequence by annotating the presence or absence of N-terminal signal peptides. Several heterogeneous command line tools and web interfaces exist to identify individual motifs, signal sequences and domains that are either characteristic of, or strictly excluded from, secreted proteins. However, a single flexible secretome-prediction workflow that combines all analytic steps is still missing.
To bridge this gap, the SecretSanta package implements wrapper and parser functions around established command line tools for the integrative prediction of extracellular proteins that are secreted via classical pathways. The modularity of SecretSanta enables users to create tailored pipelines and apply them across the whole tree of life, facilitating the comparison of secretomes across multiple species or under various conditions.
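The idea of a modular pipeline, in which each step filters the output of the previous one, can be illustrated with a minimal Python sketch. This is not SecretSanta's API (the package itself is written in R and wraps external predictors). All function names below are hypothetical, and the signal peptide check is a toy heuristic standing in for a real predictor. The only biological rule used is the well-known C-terminal KDEL/HDEL endoplasmic reticulum retention motif, an example of a motif that excludes a protein from the secretome.

```python
from typing import Callable, Dict, List

Proteins = Dict[str, str]                 # sequence id -> amino acid sequence
Step = Callable[[Proteins], Proteins]     # a pipeline step maps proteins to a filtered subset

HYDROPHOBIC = set("AILMFVW")


def has_toy_signal_peptide(seq: str) -> bool:
    """Toy stand-in for a signal peptide predictor (an assumption, not a real tool):
    the sequence starts with M and its first 25 residues contain a run of at
    least 7 consecutive hydrophobic residues."""
    n_term = seq[:25]
    if not n_term.startswith("M"):
        return False
    run = 0
    for aa in n_term:
        run = run + 1 if aa in HYDROPHOBIC else 0
        if run >= 7:
            return True
    return False


def keep_signal_peptide(proteins: Proteins) -> Proteins:
    """Keep only proteins that pass the toy signal peptide check."""
    return {k: v for k, v in proteins.items() if has_toy_signal_peptide(v)}


def drop_er_retention(proteins: Proteins) -> Proteins:
    """Drop proteins ending in the KDEL/HDEL ER retention motif."""
    return {k: v for k, v in proteins.items() if not v.endswith(("KDEL", "HDEL"))}


def run_pipeline(proteins: Proteins, steps: List[Step]) -> Proteins:
    """Apply each step in order; every step sees only what the previous one kept."""
    for step in steps:
        proteins = step(proteins)
    return proteins


# Made-up toy sequences, for illustration only.
toy = {
    "secreted": "MKLLLLVALLALAGSAQAQDTPEKRSTNDGGQ",
    "er_resident": "MKLLLLVALLALAGSAQAQDTPEKRSTNKDEL",
    "cytosolic": "MSDEKRQTNGSDEKRQTNGSDEKRQTNG",
}
result = run_pipeline(toy, [keep_signal_peptide, drop_er_retention])
print(sorted(result))
```

Because each step has the same input and output type, steps can be reordered, dropped or replaced without touching the rest of the pipeline, which is the property the abstract refers to as modularity.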
SecretSanta is implemented in the R programming language and is released under the GPL-3 license. All functions have been optimized and parallelized to allow large-scale processing of sequences. The open-source code, installation instructions and a vignette with use case scenarios can be downloaded from https://github.com/gogleva/SecretSanta.
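The abstract does not say how the parallelization is implemented. One common pattern for wrappers around command line tools is to split the input into chunks, process the chunks concurrently and merge the partial results. The Python sketch below shows that pattern under stated assumptions: `predict_chunk` is a hypothetical placeholder for a call to an external predictor, and the chunk size and worker count are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

Proteins = Dict[str, str]   # sequence id -> amino acid sequence


def split_into_chunks(proteins: Proteins, chunk_size: int) -> List[Proteins]:
    """Split the input into consecutive chunks of at most chunk_size sequences."""
    items = list(proteins.items())
    return [dict(items[i:i + chunk_size]) for i in range(0, len(items), chunk_size)]


def predict_chunk(chunk: Proteins) -> Proteins:
    """Hypothetical placeholder for running an external predictor on one chunk.
    Here it simply keeps sequences that start with methionine."""
    return {k: v for k, v in chunk.items() if v.startswith("M")}


def predict_parallel(proteins: Proteins, chunk_size: int = 2, workers: int = 4) -> Proteins:
    """Process chunks concurrently and merge the partial results in input order."""
    merged: Proteins = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the order of the submitted chunks.
        for partial in pool.map(predict_chunk, split_into_chunks(proteins, chunk_size)):
            merged.update(partial)
    return merged


# Made-up toy sequences, for illustration only.
sequences = {"p1": "MKTAY", "p2": "AKTAY", "p3": "MLLVA", "p4": "GSDEK", "p5": "MAAAA"}
kept = predict_parallel(sequences)
print(list(kept))
```

Threads are sufficient when each chunk is handed to an external program, since the work then happens in a separate process. For CPU-bound work done in pure Python, a process pool would be the more appropriate choice.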
Supplementary data are available at Bioinformatics online.