Krejci Adam, Hupp Ted R, Lexa Matej, Vojtesek Borivoj, Muller Petr
RECAMO, Masaryk Memorial Cancer Institute, Zluty kopec 7, 65653, Brno, Czech Republic.
University of Edinburgh, Institute of Genetics and Molecular Medicine, Cancer Research Centre, Edinburgh EH4 2XR, UK.
Bioinformatics. 2016 Jan 1;32(1):9-16. doi: 10.1093/bioinformatics/btv522. Epub 2015 Sep 5.
Proteins often recognize their interaction partners on the basis of short linear motifs located in disordered regions on the protein surface. Experimental techniques that study such motifs use short peptides to mimic the structural properties of interacting proteins. Continued development of these methods allows for large-scale screening, which yields vast numbers of peptide sequences that potentially contain information on multiple protein-protein interactions. Processing such datasets is a complex but essential task for large-scale studies of protein-protein interactions.
The software tool presented in this article, Hammock, rapidly identifies multiple clusters of sequences carrying shared specificity motifs in massive datasets from various sources and generates multiple sequence alignments of the identified clusters. The method was applied to a previously published smaller dataset containing distinct classes of ligands for SH3 domains, as well as to a new dataset, an order of magnitude larger, containing epitopes for several monoclonal antibodies. The software successfully identified clusters of sequences mimicking epitopes of antibody targets, as well as secondary clusters revealing that the antibodies accept some deviations from the original epitope sequences. Another test indicates that processing of even much larger datasets is computationally feasible.
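To make the idea of "clusters of sequences carrying shared motifs" concrete, the following toy sketch groups peptides that share a common k-mer and prints an ungapped alignment anchored on it. This is not Hammock's algorithm, which is not described here; the function names, the k-mer heuristic, and the peptide strings are all assumptions made up for illustration.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of distinct k-mers in a peptide (empty if len(seq) < k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_by_shared_kmer(peptides, k=4, min_size=2):
    """Greedy toy clustering (hypothetical helper, not Hammock's method).

    Repeatedly pick the k-mer found in the most unassigned peptides and
    pull every peptide containing it into one cluster.
    """
    remaining = list(dict.fromkeys(peptides))  # drop duplicates, keep order
    clusters = []
    while remaining:
        counts = Counter(km for p in remaining for km in kmers(p, k))
        if not counts:
            break
        # Break ties deterministically by the k-mer string itself.
        best, n = max(counts.items(), key=lambda kv: (kv[1], kv[0]))
        if n < min_size:
            break
        clusters.append((best, [p for p in remaining if best in p]))
        remaining = [p for p in remaining if best not in p]
    return clusters, remaining


def align_on_kmer(kmer, members):
    """Pad members with '-' so the first occurrence of the k-mer lines up."""
    offsets = [m.index(kmer) for m in members]
    left = max(offsets)
    rows = ["-" * (left - o) + m for o, m in zip(offsets, members)]
    width = max(len(r) for r in rows)
    return [r.ljust(width, "-") for r in rows]


if __name__ == "__main__":
    # Made-up peptides, not data from the article.
    toy = ["AAPPLPRTT", "GPPLPRK", "PPLPRWW",
           "QRSDLWKLLS", "TFSDLWKL", "GGGGHHHH"]
    found, unclustered = cluster_by_shared_kmer(toy)
    for motif, members in found:
        print(motif)
        for row in align_on_kmer(motif, members):
            print("  " + row)
    print("unclustered:", unclustered)
```

A real tool has to tolerate substitutions and gaps within a motif, which exact k-mer matching cannot do; the sketch only shows the input and output shape of the task (peptides in, aligned clusters out).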
Hammock is published under the GNU GPL v. 3 license and is freely available as a standalone program (from http://www.recamo.cz/en/software/hammock-cluster-peptides/) or as a tool for the Galaxy toolbox (from https://toolshed.g2.bx.psu.edu/view/hammock/hammock). The source code can be downloaded from https://github.com/hammock-dev/hammock/releases.
Supplementary data are available at Bioinformatics online.