Instituto de Investigaciones Biotecnológicas, Universidad Nacional de San Martín, 1650 San Martín, Argentina.
Department of Bio and Health Informatics, Technical University of Denmark, DK-2800 Lyngby, Denmark.
Nucleic Acids Res. 2017 Jul 3;45(W1):W458-W463. doi: 10.1093/nar/gkx248.
Receptor interactions with short linear peptide fragments (ligands) underlie many biological signaling processes. Conserved and information-rich amino acid patterns, commonly called sequence motifs, shape and regulate these interactions. Because of the properties of a receptor-ligand system, or of the assay used to interrogate it, experimental data often contain multiple sequence motifs. GibbsCluster is a powerful tool for unsupervised motif discovery because it can simultaneously cluster and align peptide data. GibbsCluster 2.0, presented here, is an improved version that incorporates insertions and deletions to account for variations in motif length among the input peptides. In basic terms, the program takes a set of peptide sequences as input and clusters them into meaningful groups. It returns the optimal number of clusters it identified, together with the sequence alignment and sequence motif characterizing each cluster. Several parameters are available to customize the cluster analysis, including adjustable penalties for small clusters and for overlapping groups, and a trash cluster to remove outliers. As an example application, we used the server to deconvolute multiple specificities in large-scale peptidome data generated by mass spectrometry. The server is available at http://www.cbs.dtu.dk/services/GibbsCluster-2.0.
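To make the idea of assigning peptides to motif clusters, with a trash cluster for outliers, more concrete, the following is a minimal illustrative sketch. It is not the GibbsCluster algorithm: it performs no Gibbs sampling, no alignment, and no handling of insertions or deletions. It only shows how a peptide might be scored against per-cluster log-odds matrices and sent to a trash cluster when no score exceeds a threshold. The function names, the uniform background frequency, the pseudocount, the threshold value, and the toy 4-mer peptides are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequency


def log_odds_matrix(peptides, pseudocount=1.0):
    """Build a per-position log-odds matrix from equal-length peptides."""
    length = len(peptides[0])
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    matrix = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        matrix.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / BACKGROUND)
            for aa in AMINO_ACIDS
        })
    return matrix


def score(peptide, matrix):
    """Sum the log-odds of each residue at its position (no gaps allowed)."""
    return sum(column[aa] for aa, column in zip(peptide, matrix))


def assign(peptide, matrices, trash_threshold=0.0):
    """Return the index of the best-scoring cluster, or -1 for the trash cluster."""
    scores = [score(peptide, m) for m in matrices]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best if scores[best] > trash_threshold else -1


# Hypothetical toy clusters of 4-mers, chosen only to illustrate the scoring.
clusters = [["ALAL", "ALAV", "ALGL"], ["KRKR", "KRRR", "RRKR"]]
matrices = [log_odds_matrix(c) for c in clusters]
assignments = {p: assign(p, matrices) for p in ["ALAL", "KRKR", "WWWW"]}
```

In this toy setup a peptide that matches neither motif receives a negative score against both matrices and is therefore placed in the trash cluster (index -1). A real analysis would also have to infer the clusters themselves and allow for motif length variation, which this sketch deliberately leaves out.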