The Donnelly Centre, Banting and Best Department of Medical Research, University of Toronto, Toronto, ON, Canada M5S 3E1.
Nucleic Acids Res. 2012 Mar;40(6):e47. doi: 10.1093/nar/gkr1294. Epub 2011 Dec 30.
Peptide recognition domains and transcription factors play crucial roles in cellular signaling. They bind linear stretches of amino acids or nucleotides, respectively, with high specificity. Experimental techniques that assess the binding specificity of these domains, such as microarrays or phage display, can retrieve thousands of distinct ligands, providing detailed insight into binding specificity. In particular, the advent of next-generation sequencing has recently increased the throughput of such methods by several orders of magnitude. These advances have helped reveal the presence of distinct binding specificity classes that co-exist within a set of ligands interacting with the same target. Here, we introduce a software system called MUSI that can rapidly analyze very large data sets of binding sequences to determine the relevant binding specificity patterns. Our pipeline provides two major advances. First, it can detect previously unrecognized multiple specificity patterns in any data set. Second, it offers integrated processing of very large data sets from next-generation sequencing machines. The results are visualized as multiple sequence logos describing the different binding preferences of the protein under investigation. We demonstrate the performance of MUSI by analyzing recent phage display data for human SH3 domains as well as microarray data for mouse transcription factors.
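The abstract says MUSI separates a ligand set into several binding specificity patterns and draws one sequence logo per pattern, but it does not describe the algorithm. The sketch below shows one standard way to model several co-existing specificities, and it is an assumption, not MUSI's implementation: a mixture of k position weight matrices (PWMs) fitted by expectation-maximization (EM). It assumes the ligands are already aligned to equal length and use a fixed alphabet. The helper names (`em_pwm_mixture`, `farthest_first_seeds`, `information_content`), the pseudocount of 1.0, and the deterministic farthest-first seeding are all hypothetical choices made for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def farthest_first_seeds(seqs, k):
    """Hypothetical deterministic seeding: repeatedly pick the sequence
    farthest (in Hamming distance) from the seeds chosen so far."""
    seeds = [seqs[0]]
    while len(seeds) < k:
        seeds.append(max(seqs, key=lambda s: min(hamming(s, c) for c in seeds)))
    return seeds


def weighted_pwm(seqs, weights, alphabet, pseudocount=1.0):
    """Per-position residue probabilities from soft (weighted) counts plus pseudocounts."""
    pwm = []
    for pos in range(len(seqs[0])):
        counts = {a: pseudocount for a in alphabet}
        for s, w in zip(seqs, weights):
            counts[s[pos]] += w
        total = sum(counts.values())
        pwm.append({a: c / total for a, c in counts.items()})
    return pwm


def log_likelihood(seq, pwm):
    """Log-probability of a sequence under one PWM (positions treated as independent)."""
    return sum(math.log(col[ch]) for col, ch in zip(pwm, seq))


def em_pwm_mixture(seqs, k, alphabet=AMINO_ACIDS, iters=50):
    """Fit a mixture of k PWMs to aligned, equal-length sequences with EM.
    Assumption for illustration only; not MUSI's published procedure."""
    seeds = farthest_first_seeds(seqs, k)
    # Start from a hard assignment of each sequence to its nearest seed.
    resp = []
    for s in seqs:
        j = min(range(k), key=lambda c: hamming(s, seeds[c]))
        resp.append([1.0 if c == j else 0.0 for c in range(k)])
    for _ in range(iters):
        # M-step: re-estimate each component's PWM and mixing weight
        # (the weight is floored so an empty component never gives log(0)).
        pwms = [weighted_pwm(seqs, [r[j] for r in resp], alphabet) for j in range(k)]
        mix = [max(sum(r[j] for r in resp) / len(seqs), 1e-12) for j in range(k)]
        # E-step: posterior probability that each sequence came from each component,
        # computed in log space and normalized with the max-subtraction trick.
        resp = []
        for s in seqs:
            logs = [math.log(mix[j]) + log_likelihood(s, pwms[j]) for j in range(k)]
            top = max(logs)
            expd = [math.exp(x - top) for x in logs]
            z = sum(expd)
            resp.append([x / z for x in expd])
    return pwms, mix, resp


def information_content(pwm, alphabet):
    """Bits per position, i.e. the stack heights of a sequence logo
    (no small-sample correction applied)."""
    max_bits = math.log2(len(alphabet))
    return [max_bits + sum(p * math.log2(p) for p in col.values() if p > 0) for col in pwm]


# Toy usage: one ligand set containing two clearly distinct DNA "specificities".
toy = ["AAAA"] * 10 + ["TTTT"] * 10
pwms, mix, resp = em_pwm_mixture(toy, k=2, alphabet="ACGT")
for j, pwm in enumerate(pwms):
    print(j, round(mix[j], 3), [round(b, 2) for b in information_content(pwm, "ACGT")])
```

Each fitted PWM, together with its per-position information content, is what one logo in a multiple-logo display would be drawn from. In practice the number of components k would also have to be chosen, for example by comparing model likelihoods with a complexity penalty. This sketch leaves that choice to the caller.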