MUSI: an integrated system for identifying multiple specificity from very large peptide or nucleic acid data sets.

Affiliation

The Donnelly Centre, Banting and Best Department of Medical Research, University of Toronto, Toronto, ON, Canada M5S 3E1.

Publication information

Nucleic Acids Res. 2012 Mar;40(6):e47. doi: 10.1093/nar/gkr1294. Epub 2011 Dec 30.

DOI: 10.1093/nar/gkr1294

PMID: 22210894

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3315295/

Abstract

Peptide recognition domains and transcription factors play crucial roles in cellular signaling. They bind linear stretches of amino acids or nucleotides, respectively, with high specificity. Experimental techniques that assess the binding specificity of these domains, such as microarrays or phage display, can retrieve thousands of distinct ligands, providing detailed insight into binding specificity. In particular, the advent of next-generation sequencing has recently increased the throughput of such methods by several orders of magnitude. These advances have helped reveal the presence of distinct binding specificity classes that co-exist within a set of ligands interacting with the same target. Here, we introduce a software system called MUSI that can rapidly analyze very large data sets of binding sequences to determine the relevant binding specificity patterns. Our pipeline provides two major advances. First, it can detect previously unrecognized multiple specificity patterns in any data set. Second, it offers integrated processing of very large data sets from next-generation sequencing machines. The results are visualized as multiple sequence logos describing the different binding preferences of the protein under investigation. We demonstrate the performance of MUSI by analyzing recent phage display data for human SH3 domains as well as microarray data for mouse transcription factors.
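To make "multiple specificity" concrete, the following is a minimal sketch of one generic way to split a set of equal-length binding sequences into several position weight matrices (PWMs) and to compute the per-position information content that a sequence logo displays. It is an illustration under stated assumptions, not MUSI's implementation: the abstract does not describe the algorithm, so the hard-assignment clustering, the function names (`column_frequencies`, `cluster_by_specificity`, `information_content`), the pseudocount, the user-supplied seed sequences, and the toy DNA data are all hypothetical.

```python
import math
from collections import Counter


def column_frequencies(seqs, alphabet, pseudocount=1.0):
    """Build a PWM: per-position residue frequencies with pseudocounts."""
    length = len(seqs[0])
    total = len(seqs) + pseudocount * len(alphabet)
    pwm = []
    for pos in range(length):
        counts = Counter(s[pos] for s in seqs)
        pwm.append({a: (counts[a] + pseudocount) / total for a in alphabet})
    return pwm


def log_likelihood(seq, pwm):
    """Log-probability of a sequence under a PWM (positions independent)."""
    return sum(math.log(pwm[i][c]) for i, c in enumerate(seq))


def cluster_by_specificity(seqs, alphabet, seeds, n_iter=20):
    """Hard-assignment clustering: assign each sequence to its best-scoring
    PWM, re-estimate each PWM from its members, repeat until labels settle.
    `seeds` (one starting sequence per pattern) is an illustrative choice."""
    pwms = [column_frequencies([s], alphabet) for s in seeds]
    labels = None
    for _ in range(n_iter):
        new_labels = [
            max(range(len(pwms)), key=lambda k: log_likelihood(s, pwms[k]))
            for s in seqs
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for k in range(len(pwms)):
            members = [s for s, lab in zip(seqs, labels) if lab == k]
            if members:  # keep the old PWM if a cluster becomes empty
                pwms[k] = column_frequencies(members, alphabet)
    return labels, pwms


def information_content(pwm):
    """Bits per position: the total column height in a sequence logo."""
    max_bits = math.log2(len(pwm[0]))
    return [max_bits + sum(p * math.log2(p) for p in col.values()) for col in pwm]


# Toy data: two made-up binding patterns mixed in one ligand set.
ligands = ["ACGT", "ACGT", "ACGA", "TTCC", "TTCC", "TTCG"]
labels, pwms = cluster_by_specificity(ligands, "ACGT", seeds=["ACGT", "TTCC"])
logo_heights = [information_content(p) for p in pwms]
```

Each resulting PWM would correspond to one logo. A real tool would also need to choose the number of patterns, handle alignment and variable-length reads, and typically use soft (probabilistic) assignments rather than the hard assignments used here.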

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/26de/3315295/93095b7ee7be/gkr1294f1.jpg
