Department of Chemical Engineering, University of Washington, Seattle, Washington, USA.
J Chem Inf Model. 2013 Feb 25;53(2):493-9. doi: 10.1021/ci300484q. Epub 2013 Feb 4.
Peptide libraries allow researchers to quickly find hundreds of peptide sequences with a desired property. Currently, the large amount of data generated from peptide libraries is analyzed by hand: researchers search the peptide sequences for repeating patterns, called motifs. In this work, we describe a set of algorithms that allow quick, efficient, and standardized analysis of peptide libraries. Four main techniques are described: (1) choosing the number of motifs present in a peptide library; (2) separating the peptides into groups of similar sequences; (3) fitting a model to the peptides to extract motifs; and (4) analyzing the library using quantitative structure-property relationships when no clear motifs are present. Application to five previously published data sets shows that these techniques can automatically and quickly reproduce the work of experts while allowing much more flexibility in analysis. A new way of visually presenting peptide libraries is also described, which allows visual inspection of the grouping and spread of sequences. The algorithms have been implemented in an open-source plug-in called "peplib" and in an online web application.