Center for Biological Sequence Analysis, Technical University of Denmark, Kongens Lyngby, Denmark.
PLoS One. 2011;6(11):e26781. doi: 10.1371/journal.pone.0026781. Epub 2011 Nov 2.
Recent advances in high-throughput technologies have made it possible to generate both gene and protein sequence data at an unprecedented rate and scale, thereby enabling entirely new "omics"-based approaches to the analysis of complex biological processes. However, the amount and complexity of the data that even a single experiment can produce seriously challenge researchers with limited bioinformatics expertise, who need to handle, analyze and interpret the data before they can be understood in a biological context. Thus, there is an unmet need for tools that allow non-bioinformatics users to interpret large data sets. We have recently developed a method, NNAlign, which is generally applicable to any biological problem where quantitative peptide data are available. This method efficiently identifies underlying sequence patterns by simultaneously aligning peptide sequences and identifying motifs associated with quantitative readouts. Here, we provide a web-based implementation of NNAlign that allows non-expert end-users to submit their data (optionally adjusting method parameters) and in return receive a trained method (including a visual representation of the identified motif) that can subsequently be used as a prediction method and applied to unknown proteins/peptides. We have successfully applied this method to several different data sets, including peptide microarray-derived sets containing more than 100,000 data points. NNAlign is available online at http://www.cbs.dtu.dk/services/NNAlign.
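To make the idea of aligning peptides against a motif concrete, the following is a minimal sketch and not the NNAlign implementation. It assumes a fixed motif length and replaces the trained method with a hypothetical position-specific weight table (`toy_weights`); each peptide is "aligned" by choosing the window that scores highest under that table. All names, weights and peptides are invented for illustration.

```python
from typing import Dict, List, Tuple

# Assumption: a toy scoring table, one dict of residue weights per motif position.
# A trained model would supply the scores; these values are made up.
Weights = List[Dict[str, float]]


def best_core(peptide: str, weights: Weights) -> Tuple[int, str, float]:
    """Return (offset, core, score) for the highest-scoring window of the peptide."""
    k = len(weights)
    if len(peptide) < k:
        raise ValueError("peptide is shorter than the motif length")
    best = None
    for offset in range(len(peptide) - k + 1):
        core = peptide[offset:offset + k]
        # Residues absent from a position's table contribute 0.0.
        score = sum(weights[i].get(aa, 0.0) for i, aa in enumerate(core))
        # Strict '>' keeps the leftmost window when scores tie.
        if best is None or score > best[2]:
            best = (offset, core, score)
    return best


def align(peptides: List[str], weights: Weights) -> List[Tuple[int, str, float]]:
    """Align every peptide by selecting its best-scoring core window."""
    return [best_core(p, weights) for p in peptides]


# Hypothetical 3-residue motif: hydrophobic at position 1, basic at position 3.
toy_weights: Weights = [{"L": 1.0, "I": 0.8}, {}, {"K": 1.0, "R": 0.9}]
aligned = align(["AALGKAA", "LAKGGGG", "GGGIARG"], toy_weights)

for offset, core, score in aligned:
    print(offset, core, round(score, 2))
```

In a real setting the scores would be learned from the quantitative readouts, and the choice of core window and the model parameters would be refined together; here the table is fixed so that only the window-selection step is shown.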