The 3of5 web application for complex and comprehensive pattern matching in protein sequences.

Authors

Seiler Markus, Mehrle Alexander, Poustka Annemarie, Wiemann Stefan

Affiliation

Division of Molecular Genome Analysis, German Cancer Research Center, Im Neuenheimer Feld 580, 69120 Heidelberg, Germany.

Publication information

BMC Bioinformatics. 2006 Mar 16;7:144. doi: 10.1186/1471-2105-7-144.

Abstract

BACKGROUND

The identification of patterns in biological sequences is a key challenge in genome analysis and proteomics. Such patterns are often complex and highly variable, especially in protein sequences. They are frequently described with regular expressions (RegEx) because the terminology is user-friendly. As patterns become more complex, regular-expression queries run into limitations, and enhanced capabilities are required. This is especially true for patterns containing ambiguous characters, ambiguous positions, and/or length ambiguities.

RESULTS

We have implemented the 3of5 web application to enable complex pattern matching in protein sequences. 3of5 is named after a special use of its main feature, the novel n-of-m pattern type. This feature allows for an extensive specification of variable patterns in which the individual elements may vary in their position, order, and content within a defined stretch of sequence. The number of distinct elements can be constrained by operators, and individual characters may be excluded. The n-of-m pattern type can be combined with common regular expression terms and thus also allows for a comprehensive description of complex patterns. 3of5 increases the fidelity of pattern matching and, for length-ambiguous patterns, finds ALL possible solutions in protein sequences instead of simply reporting the longest or shortest hits. Grouping and combined search for patterns provide a hierarchical arrangement of larger pattern sets. The algorithm is implemented as an internet application and is freely accessible. The application is available at http://dkfz.de/mga2/3of5/3of5.html.
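
The n-of-m idea can be illustrated with a minimal Python sketch. This is not the 3of5 implementation or its query syntax, which the abstract does not give. The function name `n_of_m_hits`, its parameters, and the "at least n" reading of the constraint are assumptions made for illustration only. The sketch slides a window of m residues along the sequence and reports every window containing at least n residues from an allowed set.

```python
from typing import Iterator, Tuple


def n_of_m_hits(seq: str, allowed: str, n: int, m: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, window) for each window of length m that contains
    at least n residues drawn from `allowed`.

    Hypothetical helper: the "at least n" semantics is an assumption,
    not the documented behavior of the 3of5 application.
    """
    allowed_set = set(allowed)
    for start in range(len(seq) - m + 1):
        window = seq[start:start + m]
        # Count residues in this window that belong to the allowed set.
        count = sum(1 for residue in window if residue in allowed_set)
        if count >= n:
            yield start, window


# Example: windows of 5 residues with at least 3 basic residues (K or R).
hits = list(n_of_m_hits("MAKRKLSKRAA", "KR", 3, 5))
for start, window in hits:
    print(start, window)
```

Because only the count is checked, the matching residues may occupy any positions and appear in any order within the window, which is the kind of variability the abstract describes.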

CONCLUSION

The 3of5 application offers an extended vocabulary for the definition of search patterns and thus allows the user to comprehensively specify and identify peptide patterns with variable elements. Combined with the ability to find all solutions, the n-of-m pattern type offers improved accuracy for pattern matching without compromising the user-friendliness of regular expression terms.
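
To show what "finding all solutions" means for a length-ambiguous pattern, here is a brute-force Python sketch built on the standard `re` module. It is an illustration under stated assumptions, not the algorithm used by 3of5. The helper name `all_matches` and the example pattern and sequence are hypothetical. A standard regex search reports one match per start position (the longest for a greedy quantifier), whereas testing every substring recovers every span the pattern can cover.

```python
import re
from typing import List, Tuple


def all_matches(pattern: str, seq: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, text) for every substring of `seq` that the
    pattern matches in full. Brute force over all spans, so it needs a
    quadratic number of regex calls; meant only as an illustration.
    """
    compiled = re.compile(pattern)
    hits = []
    for start in range(len(seq)):
        for end in range(start + 1, len(seq) + 1):
            # fullmatch with pos/endpos tests exactly seq[start:end].
            if compiled.fullmatch(seq, start, end):
                hits.append((start, end, seq[start:end]))
    return hits


pattern = r"K.{1,3}R"   # K, then 1 to 3 arbitrary residues, then R
sequence = "MKARARG"

greedy_only = re.findall(pattern, sequence)   # standard search: one hit
every_hit = all_matches(pattern, sequence)    # both the short and the long hit
print(greedy_only)
print(every_hit)
```

Here the standard search returns only the longest hit starting at the K, while the exhaustive scan also reports the shorter one ending at the first R.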
