Identification of functional elements in unaligned nucleic acid sequences by a novel tuple search algorithm.

Authors

Wolfertstetter F, Frech K, Herrmann G, Werner T

Affiliation

Institut für Säugetiergenetik, GSF-Forschungszentrum für Umwelt und Gesundheit GmbH, Oberschleißheim, Germany.

Publication

Comput Appl Biosci. 1996 Feb;12(1):71-80. doi: 10.1093/bioinformatics/12.1.71.

Abstract

We present an algorithm to identify potential functional elements, such as protein binding sites, in DNA sequences solely from nucleotide sequence data. Prerequisites are a set of at least seven not closely related sequences with a common biological function that is correlated with one or more unknown sequence elements present in most, but not necessarily all, of the sequences. The algorithm is based on a search for n-tuples that occur in at least a minimum percentage of the sequences with no mismatch or one mismatch, which may be at any position of the tuple. In contrast to functional tuples, random tuples show no preferred pattern of mismatch locations within the tuple, nor does their conservation extend beyond the tuple. Both features of functional tuples are used to eliminate random tuples. Selection is carried out by maximization of the information content, first for the n-tuple, then for a region containing the tuple, and finally for the complete binding site. Further matches are found in an additional selection step using the previously described ConsInd method. The algorithm is capable of identifying and delimiting elements (e.g. protein binding sites) represented by single short cores (e.g. the TATA box) in sets of unaligned sequences of about 500 nucleotides, using no information other than the nucleotide sequences. Furthermore, we show its ability to identify multiple elements in a set of complete LTR sequences (more than 600 nucleotides per sequence).
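The first step described in the abstract (collecting n-tuples that occur, with at most one mismatch, in a minimum percentage of the sequences) and the information-content score can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the default tuple length of 6, the 80% threshold, and the use of Shannon information against a uniform base background are all assumptions made for illustration, and the later steps (mismatch-pattern filtering, region extension, ConsInd) are omitted.

```python
from math import log2


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def tuples_of(seq, n):
    """All distinct n-tuples (substrings of length n) in one sequence."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def find_candidate_tuples(seqs, n=6, min_fraction=0.8):
    """Map each n-tuple to the number of sequences that contain it with
    at most one mismatch, keeping tuples found in >= min_fraction of them.
    (n and min_fraction are illustrative defaults, not values from the paper.)"""
    per_seq = [tuples_of(s, n) for s in seqs]
    candidates = set().union(*per_seq)
    hits = {}
    for cand in candidates:
        count = sum(
            any(hamming(cand, t) <= 1 for t in ts) for ts in per_seq
        )
        if count >= min_fraction * len(seqs):
            hits[cand] = count
    return hits


def information_content(aligned):
    """Information content in bits of equal-length aligned DNA tuples:
    per column, 2 + sum(p * log2(p)), assuming a uniform background."""
    total = 0.0
    for column in zip(*aligned):
        total += 2.0
        for base in "ACGT":
            p = column.count(base) / len(column)
            if p > 0:
                total += p * log2(p)
    return total


# Toy data: three sequences carry TATAAA, the fourth carries TATTAA.
seqs = ["GGCTATAAAGC", "CCTATAAATTG", "ATATATAAACG", "GTTATTAAAGG"]
hits = find_candidate_tuples(seqs, n=6, min_fraction=1.0)
score = information_content(["TATAAA", "TATAAA", "TATAAA", "TATTAA"])
```

In this toy set, TATAAA is retained because every sequence contains it exactly or with a single mismatch. A fully conserved column contributes 2 bits to the score, and a column with mismatches contributes less, which is the sense in which maximizing information content favors conserved tuples.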
