PIDA: A new algorithm for pattern identification.

Author information

Putonti C, Pettitt BM, Reid JG, Fofanov Y

Affiliation

Department of Computer Science, University of Houston, Houston, Texas, USA.

Publication information

Online J Bioinform. 2007 Jan 1;8(1):30-40.

PMID: 19834570
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2761635/
Abstract

Algorithms for motif identification in sequence space have predominantly been focused on recognizing patterns of a fixed length containing regions of perfect conservation with possible regions of unconstrained sequence. Such motifs can be found in everything from proteins with distinct active sites to non-coding RNAs with specific structural elements that are necessary to maintain functionality. In the event that an insertion/deletion has occurred within an unconstrained portion of the pattern, it is possible that the pattern retains its functionality. In such a case the length of the pattern is now variable and may be overlooked when utilizing existing motif detection methods. The Pattern Island Detection Algorithm (PIDA) presented here has been developed to recognize patterns that have occurrences of varying length within sequences of any size alphabet. PIDA works by identifying all regions of perfect conservation (for lengths longer than a user-specified threshold), and then builds those conservation "islands" into fixed-length patterns. Next the algorithm modifies these fixed-length patterns by identifying additional (and different) islands that can be incorporated into each pattern through insertions/deletions within the "water" separating the islands. To provide some benchmarks for this analysis, PIDA was used to search for patterns within randomly generated sequences as well as sequences known to contain conserved patterns. For each of the patterns found, the statistical significance is calculated based upon the pattern's likelihood to appear by chance, thus providing a means to determine those patterns which are likely to have a functional role. The PIDA approach to motif finding is designed to perform best when searching for patterns of variable length although it is also able to identify patterns of a fixed length. PIDA has been created to be as generally applicable as possible since there are a variety of sequence problems of this type.
The algorithm was implemented in C++ and is freely available upon request from the authors.
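The abstract describes PIDA only in outline, and the authors' C++ implementation is available only on request, so what follows is not PIDA. It is a minimal Python sketch, under stated assumptions, of the three ideas the abstract names: finding perfectly conserved islands longer than a threshold, joining two islands through variable-length water, and estimating how often such a pattern could arise by chance. The helper names `find_islands`, `variable_pattern`, and `expected_chance_hits` are hypothetical. Islands are assumed to be maximal substrings shared exactly by every input sequence, the water is modeled as a regex gap of `min_gap` to `max_gap` arbitrary residues, and the chance model assumes uniform, independent residues, which the abstract does not state.

```python
import re
from typing import List, Set


def find_islands(seqs: List[str], min_len: int) -> List[str]:
    """Maximal substrings of length >= min_len shared exactly by every sequence.

    Hypothetical, brute-force stand-in for an island search; works on any alphabet.
    """
    ref = min(seqs, key=len)  # a shared island must fit inside the shortest sequence
    found: Set[str] = set()
    for i in range(len(ref)):
        j = i + min_len
        # Grow the window while ref[i:j] still occurs in every sequence.
        while j <= len(ref) and all(ref[i:j] in s for s in seqs):
            j += 1
        if j - 1 - i >= min_len:  # ref[i:j-1] was the last window that passed
            found.add(ref[i:j - 1])
    # Keep only maximal islands, i.e. those not contained in a longer island.
    return sorted(w for w in found if not any(w != v and w in v for v in found))


def variable_pattern(left: str, right: str, min_gap: int, max_gap: int) -> "re.Pattern[str]":
    """Two islands separated by unconstrained 'water' of min_gap..max_gap residues."""
    return re.compile(re.escape(left) + f".{{{min_gap},{max_gap}}}" + re.escape(right))


def expected_chance_hits(left_len: int, right_len: int, min_gap: int, max_gap: int,
                         seq_len: int, alphabet_size: int) -> float:
    """Expected number of (start, gap) placements that match by chance.

    Assumes every residue is drawn uniformly and independently from the alphabet.
    Water positions are unconstrained, so only the island positions must match.
    """
    p_islands = float(alphabet_size) ** -(left_len + right_len)
    total = 0.0
    for gap in range(min_gap, max_gap + 1):
        starts = max(0, seq_len - (left_len + gap + right_len) + 1)
        total += starts * p_islands
    return total


if __name__ == "__main__":
    seqs = ["ACGTTGCAAGGT", "TTACGTAGCAAGG", "ACGTCCGCAAGGA"]
    print(find_islands(seqs, 3))  # ['ACGT', 'GCAAGG']
    fixed = variable_pattern("ACGT", "GCAAGG", 1, 1)
    flexible = variable_pattern("ACGT", "GCAAGG", 1, 2)
    print([bool(fixed.search(s)) for s in seqs])     # [True, True, False]
    print([bool(flexible.search(s)) for s in seqs])  # [True, True, True]
    print(expected_chance_hits(4, 6, 1, 2, len(seqs[0]), 4))
```

With a fixed one-residue gap, the third toy sequence is missed because its water is two residues long; allowing one to two residues recovers it, which is the kind of variable-length occurrence the abstract says fixed-length methods may overlook. The expected-count function ignores overlaps between placements and is only a rough null model; PIDA's own significance calculation may differ.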


Similar articles

1. Motif discoveries in unaligned molecular sequences using self-organizing neural networks.
   IEEE Trans Neural Netw. 2006 Jul;17(4):919-928. doi: 10.1109/TNN.2006.875987.
2. rMotifGen: random motif generator for DNA and protein sequences.
   BMC Bioinformatics. 2007 Aug 7;8:292. doi: 10.1186/1471-2105-8-292.
3. Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
   Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
4. Bipartite pattern discovery by entropy minimization-based multiple local alignment.
   Nucleic Acids Res. 2004 Sep 23;32(17):4979-91. doi: 10.1093/nar/gkh825. Print 2004.
5. Detection of generic spaced motifs using submotif pattern mining.
   Bioinformatics. 2007 Jun 15;23(12):1476-85. doi: 10.1093/bioinformatics/btm118. Epub 2007 May 5.
6. Unsupervised statistical discovery of spaced motifs in prokaryotic genomes.
   BMC Genomics. 2017 Jan 5;18(1):27. doi: 10.1186/s12864-016-3400-0.
7. Using SCOPE to identify potential regulatory motifs in coregulated genes.
   J Vis Exp. 2011 May 31(51):2703. doi: 10.3791/2703.
8. Active motif finder - a bio-tool based on mutational structures in DNA sequences.
   J Biomed Res. 2011 Nov;25(6):444-8. doi: 10.1016/S1674-8301(11)60059-6.
9. The 3of5 web application for complex and comprehensive pattern matching in protein sequences.
   BMC Bioinformatics. 2006 Mar 16;7:144. doi: 10.1186/1471-2105-7-144.
