MatInd and MatInspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data.

Authors

Quandt K, Frech K, Karas H, Wingender E, Werner T

Affiliation

Institut für Säugetiergenetik, GSF-Forschungszentrum für Umwelt und Gesundheit GmbH, Neuherberg, Germany.

Publication

Nucleic Acids Res. 1995 Dec 11;23(23):4878-84. doi: 10.1093/nar/23.23.4878.

Abstract

The identification of potential regulatory motifs in new sequence data is increasingly important for experimental design. Those motifs are commonly located by matches to IUPAC strings derived from consensus sequences. Although this method is simple and widely used, a major drawback of IUPAC strings is that they necessarily remove much of the information originally present in the set of sequences. Nucleotide distribution matrices retain most of the information and are thus better suited to evaluate new potential sites. However, sufficiently large libraries of pre-compiled matrices are a prerequisite for practical application of any matrix-based approach and are just beginning to emerge. Here we present a set of tools for molecular biologists that allows generation of new matrices and detection of potential sequence matches by automatic searches with a library of pre-compiled matrices. We also supply a large library (> 200) of transcription factor binding site matrices that has been compiled on the basis of published matrices as well as entries from the TRANSFAC database, with emphasis on sequences with experimentally verified binding capacity. Our search method includes position weighting of the matrices based on the information content of individual positions and calculates a relative matrix similarity. We show several examples suggesting that this matrix similarity is useful in estimating the functional potential of matrix matches and thus provides a valuable basis for designing appropriate experiments.
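The abstract names the scoring idea (position weights derived from information content, plus a relative matrix similarity) without giving formulas. The sketch below is one plausible reading, not the authors' implementation. It assumes the weight of a position is its Shannon information content in bits, and that similarity is the weighted score of a window divided by the best weighted score the matrix allows. The function names, the training sites, and the threshold are all hypothetical.

```python
import math

BASES = "ACGT"


def frequency_matrix(sites):
    """Per-position base frequencies from equal-length aligned binding sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for i in range(length):
        column = [site[i] for site in sites]
        matrix.append({base: column.count(base) / len(sites) for base in BASES})
    return matrix


def position_weights(matrix):
    """Shannon information content per position, in bits (assumed weighting).

    0 bits for a uniform column, 2 bits for a fully conserved one.
    """
    weights = []
    for freqs in matrix:
        entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
        weights.append(2.0 - entropy)
    return weights


def matrix_similarity(window, matrix, weights):
    """Weighted score of `window`, relative to the best score the matrix allows."""
    if len(window) != len(matrix):
        raise ValueError("window length must equal matrix length")
    score = sum(w * freqs.get(base, 0.0)
                for base, freqs, w in zip(window, matrix, weights))
    best = sum(w * max(freqs.values()) for freqs, w in zip(matrix, weights))
    return score / best if best > 0 else 0.0


def scan(sequence, matrix, threshold=0.85):
    """Return (offset, similarity) for every window at or above `threshold`."""
    weights = position_weights(matrix)
    width = len(matrix)
    hits = []
    for offset in range(len(sequence) - width + 1):
        window = sequence[offset:offset + width]
        similarity = matrix_similarity(window, matrix, weights)
        if similarity >= threshold:
            hits.append((offset, similarity))
    return hits


# Hypothetical TATA-like training sites, for illustration only.
example_sites = ["TATAAA", "TATAAA", "TATATA", "TATAAG"]
example_matrix = frequency_matrix(example_sites)
example_hits = scan("CCTATAAACC", example_matrix, threshold=0.9)
```

Because the score is normalized by the matrix's own maximum, a window that carries the most frequent base at every position scores 1.0, and mismatches at highly conserved positions cost more than mismatches at variable ones. An IUPAC string built from the same sites would accept or reject a window outright and could not express that difference.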
