MatInd and MatInspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data.

Author information

Quandt K, Frech K, Karas H, Wingender E, Werner T

Affiliation

Institut für Säugetiergenetik, GSF-Forschungszentrum für Umwelt und Gesundheit GmbH, Neuherberg, Germany.

Publication information

Nucleic Acids Res. 1995 Dec 11;23(23):4878-84. doi: 10.1093/nar/23.23.4878.

PMID:8532532

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC307478/

Abstract

The identification of potential regulatory motifs in new sequence data is increasingly important for experimental design. Those motifs are commonly located by matches to IUPAC strings derived from consensus sequences. Although this method is simple and widely used, a major drawback of IUPAC strings is that they necessarily remove much of the information originally present in the set of sequences. Nucleotide distribution matrices retain most of the information and are thus better suited to evaluate new potential sites. However, sufficiently large libraries of pre-compiled matrices are a prerequisite for practical application of any matrix-based approach and are just beginning to emerge. Here we present a set of tools for molecular biologists that allows generation of new matrices and detection of potential sequence matches by automatic searches with a library of pre-compiled matrices. We also supply a large library (> 200) of transcription factor binding site matrices that has been compiled on the basis of published matrices as well as entries from the TRANSFAC database, with emphasis on sequences with experimentally verified binding capacity. Our search method includes position weighting of the matrices based on the information content of individual positions and calculates a relative matrix similarity. We show several examples suggesting that this matrix similarity is useful in estimating the functional potential of matrix matches and thus provides a valuable basis for designing appropriate experiments.
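The abstract names two ingredients of the search: position weights derived from the information content of each matrix column, and a relative matrix similarity. The Python sketch below shows one way these can fit together. It is an illustration under stated assumptions, not the MatInspector implementation. It uses a four-letter alphabet without gaps and scales each column weight to [0, 1] as 1 + Σ f ln f / ln 4. It defines similarity as (score - min) / (max - min) over information-weighted frequencies and scans only the forward strand. The function names, the toy sites, and the 0.85 threshold are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"  # assumption: no gap symbol, unlike some published variants

Matrix = List[Dict[str, float]]


def frequency_matrix(sites: List[str]) -> Matrix:
    """Relative nucleotide frequencies per position from equal-length aligned sites."""
    if not sites:
        raise ValueError("need at least one site")
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    matrix: Matrix = []
    for i in range(length):
        counts = {b: 0 for b in BASES}
        for site in sites:
            counts[site[i].upper()] += 1  # assumes sites contain only A/C/G/T
        matrix.append({b: counts[b] / len(sites) for b in BASES})
    return matrix


def information_weights(matrix: Matrix) -> List[float]:
    """Hypothetical weight per column: 0 for a uniform column, 1 for a fully conserved one."""
    weights = []
    for column in matrix:
        entropy_term = sum(f * math.log(f) for f in column.values() if f > 0)
        weights.append(1.0 + entropy_term / math.log(len(BASES)))
    return weights


def matrix_similarity(matrix: Matrix, window: str) -> float:
    """Relative similarity (score - min) / (max - min) of one window to the matrix."""
    if len(window) != len(matrix):
        raise ValueError("window length must equal matrix length")
    weights = information_weights(matrix)
    # Bases outside A/C/G/T (e.g. N) contribute a frequency of 0.
    score = sum(w * col.get(b, 0.0) for w, col, b in zip(weights, matrix, window.upper()))
    best = sum(w * max(col.values()) for w, col in zip(weights, matrix))
    worst = sum(w * min(col.values()) for w, col in zip(weights, matrix))
    if best == worst:
        raise ValueError("matrix has no informative positions")
    return (score - worst) / (best - worst)


def scan(matrix: Matrix, sequence: str, threshold: float = 0.85) -> List[Tuple[int, float]]:
    """Return (start, similarity) for every forward-strand window at or above threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        sim = matrix_similarity(matrix, sequence[start:start + width])
        if sim >= threshold:
            hits.append((start, sim))
    return hits


if __name__ == "__main__":
    toy_sites = ["TGACTCA", "TGACTCA", "TGAGTCA", "TGACTAA"]  # made-up alignment
    m = frequency_matrix(toy_sites)
    print(scan(m, "GGTGACTCAGG"))  # [(2, 1.0)]
```

Because conserved columns receive weight 1 and variable columns a smaller weight, a mismatch at a conserved position lowers the similarity more than a mismatch at a variable one. That is the practical advantage over an IUPAC string, which treats every allowed base at a position as equally good.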

Similar articles

MatInspector and beyond: promoter analysis based on transcription factor binding sites.

Bioinformatics. 2005 Jul 1;21(13):2933-42. doi: 10.1093/bioinformatics/bti473. Epub 2005 Apr 28.

Computer-assisted prediction, classification, and delimitation of protein binding sites in nucleic acids.

Nucleic Acids Res. 1993 Apr 11;21(7):1655-64. doi: 10.1093/nar/21.7.1655.

Improvement of TRANSFAC matrices using multiple local alignment of transcription factor binding site sequences.

Genome Inform. 2005;16(1):68-72.

EMQIT: a machine learning approach for energy based PWM matrix quality improvement.

Biol Direct. 2017 Aug 1;12(1):17. doi: 10.1186/s13062-017-0189-y.

A dot-matrix program with dynamic threshold control suited for genomic DNA and protein sequence analysis.

Gene. 1995 Dec 29;167(1-2):GC1-10. doi: 10.1016/0378-1119(95)00714-8.

Block searches on VAX and Alpha computer systems.

Comput Appl Biosci. 1993 Oct;9(5):587-91. doi: 10.1093/bioinformatics/9.5.587.

Computer tool FUNSITE for analysis of eukaryotic regulatory genomic sequences.

Proc Int Conf Intell Syst Mol Biol. 1995;3:197-205.

FASTA-SWAP and FASTA-PAT: pattern database searches using combinations of aligned amino acids, and a novel scoring theory.

J Mol Biol. 1996 Jun 21;259(4):840-54. doi: 10.1006/jmbi.1996.0362.

Binding matrix: a novel approach for binding site recognition.

J Bioinform Comput Biol. 2004 Jun;2(2):289-307. doi: 10.1142/s0219720004000569.

Cited by

AMHY and sex determination in egg-laying mammals.

Genome Biol. 2025 May 27;26(1):144. doi: 10.1186/s13059-025-03546-1.

Chromatin and transcription in Nucleic Acids Research: the first 50 years.

Nucleic Acids Res. 2024 Dec 11;52(22):13485-13489. doi: 10.1093/nar/gkae1151.

The Abl1 tyrosine kinase is a key player in doxorubicin-induced cardiomyopathy and its p53/p73 cell death mediated signaling differs in atrial and ventricular cardiomyocytes.

J Transl Med. 2024 Sep 16;22(1):845. doi: 10.1186/s12967-024-05623-8.

Identification of putative promoter elements for epsilon glutathione s-transferases genes associated with resistance to DDT in the malaria vector mosquito anopheles arabiensis.

Sci Afr. 2024 Mar;23. doi: 10.1016/j.sciaf.2023.e02047.

The cell functions of phospholipase C-1, Ca/H exchanger-1, and secretory phospholipase A in tolerance to stress conditions and cellulose degradation in Neurospora crassa.

Arch Microbiol. 2023 Sep 7;205(10):327. doi: 10.1007/s00203-023-03662-1.

Altered vitamin B12 metabolism in the central nervous system is associated with the modification of ribosomal gene expression: new insights from comparative RNA dataset analysis.

Funct Integr Genomics. 2023 Jan 23;23(1):45. doi: 10.1007/s10142-023-00969-6.

ADAM10 mediates shedding of carbonic anhydrase IX ectodomain non-redundantly to ADAM17.

Oncol Rep. 2023 Feb;49(2). doi: 10.3892/or.2022.8464. Epub 2022 Dec 16.

Mitochondrial stress induces AREG expression and epigenomic remodeling through c-JUN and YAP-mediated enhancer activation.

Nucleic Acids Res. 2022 Sep 23;50(17):9765-9779. doi: 10.1093/nar/gkac735.

BLSSpeller to discover novel regulatory motifs in maize.

DNA Res. 2022 Jun 25;29(4). doi: 10.1093/dnares/dsac029.

The Associations of Selenoprotein Genetic Variants with the Risks of Colorectal Adenoma and Colorectal Cancer: Case-Control Studies in Irish and Czech Populations.

Nutrients. 2022 Jun 29;14(13):2718. doi: 10.3390/nu14132718.
