Heuristic informational analysis of sequences.

Authors

Claverie J M, Bougueleret L

Publication information

Nucleic Acids Res. 1986 Jan 10;14(1):179-96. doi: 10.1093/nar/14.1.179.

PMID:3753763

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC339368/

Abstract

Nucleotide or amino-acid sequences are interpreted as successions of words of length k (k-tuples), the frequencies of which are highly variable in different statistical populations of genes or proteins. After building k-tuple reference tables from coherent subsets or entire data banks, the local information content profile of individual sequences is drawn. Anomalous regions (peaks or depressions) of such a profile can lead to the discovery and identification of specific sequence patterns. Along the same principle, the simultaneous use of two reference statistical populations and the computation of an index combining the two information profiles lead to a general and powerful discriminant analysis method. The identification of a "signal" associated with gene conversion, intron/exon discrimination, and the location of function-specific patterns in proteins are given as examples of successful applications of this heuristic informational approach.
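The workflow described in the abstract (reference k-tuple table, local information profile, two-population index) can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: the information content of a k-tuple is taken as its Shannon surprisal, -log2(frequency); the profile is a sliding-window mean; the index is the difference between the two profiles; and unseen k-tuples receive a small floor frequency. The function names `ktuple_table`, `information_profile`, and `discriminant_index` are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def ktuple_table(sequences, k):
    """Relative frequencies of overlapping k-tuples in a reference set."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def information_profile(seq, table, k, window, floor=1e-6):
    """Sliding-window mean of -log2(frequency) along one sequence.

    Assumption: k-tuples absent from the table get the frequency `floor`
    so that the logarithm stays finite.
    """
    scores = [
        -math.log2(table.get(seq[i:i + k], floor))
        for i in range(len(seq) - k + 1)
    ]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def discriminant_index(seq, table_a, table_b, k, window, floor=1e-6):
    """Difference of the two profiles (assumed form of the combined index).

    Positive values mark windows that are less surprising under
    population A than under population B; negative values the reverse.
    """
    prof_a = information_profile(seq, table_a, k, window, floor)
    prof_b = information_profile(seq, table_b, k, window, floor)
    return [b - a for a, b in zip(prof_a, prof_b)]


if __name__ == "__main__":
    # Toy reference populations, invented for this example only.
    table_a = ktuple_table(["ACACACACAC"], k=2)
    table_b = ktuple_table(["GTGTGTGTGT"], k=2)
    print(discriminant_index("ACACGTGT", table_a, table_b, k=2, window=3))
```

In this toy run the index changes sign along the query, from windows that resemble population A to windows that resemble population B. With real data the two tables would be built from, for example, known exon and intron sets, and peaks or depressions in the single-population profile would be inspected as candidate sequence patterns.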

Similar articles

Heuristic informational analysis of sequences.

Nucleic Acids Res. 1986 Jan 10;14(1):179-96. doi: 10.1093/nar/14.1.179.

Rapid searches for complex patterns in biological molecules.

Nucleic Acids Res. 1984 Jan 11;12(1 Pt 1):263-80. doi: 10.1093/nar/12.1part1.263.

Objective comparison of exon and intron sequences by means of 2-dimensional data analysis methods.

Nucleic Acids Res. 1988 Mar 11;16(5):1729-38. doi: 10.1093/nar/16.5.1729.

A statistical analysis of nucleotide sequences of introns and exons in human genes.

Mol Biol Evol. 1987 Jul;4(4):395-405. doi: 10.1093/oxfordjournals.molbev.a040453.

Algorithms for the search of amino acid patterns in nucleic acid sequences.

Nucleic Acids Res. 1986 Jan 10;14(1):99-107. doi: 10.1093/nar/14.1.99.

Personal access to sequence databases on personal computers.

Nucleic Acids Res. 1986 Jan 10;14(1):611-9. doi: 10.1093/nar/14.1.611.

[Neg-entropy and local information content: a new approach to the analysis of nucleotide and amino-acid sequences].

Tanpakushitsu Kakusan Koso. 1986 Jun(29 Suppl):111-22.

MULTAN: a program to align multiple DNA sequences.

Nucleic Acids Res. 1986 Jan 10;14(1):159-77. doi: 10.1093/nar/14.1.159.

Utilization of sequence libraries on a 16-bit mini computer with particular reference to high speed searching.

Nucleic Acids Res. 1984 Jan 11;12(1 Pt 1):409-16. doi: 10.1093/nar/12.1part1.409.

Proc Natl Acad Sci U S A. 1983 Feb;80(3):726-30. doi: 10.1073/pnas.80.3.726.

Cited by

A Puzzling Anomaly in the 4-Mer Composition of the Giant Pandoravirus Genomes Reveals a Stringent New Evolutionary Selection Process.

J Virol. 2019 Nov 13;93(23). doi: 10.1128/JVI.01206-19. Print 2019 Dec 1.

A frequency-based linguistic approach to protein decoding and design: Simple concepts, diverse applications, and the SCS Package.

Comput Struct Biotechnol J. 2013 Mar 29;5:e201302010. doi: 10.5936/csbj.201302010. eCollection 2013.

Word decoding of protein amino acid sequences with availability analysis: a linguistic approach.

PLoS One. 2012;7(11):e50039. doi: 10.1371/journal.pone.0050039. Epub 2012 Nov 21.

Visualization of the protein-coding regions with a self adaptive spectral rotation approach.

Nucleic Acids Res. 2011 Jan;39(1):e3. doi: 10.1093/nar/gkq891. Epub 2010 Oct 14.

Protein coding sequence identification by simultaneously characterizing the periodic and random features of DNA sequences.

J Biomed Biotechnol. 2005 Jun 30;2005(2):139-46. doi: 10.1155/JBB.2005.139.

Gene prediction by spectral rotation measure: a new method for identifying protein-coding regions.

Genome Res. 2003 Aug;13(8):1930-7. doi: 10.1101/gr.1261703. Epub 2003 Jul 17.

A computational analysis of sequence features involved in recognition of short introns.

Proc Natl Acad Sci U S A. 2001 Sep 25;98(20):11193-8. doi: 10.1073/pnas.201407298.

Computational methods for exon detection.

Mol Biotechnol. 1998 Aug;10(1):27-48. doi: 10.1007/BF02745861.

Self-identification of protein-coding regions in microbial genomes.

Proc Natl Acad Sci U S A. 1998 Aug 18;95(17):10026-31. doi: 10.1073/pnas.95.17.10026.

Identification of coding regions in genomic DNA sequences: an application of dynamic programming and neural networks.

Nucleic Acids Res. 1993 Feb 11;21(3):607-13. doi: 10.1093/nar/21.3.607.

References

The codon preference plot: graphic analysis of protein coding sequences and prediction of gene expression.

Nucleic Acids Res. 1984 Jan 11;12(1 Pt 2):539-49. doi: 10.1093/nar/12.1part2.539.

New approaches for computer analysis of nucleic acid sequences.

Proc Natl Acad Sci U S A. 1983 Sep;80(18):5660-4. doi: 10.1073/pnas.80.18.5660.

Proc Natl Acad Sci U S A. 1983 Feb;80(3):726-30. doi: 10.1073/pnas.80.3.726.

BIOLOG - a DNA sequence analysis system in PROLOG.

Nucleic Acids Res. 1984 Jan 11;12(1 Pt 2):633-42. doi: 10.1093/nar/12.1part2.633.

A common philosophy and FORTRAN 77 software package for implementing and searching sequence databases.

Nucleic Acids Res. 1984 Jan 11;12(1 Pt 1):397-407. doi: 10.1093/nar/12.1part1.397.

Signal search analysis: a new method to localize and characterize functionally important DNA sequences.

Nucleic Acids Res. 1984 Jan 11;12(1 Pt 1):287-305. doi: 10.1093/nar/12.1part1.287.

Fast computer search for similar DNA sequences.

Nucleic Acids Res. 1984 Jul 11;12(13):5471-4. doi: 10.1093/nar/12.13.5471.

Genetic exchanges between partially homologous nucleotide sequences: possible implications for multigene families.

Biochimie. 1983 Feb;65(2):85-93. doi: 10.1016/s0300-9084(83)80178-3.

Graphic methods to determine the function of nucleic acid sequences.

Nucleic Acids Res. 1984 Jan 11;12(1 Pt 2):521-38. doi: 10.1093/nar/12.1part2.521.

Protein and Nucleic Acid Sequence Database Systems.

Annu Rev Biophys Bioeng. 1983;12:419-41. doi: 10.1146/annurev.bb.12.060183.002223.
