On the statistical significance of nucleic acid similarities.

Authors

Lipman D J, Wilbur W J, Smith T F, Waterman M S

Publication information

Nucleic Acids Res. 1984 Jan 11;12(1 Pt 1):215-26. doi: 10.1093/nar/12.1part1.215.

PMID:6694902

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC320998/

Abstract

When evaluating sequence similarities among nucleic acids by the usual methods, statistical significance is often found when the biological significance of the similarity is dubious. We demonstrate that the known statistical properties of nucleic acid sequences strongly affect the statistical distribution of similarity values when calculated by standard procedures. We propose a series of models which account for some of these known statistical properties. The utility of the method is demonstrated in evaluating high relative similarity scores in four specific cases in which there is little biological context by which to judge the similarities. In two of the cases we identify the statistical properties which are responsible for the apparent similarity. In the other two cases the statistical significance of the similarity persists even when the known statistical properties of sequences are modelled. For one of these cases biological significance is likely while the other case remains an enigma.
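
The abstract does not spell out the models themselves, so the following is only an illustrative sketch of the general idea, not the authors' procedure. It assumes a simple local alignment score with arbitrary match, mismatch, and gap values, and compares two hypothetical null models: a random permutation that preserves base composition, and a first-order Markov chain that also approximates nearest-neighbour (dinucleotide) frequencies. All function names and parameters are assumptions made for this example.

```python
import random


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best local alignment score with a linear gap penalty (scores are illustrative)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def shuffled(seq, rng):
    """Null model 1: random permutation, preserving exact base composition."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


def markov_sample(seq, rng):
    """Null model 2: same-length sample from a first-order Markov chain fitted to seq."""
    if not seq:
        return ""
    transitions = {}
    for x, y in zip(seq, seq[1:]):
        transitions.setdefault(x, []).append(y)
    out = [rng.choice(seq)]
    while len(out) < len(seq):
        successors = transitions.get(out[-1])
        # Fall back to overall composition when a base has no observed successor.
        out.append(rng.choice(successors) if successors else rng.choice(seq))
    return "".join(out)


def empirical_p(a, b, null_model, trials=200, seed=0):
    """Observed score and the fraction of null pairs scoring at least as high."""
    rng = random.Random(seed)
    observed = local_score(a, b)
    hits = sum(
        local_score(null_model(a, rng), null_model(b, rng)) >= observed
        for _ in range(trials)
    )
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (trials + 1)
```

Calling `empirical_p(a, b, shuffled)` and `empirical_p(a, b, markov_sample)` on the same pair gives two estimates. If a similarity looks significant under the permutation model but not under the Markov model, that would suggest that nearest-neighbour structure, rather than shared ancestry or function, accounts for the score.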

Similar articles

The statistical distribution of nucleic acid similarities.

Nucleic Acids Res. 1985 Jan 25;13(2):645-56. doi: 10.1093/nar/13.2.645.

Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes.

Proc Natl Acad Sci U S A. 1990 Mar;87(6):2264-8. doi: 10.1073/pnas.87.6.2264.

Use of statistical criteria for screening potential homologies in nucleic acid sequences.

Nucleic Acids Res. 1984 Jan 11;12(1 Pt 1):203-13. doi: 10.1093/nar/12.1part1.203.

Informational parameters of nucleic acid and molecular evolution.

J Theor Biol. 1988 Feb 7;130(3):351-61. doi: 10.1016/s0022-5193(88)80034-1.

On the statistical assessment of similarities in DNA sequences.

Nucleic Acids Res. 1984 Jul 11;12(13):5529-43. doi: 10.1093/nar/12.13.5529.

A novel sequence similarity searching and visualization method based on overlappingly translated nucleic acids: the blastNP.

Med Hypotheses. 2004;62(4):568-74. doi: 10.1016/j.mehy.2003.11.020.

Computation of statistical secondary structure of nucleic acids.

Nucleic Acids Res. 1984 Jan 11;12(1 Pt 1):335-46. doi: 10.1093/nar/12.1part1.335.

Statistical analysis of DNA sequences.

J Natl Cancer Inst. 1988 May 18;80(6):395-406. doi: 10.1093/jnci/80.6.395.

Comparative statistics for DNA and protein sequences: single sequence analysis.

Proc Natl Acad Sci U S A. 1985 Sep;82(17):5800-4. doi: 10.1073/pnas.82.17.5800.

Cited by

The core and unique proteins of haloarchaea.

BMC Genomics. 2012 Jan 24;13:39. doi: 10.1186/1471-2164-13-39.

Target-decoy approach and false discovery rate: when things may go wrong.

J Am Soc Mass Spectrom. 2011 Jul;22(7):1111-20. doi: 10.1007/s13361-011-0139-3. Epub 2011 May 5.

Measuring global credibility with application to local sequence alignment.

PLoS Comput Biol. 2008 May 16;4(5):e1000077. doi: 10.1371/journal.pcbi.1000077.

Statistical distributions of optimal global alignment scores of random protein sequences.

BMC Bioinformatics. 2005 Oct 15;6:257. doi: 10.1186/1471-2105-6-257.

Retrotransposons and their recognition of pol II promoters: a comprehensive survey of the transposable elements from the complete genome sequence of Schizosaccharomyces pombe.

Genome Res. 2003 Sep;13(9):1984-97. doi: 10.1101/gr.1191603.

Massive sequence comparisons as a help in annotating genomic sequences.

Genome Res. 2001 Jul;11(7):1296-303. doi: 10.1101/gr.gr-1776r.

Riboregulation in Escherichia coli: DsrA RNA acts by RNA:RNA interactions at multiple loci.

Proc Natl Acad Sci U S A. 1998 Oct 13;95(21):12456-61. doi: 10.1073/pnas.95.21.12456.

Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships.

Proc Natl Acad Sci U S A. 1998 May 26;95(11):6073-8. doi: 10.1073/pnas.95.11.6073.

The conjugal transfer system of Agrobacterium tumefaciens octopine-type Ti plasmids is closely related to the transfer system of an IncP plasmid and distantly related to Ti plasmid vir genes.

J Bacteriol. 1996 Jul;178(14):4248-57. doi: 10.1128/jb.178.14.4248-4257.1996.

Ti plasmid-encoded genes responsible for catabolism of the crown gall opine mannopine by Agrobacterium tumefaciens are homologs of the T-region genes responsible for synthesis of this opine by the plant tumor.

J Bacteriol. 1996 Jun;178(11):3275-84. doi: 10.1128/jb.178.11.3275-3284.1996.

References

Some rules in the ordering of nucleotides in the DNA.

Nucleic Acids Res. 1980 Oct 10;8(19):4545-62. doi: 10.1093/nar/8.19.4545.

Strong adenine clustering in nucleotide sequences.

J Theor Biol. 1980 Jul 21;85(2):285-91. doi: 10.1016/0022-5193(80)90021-1.

Identification of common molecular subsequences.

J Mol Biol. 1981 Mar 25;147(1):195-7. doi: 10.1016/0022-2836(81)90087-5.

Recognition of protein coding regions in DNA sequences.

Nucleic Acids Res. 1982 Sep 11;10(17):5303-18. doi: 10.1093/nar/10.17.5303.

Codon catalog usage and the genome hypothesis.

Nucleic Acids Res. 1980 Jan 11;8(1):r49-r62. doi: 10.1093/nar/8.1.197-c.

Random sequences.

J Mol Biol. 1983 Jan 15;163(2):171-6. doi: 10.1016/0022-2836(83)90002-5.

Statistical characterization of nucleic acid sequence functional domains.

Nucleic Acids Res. 1983 Apr 11;11(7):2205-20. doi: 10.1093/nar/11.7.2205.

Contextual constraints on synonymous codon choice.

J Mol Biol. 1983 Jan 25;163(3):363-76. doi: 10.1016/0022-2836(83)90063-3.

Pattern recognition in nucleic acid sequences. I. A general method for finding local homologies and symmetries.

Nucleic Acids Res. 1982 Jan 11;10(1):247-63. doi: 10.1093/nar/10.1.247.

A + T-rich linkers define functional domains in eukaryotic DNA.

Nature. 1982 Jan 21;295(5846):260-2. doi: 10.1038/295260a0.
