A measure of the similarity of sets of sequences not requiring sequence alignment.

Author information

Blaisdell B E

Publication information

Proc Natl Acad Sci U S A. 1986 Jul;83(14):5155-9. doi: 10.1073/pnas.83.14.5155.

DOI: 10.1073/pnas.83.14.5155
PMID: 3460087
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC323909/

Abstract

Determination of first- and second-order Markov chain homogeneity of sets of nuclear eukaryotic DNA sequences, both coding and noncoding, finds similarities imperceptible to the standard Needleman-Wunsch base matching or dot-matrix algorithms. These measures of the similarities of the distributions of adjacent pairs or triplets are in agreement with accepted evolutionary-tree topologies. Hierarchical clustering of the distributions of doublets of 30 miscellaneous coding sequences gives clusters in reasonable agreement with accepted biological classifications. In addition to similarity by homology, there is also observed similarity of disparate genes in the same organism--for example, all three disparate yeast genes (two enzymes and actin) form a well-distinguished cluster.
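
To make the idea of comparing doublet distributions without alignment concrete, here is a minimal sketch. It is not the paper's homogeneity statistic: it assumes a plain Euclidean distance between the 16 overlapping doublet frequencies as a stand-in, and a naive average-linkage clustering. The function names and the toy sequences are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product
import math

ALPHABET = "ACGT"
DOUBLETS = ["".join(p) for p in product(ALPHABET, repeat=2)]  # 16 adjacent pairs


def doublet_frequencies(seq):
    """Relative frequencies of the 16 overlapping adjacent base pairs."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts[d] for d in DOUBLETS)  # ignores pairs with non-ACGT symbols
    if total == 0:
        raise ValueError("sequence has no valid doublets")
    return [counts[d] / total for d in DOUBLETS]


def doublet_distance(a, b):
    """Euclidean distance between doublet frequency vectors (assumed stand-in)."""
    fa, fb = doublet_frequencies(a), doublet_frequencies(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))


def cluster(seqs, n_clusters):
    """Naive average-linkage agglomerative clustering of named sequences."""
    clusters = [[name] for name in seqs]

    def linkage(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(doublet_distance(seqs[a], seqs[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (i < j) and merge j into i.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy sequences (made up): "b" is "a" shifted by one base, so a position-by-position
# comparison would see mismatches everywhere, yet the doublet distributions nearly agree.
toy = {
    "a": "ACGT" * 10,
    "b": "CGTA" * 10,
    "c": "A" * 40,
    "d": "A" * 39 + "C",
}
groups = cluster(toy, 2)
```

Because only the distribution of adjacent pairs is compared, no alignment step is needed; the triplet (second-order) case would follow the same pattern with `repeat=3` and a window of three bases.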



