Linear regression model of short k-word: a similarity distance suitable for biological sequences with various lengths.

Affiliations

School of Mathematical Sciences, Dalian University of Technology, Dalian, Liaoning 116024, PR China; School of Mathematics, Liaoning Normal University, Dalian, Liaoning 116029, PR China.

School of Mathematical Sciences, Dalian University of Technology, Dalian, Liaoning 116024, PR China.

Publication information

J Theor Biol. 2013 Nov 21;337:61-70. doi: 10.1016/j.jtbi.2013.07.028. Epub 2013 Aug 8.

DOI: 10.1016/j.jtbi.2013.07.028

PMID: 23933105

Abstract

Starting from differences in sequence length, both k-word based methods and graphical representation approaches have uncovered biological information in their own distinct ways. However, the mechanisms of information storage are unlikely to vary with sequence length. A similarity distance suitable for sequences of various lengths should therefore be much closer to those mechanisms. In this paper, new sub-sequences of k-words were extracted from biological sequences under a one-to-one mapping. The new sub-sequences were evaluated by a linear regression model, and a new distance was defined on the invariants of that model. Compared with other alignment-free distances, the results of four experiments demonstrated that our similarity distance was more efficient.
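The abstract does not give the mapping, the regression variables, or the exact form of the distance, so the following is only a minimal illustrative sketch of the general idea, not the paper's definition. It assumes, purely for illustration, that each k-word's "sub-sequence" is the list of positions where it occurs, that positions and occurrence ranks are rescaled to (0, 1] so sequences of different lengths share a common scale, that the regression "invariant" is the least-squares slope, and that the distance is Euclidean. All function names (`kword_positions`, `ols_slope`, `regression_features`, `regression_distance`) are hypothetical.

```python
from itertools import product
from math import sqrt


def kword_positions(seq, k):
    """Map each k-word over ACGT to the 1-based start positions where it occurs."""
    positions = {"".join(w): [] for w in product("ACGT", repeat=k)}
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in positions:  # skip windows with other symbols, e.g. N
            positions[word].append(i + 1)
    return positions


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (0.0 if it is undefined)."""
    n = len(xs)
    if n < 2:
        return 0.0
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        return 0.0
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return sxy / sxx


def regression_features(seq, k=2):
    """One slope per k-word (assumed feature, not the paper's invariant)."""
    n_windows = len(seq) - k + 1
    feats = []
    for _, pos in sorted(kword_positions(seq, k).items()):
        count = len(pos)
        ranks = [(j + 1) / count for j in range(count)]  # rescaled occurrence rank
        scaled = [p / n_windows for p in pos]            # rescaled position
        feats.append(ols_slope(ranks, scaled))
    return feats


def regression_distance(seq_a, seq_b, k=2):
    """Euclidean distance between the two slope vectors (assumed form)."""
    fa = regression_features(seq_a, k)
    fb = regression_features(seq_b, k)
    return sqrt(sum((a - b) ** 2 for a, b in zip(fa, fb)))
```

For example, `regression_distance("ACGTACGTAC", "ACGTTTGACA")` compares two sequences on their 16 dinucleotide slopes; the two inputs need not have the same length, because every feature vector has 4^k entries regardless of sequence length.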

Similar articles

Linear regression model of short k-word: a similarity distance suitable for biological sequences with various lengths.

J Theor Biol. 2013 Nov 21;337:61-70. doi: 10.1016/j.jtbi.2013.07.028. Epub 2013 Aug 8.

A novel statistical measure for sequence comparison on the basis of k-word counts.

J Theor Biol. 2013 Feb 7;318:91-100. doi: 10.1016/j.jtbi.2012.10.035. Epub 2012 Nov 9.

Nucleosides Nucleotides Nucleic Acids. 2010 Feb;29(2):123-31. doi: 10.1080/15257771003597766.

Alignment free comparison: similarity distribution between the DNA primary sequences based on the shortest absent word.

J Theor Biol. 2012 Feb 21;295:125-31. doi: 10.1016/j.jtbi.2011.11.021. Epub 2011 Dec 1.

Use of mitogenomic information in teleostean molecular phylogenetics: a tree-based exploration under the maximum-parsimony optimality criterion.

Mol Phylogenet Evol. 2000 Dec;17(3):437-55. doi: 10.1006/mpev.2000.0839.

A simple method to analyze the similarity of biological sequences based on the fuzzy theory.

J Theor Biol. 2010 Aug 7;265(3):323-8. doi: 10.1016/j.jtbi.2010.05.008. Epub 2010 May 18.

Complete mitochondrial genome of the bullhead torrent catfish, Liobagrus obesus (Siluriformes, Amblycipididae): Genome description and phylogenetic considerations inferred from the Cyt b and 16S rRNA genes.

Gene. 2007 Jul 1;396(1):13-27. doi: 10.1016/j.gene.2007.01.027. Epub 2007 Feb 12.

A simple k-word interval method for phylogenetic analysis of DNA sequences.

J Theor Biol. 2013 Jan 21;317:192-9. doi: 10.1016/j.jtbi.2012.10.010. Epub 2012 Oct 18.

Alignment free comparison: k word voting model and its applications.

J Theor Biol. 2013 Oct 21;335:276-82. doi: 10.1016/j.jtbi.2013.06.037. Epub 2013 Jul 10.

The ribosomal RNA gene region in Acanthamoeba castellanii mitochondrial DNA. A case of evolutionary transfer of introns between mitochondria and plastids?

J Mol Biol. 1994 Jun 17;239(4):476-99. doi: 10.1006/jmbi.1994.1390.

Cited by

Non-standard bioinformatics characterization of SARS-CoV-2.

Comput Biol Med. 2021 Apr;131:104247. doi: 10.1016/j.compbiomed.2021.104247. Epub 2021 Feb 1.

A Statistical Similarity/Dissimilarity Analysis of Protein Sequences Based on a Novel Group Representative Vector.

Biomed Res Int. 2019 May 8;2019:8702968. doi: 10.1155/2019/8702968. eCollection 2019.

One novel representation of DNA sequence based on the global and local position information.

Sci Rep. 2018 May 15;8(1):7592. doi: 10.1038/s41598-018-26005-3.

Circular Helix-Like Curve: An Effective Tool of Biological Sequence Analysis and Comparison.

Comput Math Methods Med. 2016;2016:3262813. doi: 10.1155/2016/3262813. Epub 2016 Jun 14.
