K2 and K2*: efficient alignment-free sequence similarity measurement based on Kendall statistics.

Author information

Department of Software engineering, College of Mathematics and Informatics, Fujian Normal University, Fuzhou 350108, China.

Department of Computer Science & Electrical Engineering, West Virginia University, Morgantown, WV 26506, USA.

Publication information

Bioinformatics. 2018 May 15;34(10):1682-1689. doi: 10.1093/bioinformatics/btx809.

DOI:10.1093/bioinformatics/btx809

PMID:29253072

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6355110/

Abstract

MOTIVATION

Alignment-free sequence comparison methods can compute the pairwise similarity between a huge number of sequences much faster than sequence-alignment based methods.

RESULTS

We propose a new non-parametric alignment-free sequence comparison method, called K2, based on the Kendall statistics. Compared with other state-of-the-art alignment-free comparison methods, K2 demonstrates competitive performance in generating phylogenetic trees, in evaluating functionally related regulatory sequences, and in computing the edit distance (similarity/dissimilarity) between sequences. Furthermore, the K2 approach is much faster than the other methods. An improved method, K2*, is also proposed; it determines the appropriate algorithmic parameter (length) automatically, without first trying different values. Comparative analysis against state-of-the-art alignment-free sequence similarity methods demonstrates the superiority of the proposed approaches, especially as sequence length or dataset size increases.
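The abstract does not spell out how K2 is computed, so the following is only a rough sketch of the general idea behind a Kendall-based alignment-free comparison, not the published K2 algorithm. It assumes that each sequence is summarized by its vector of overlapping k-mer counts and that similarity is taken as Kendall's tau-b between the two vectors. The function names (`kmer_counts`, `kendall_tau_b`, `kendall_similarity`), the DNA alphabet, and the fixed word length `k` are all illustrative assumptions.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_counts(seq, k, alphabet="ACGT"):
    """Count overlapping length-k words in seq, in a fixed order over alphabet."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(word)] for word in product(alphabet, repeat=k)]


def kendall_tau_b(x, y):
    """Kendall's tau-b of two equal-length vectors (simple O(n^2) pair scan)."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both vectors: excluded from both factors
            if dx == 0:
                only_x_tied += 1
            elif dy == 0:
                only_y_tied += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    # Pairs untied in x times pairs untied in y, as in the tau-b definition.
    untied_x = concordant + discordant + only_y_tied
    untied_y = concordant + discordant + only_x_tied
    denom = sqrt(untied_x * untied_y)
    return (concordant - discordant) / denom if denom else 0.0


def kendall_similarity(seq_a, seq_b, k=2):
    """Hypothetical helper: tau-b between the k-mer count vectors (assumed setup)."""
    return kendall_tau_b(kmer_counts(seq_a, k), kmer_counts(seq_b, k))


if __name__ == "__main__":
    a = "ACGTACGTAC"
    b = "ACGTTCGTAA"
    print(kendall_similarity(a, a))  # identical sequences give the maximum, 1.0
    print(kendall_similarity(a, b))  # a value in [-1, 1]
```

The quadratic pair scan is chosen for readability only; it says nothing about the speed of the method described in the paper, and here `k` must be picked by hand, whereas K2* is described as selecting the length automatically.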

AVAILABILITY AND IMPLEMENTATION

The K2 and K2* approaches are implemented as an R package that is freely available (http://community.wvu.edu/daadjeroh/projects/K2/K2_1.0.tar.gz).

CONTACT

yueljiang@163.com.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
