Optimal choice of k-mer in composition vector method for genome sequence comparison.

Affiliations

Department of Computer Science and Engineering, Narula Institute of Technology, Kolkata, India.

Department of Computer Science, Barasat College, Kolkata, India.

Publication information

Genomics. 2018 Sep;110(5):263-273. doi: 10.1016/j.ygeno.2017.11.003. Epub 2017 Nov 24.

DOI:10.1016/j.ygeno.2017.11.003

PMID:29180261

Abstract

Many proteins and genes belong to families that share a common evolutionary origin. Sequence comparison has become an important process for outlining these evolutionary relationships and recognizing conserved patterns. The current work critically investigates the role of the k-mer in the composition vector method for comparing genome sequences. Composition vector methods based on k-mers are generally applied with different choices of k. For some values of k the results are satisfactory, but for others they are not. In the proposed work, the standard composition vector method is carried out with a k-mer (string) length of 3. In addition, a special type of information-based similarity index is used as the distance measure. The study establishes that using 3-mers together with the information-based similarity index gives satisfactory results in all cases studied, especially for the comparison of whole genome sequences. These choices provide a unified approach to the comparison of genome sequences.
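The two ingredients named in the abstract, a 3-mer composition vector and an information-based similarity index used as a distance, can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the abstract: the composition vector is taken to be the normalized frequencies of all 4^3 = 64 trinucleotides (the Markov-background subtraction used in some composition vector variants is not shown), and the index is assumed to be a rank-difference, Shannon-entropy-weighted form; the paper's "special type" of index may differ. The function names `composition_vector`, `rank_words` and `ibs_distance` are hypothetical.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def composition_vector(seq, k=3):
    """Relative frequency of every possible k-mer over ACGT (4**k entries).

    Assumption: the composition vector is plain normalized k-mer counts.
    Windows containing characters outside ACGT (e.g. N) are skipped.
    """
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set(ALPHABET)
    )
    total = sum(counts.values())
    words = ("".join(p) for p in product(ALPHABET, repeat=k))
    return {w: (counts[w] / total if total else 0.0) for w in words}


def rank_words(freqs):
    """Rank 1 = most frequent word; ties broken alphabetically so ranks are deterministic."""
    ordered = sorted(freqs, key=lambda w: (-freqs[w], w))
    return {w: r for r, w in enumerate(ordered, start=1)}


def ibs_distance(p1, p2):
    """Assumed information-based similarity distance between two composition vectors.

    D = sum_w |R1(w) - R2(w)| * F(w) / (N - 1), where
    F(w) = [-p1(w) log p1(w) - p2(w) log p2(w)] / Z and Z normalizes F to sum to 1.
    With N words, D lies in [0, 1]; identical vectors give 0.
    """
    r1, r2 = rank_words(p1), rank_words(p2)

    def h(p):
        # Shannon entropy term; 0 log 0 is taken as 0
        return -p * math.log(p) if p > 0 else 0.0

    weights = {w: h(p1[w]) + h(p2[w]) for w in p1}
    z = sum(weights.values())
    if z == 0:
        return 0.0
    n = len(p1)
    return sum(abs(r1[w] - r2[w]) * weights[w] / z for w in p1) / (n - 1)


a = composition_vector("ATGCGTACGTTAGC")
c = composition_vector("GGGGCCCCAAAATTTT")
print(ibs_distance(a, a))  # 0.0
print(ibs_distance(a, c))
```

The entropy weighting lets words that are frequent in either sequence dominate the rank differences, while rare words contribute little. With k = 3 the vector has only 64 entries, so even short sequences populate it reasonably; for larger k the 4^k vector becomes sparse and many words share the zero-frequency rank, which is one way the choice of k can affect the result.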


Similar articles

Optimal choice of k-mer in composition vector method for genome sequence comparison.

Genomics. 2018 Sep;110(5):263-273. doi: 10.1016/j.ygeno.2017.11.003. Epub 2017 Nov 24.

Genome classification improvements based on k-mer intervals in sequences.

Genomics. 2019 Dec;111(6):1574-1582. doi: 10.1016/j.ygeno.2018.11.001. Epub 2018 Nov 13.

An improved string composition method for sequence comparison.

BMC Bioinformatics. 2008 May 28;9 Suppl 6(Suppl 6):S15. doi: 10.1186/1471-2105-9-S6-S15.

KmerAperture: Retaining k-mer synteny for alignment-free extraction of core and accessory differences between bacterial genomes.

PLoS Genet. 2024 Apr 29;20(4):e1011184. doi: 10.1371/journal.pgen.1011184. eCollection 2024 Apr.

K-mer-Based Motif Analysis in Insect Species across , , and Genera and Its Application to Species Classification.

Comput Math Methods Med. 2019 Nov 15;2019:4259479. doi: 10.1155/2019/4259479. eCollection 2019.

Segmented K-mer and its application on similarity analysis of mitochondrial genome sequences.

Gene. 2013 Apr 15;518(2):419-24. doi: 10.1016/j.gene.2012.12.079. Epub 2013 Jan 23.

kmer2vec: A Novel Method for Comparing DNA Sequences by word2vec Embedding.

J Comput Biol. 2022 Sep;29(9):1001-1021. doi: 10.1089/cmb.2021.0536. Epub 2022 May 20.

KINN: An alignment-free accurate phylogeny reconstruction method based on inner distance distributions of k-mer pairs in biological sequences.

Mol Phylogenet Evol. 2023 Feb;179:107662. doi: 10.1016/j.ympev.2022.107662. Epub 2022 Nov 11.

K-mer natural vector and its application to the phylogenetic analysis of genetic sequences.

Gene. 2014 Aug 1;546(1):25-34. doi: 10.1016/j.gene.2014.05.043. Epub 2014 May 22.

Athena: Automated Tuning of k-mer based Genomic Error Correction Algorithms using Language Models.

Sci Rep. 2019 Nov 6;9(1):16157. doi: 10.1038/s41598-019-52196-4.

Cited by

An alignment-free method for phylogeny estimation using maximum likelihood.

BMC Bioinformatics. 2025 Mar 7;26(1):77. doi: 10.1186/s12859-025-06080-w.

Choice of Metric Divergence in Genome Sequence Comparison.

Protein J. 2024 Apr;43(2):259-273. doi: 10.1007/s10930-024-10189-x. Epub 2024 Mar 16.

ProtInteract: A deep learning framework for predicting protein-protein interactions.

Comput Struct Biotechnol J. 2023 Jan 25;21:1324-1348. doi: 10.1016/j.csbj.2023.01.028. eCollection 2023.

A new graph-theoretic approach to determine the similarity of genome sequences based on nucleotide triplets.

Genomics. 2020 Nov;112(6):4701-4714. doi: 10.1016/j.ygeno.2020.08.023. Epub 2020 Aug 19.
