Gene sequence analysis model construction based on k-mer statistics.

Affiliation

School of Mathematics and Statistics, Heze University, Heze, China.

Publication information

PLoS One. 2024 Sep 12;19(9):e0306480. doi: 10.1371/journal.pone.0306480. eCollection 2024.

DOI: 10.1371/journal.pone.0306480

PMID: 39264950

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11392344/

Abstract

With the rapid development of biotechnology, gene sequencing methods have steadily improved, and the gene sequences to be analyzed are increasingly complex in structure. Traditional sequence alignment methods, however, struggle to handle such complex alignment tasks. To improve the efficiency of gene sequence analysis, this study uses the D2 family of k-mer statistics to build a model for gene sequence alignment analysis. Based on the structure of the foreground sequence, the sequence to be aligned can be cut at different lengths into multiple subsequences. The maximum dissimilarity among the alignment results of the selected subsequences is then taken as the statistical result. The study also designed an application system that implements the model for sequence alignment analysis. The experimental results showed that the statistical power of the model was directly proportional to sequence coverage and cutting length, and inversely proportional to the K value and module length. When the model was run on the system designed in this paper, the system's maximum storage capacity was 71 GB, its maximum disk capacity was 135 GB, and its running time was less than 2.0 s. The proposed k-mer statistical sequence alignment model and system therefore have considerable application value in gene alignment analysis.
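The abstract names the ingredients (k-mer counts, a D2-family statistic, cutting the sequence to be aligned into subsequences, and taking the maximum dissimilarity) but gives no formulas. The sketch below is an illustration under stated assumptions, not the authors' implementation. It uses the classic D2 statistic, D2 = Σ_w X_w·Y_w, where X_w and Y_w are the counts of k-mer w in the two sequences. It converts D2 to a dissimilarity with a cosine-style normalization, 0.5·(1 − D2 / (‖X‖·‖Y‖)). It cuts the target into consecutive, non-overlapping pieces of a chosen cutting length. The function names (`kmer_counts`, `d2`, `d2_dissimilarity`, `cut`, `max_dissimilarity`), the normalization, and the non-overlapping cutting rule are all hypothetical choices.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer (substring of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(x: Counter, y: Counter) -> int:
    """D2 statistic: sum over all k-mers w of X_w * Y_w.

    Counter returns 0 for missing keys, so k-mers absent from y contribute 0.
    """
    return sum(count * y[w] for w, count in x.items())


def d2_dissimilarity(a: str, b: str, k: int) -> float:
    """Cosine-normalized D2 dissimilarity, 0.5 * (1 - D2 / (||X|| * ||Y||)).

    ASSUMPTION: this normalization is one common choice; the source does not
    say which D2-family variant the model uses.
    Returns 0.0 for identical k-mer profiles and 0.5 when no k-mer is shared.
    """
    x, y = kmer_counts(a, k), kmer_counts(b, k)
    if not x or not y:
        raise ValueError("both sequences must be at least k characters long")
    # One square root of the product, so identical profiles give exactly 1.0.
    denom = sqrt(sum(v * v for v in x.values()) * sum(v * v for v in y.values()))
    return 0.5 * (1.0 - d2(x, y) / denom)


def cut(seq: str, length: int) -> list[str]:
    """HYPOTHETICAL cutting rule: consecutive, non-overlapping pieces of
    `length` characters; the last piece may be shorter."""
    return [seq[i:i + length] for i in range(0, len(seq), length)]


def max_dissimilarity(foreground: str, target: str, cut_length: int, k: int) -> float:
    """Compare the foreground with every piece of the cut target sequence and
    return the largest dissimilarity as the summary statistic."""
    pieces = [p for p in cut(target, cut_length) if len(p) >= k]
    if not pieces:
        raise ValueError("no piece of the target is at least k characters long")
    return max(d2_dissimilarity(foreground, p, k) for p in pieces)


if __name__ == "__main__":
    fg = "ACGTACGTAC"
    target = "ACGTACGTACTTTTTTTTTT"
    print(max_dissimilarity(fg, target, cut_length=10, k=2))  # 0.5
```

In the usage example, a cutting length of 10 splits the target into one piece identical to the foreground (dissimilarity 0.0) and one poly-T piece that shares no 2-mer with it (dissimilarity 0.5), so the maximum is 0.5. Taking the maximum rather than an average means a single strongly dissimilar subsequence is enough to drive the statistic up.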
