Alignment-free genetic sequence comparisons: a review of recent approaches by word analysis.

Publication information

Brief Bioinform. 2014 Nov;15(6):890-905. doi: 10.1093/bib/bbt052. Epub 2013 Jul 31.

Abstract

Modern sequencing and genome assembly technologies have provided a wealth of data, which will soon require an analysis by comparison for discovery. Sequence alignment, a fundamental task in bioinformatics research, may be used but with some caveats. Seminal techniques and methods from dynamic programming are proving ineffective for this work owing to their inherent computational expense when processing large amounts of sequence data. These methods are prone to giving misleading information because of genetic recombination, genetic shuffling and other inherent biological events. New approaches from information theory, frequency analysis and data compression are available and provide powerful alternatives to dynamic programming. These new methods are often preferred, as their algorithms are simpler and are not affected by synteny-related problems. In this review, we provide a detailed discussion of computational tools, which stem from alignment-free methods based on statistical analysis from word frequencies. We provide several clear examples to demonstrate applications and the interpretations over several different areas of alignment-free analysis such as base-base correlations, feature frequency profiles, compositional vectors, an improved string composition and the D2 statistic metric. Additionally, we provide detailed discussion and an example of analysis by Lempel-Ziv techniques from data compression.
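As a rough illustration of the word-frequency idea behind the D2 statistic mentioned in the abstract, the sketch below counts overlapping words of length k in two sequences and sums the products of the matching counts. The function names `kmer_counts` and `d2` are hypothetical, chosen only for this example. This is the basic, unnormalised form of D2 and is not necessarily the exact variant the review discusses.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping word of length k in seq (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a: str, seq_b: str, k: int) -> int:
    """Basic D2 statistic: sum over words w of count_a(w) * count_b(w).

    Assumption: plain, unnormalised D2; normalised variants exist.
    """
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Words absent from seq_b contribute zero (Counter returns 0 for them).
    return sum(n * counts_b[word] for word, n in counts_a.items())
```

A larger value suggests the two sequences share more words of length k. Because raw D2 grows with sequence length, it is usually normalised before sequences of different sizes are compared.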
