IEEE/ACM Trans Comput Biol Bioinform. 2021 Sep-Oct;18(5):1924-1936. doi: 10.1109/TCBB.2020.2967385. Epub 2021 Oct 7.
This paper introduces a novel alignment-free sequence analysis methodology. Its main idea is a new representation of the DNA sequence that breaks the dependency between DNA bases that exists in the traditional string representation. We call it the Four-Lists-Representation (FLR). Based on the FLR, a series of revolutionary algorithms for searching, map discovery, similarity-score analysis, and similarity visualization have been developed. Together they form what we call the FLR Methodology. The paper also reviews most of the available similarity analysis techniques in a comprehensive state-of-the-art survey. Extensive simulation and theoretical studies confirm that the whole set of FLR-based algorithms outperforms a long list of available similarity analysis algorithms in terms of speed and memory consumption. By providing a similarity map, a similarity score, and a similarity graph as a set of evidence-based rationales, the proposed methodology offers a new edge in result quality in this field and promises a new area of genome-based research.
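The abstract does not spell out how the FLR is built. As an illustration only, the sketch below assumes one plausible reading: each of the four bases gets its own sorted list of the positions where it occurs, so that a base can be queried without scanning the string. The names `build_four_lists` and `find_pattern` are hypothetical and are not taken from the paper, and the search shown is a simple membership check on those lists, not the authors' algorithm.

```python
from bisect import bisect_left

BASES = "ACGT"


def build_four_lists(seq):
    """Map each base to the sorted list of positions where it occurs.

    Assumption: this is one possible reading of a "four lists" layout,
    not the definition given in the paper. Non-ACGT symbols are skipped.
    """
    lists = {base: [] for base in BASES}
    for i, ch in enumerate(seq.upper()):
        if ch in lists:
            lists[ch].append(i)
    return lists


def _contains(sorted_positions, value):
    """Binary-search membership test on a sorted position list."""
    j = bisect_left(sorted_positions, value)
    return j < len(sorted_positions) and sorted_positions[j] == value


def find_pattern(lists, pattern):
    """Return all start positions of `pattern`, using only the position lists.

    Candidates come from the list of the pattern's first base; each
    candidate is kept if every later base sits at the expected offset.
    """
    pattern = pattern.upper()
    if not pattern or any(ch not in lists for ch in pattern):
        return []
    hits = []
    for start in lists[pattern[0]]:
        if all(_contains(lists[ch], start + k) for k, ch in enumerate(pattern)):
            hits.append(start)
    return hits


if __name__ == "__main__":
    flr = build_four_lists("ACGTAC")
    print(flr)
    print(find_pattern(flr, "AC"))
```

Under this assumed layout, each base's occurrences are stored independently of its neighbours, which is one way to read the claim that the representation "breaks the dependency" between bases. Whether the published algorithms operate this way would need to be checked against the full paper.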