Department of Computational Medicine, University of California, Los Angeles, California, USA.
Irving Institute for Cancer Dynamics and Department of Statistics, Columbia University, New York, New York, USA.
J Bioinform Comput Biol. 2023 Dec;21(6):2350027. doi: 10.1142/S0219720023500270. Epub 2024 Jan 10.
Given several number sequences, determining their longest common subsequence is a classical problem in computer science. The problem has applications in bioinformatics, particularly in identifying transposable genes. However, related work considers only how to find one longest common subsequence. In this paper, we consider how to determine whether the longest common subsequence is unique. If there are multiple longest common subsequences, we also determine which numbers appear in all, some, or none of them. We focus on four scenarios: (1) linear sequences without duplicated numbers; (2) circular sequences without duplicated numbers; (3) linear sequences with duplicated numbers; (4) circular sequences with duplicated numbers. We develop an algorithm for each scenario and apply the algorithms to gene sequencing data.
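As an illustration of scenario (1) only, the following is a minimal Python sketch and is not claimed to be the algorithm developed in the paper. It assumes linear sequences without duplicated numbers and uses a standard longest-path argument: a number lies on some longest common subsequence when the longest common subsequence ending at it and the one starting at it together reach the full length, and it lies on all of them when no other such number shares its position. The function name `classify_lcs_members` and its return format are hypothetical, chosen for this example.

```python
from collections import defaultdict


def classify_lcs_members(seqs):
    """Return (LCS length, is_unique, labels) for duplicate-free linear sequences.

    labels maps every number to "all", "some", or "none", according to how
    many of the longest common subsequences contain it.
    """
    # Only numbers present in every sequence can be in a common subsequence.
    common = set(seqs[0]).intersection(*map(set, seqs[1:]))
    pos = [{x: i for i, x in enumerate(s)} for s in seqs]

    def precedes(a, b):
        # True if a comes before b in every sequence.
        return all(p[a] < p[b] for p in pos)

    # The order of the first sequence is a valid topological order.
    order = [x for x in seqs[0] if x in common]

    # f[x]: length of the longest common subsequence ending at x.
    f = {}
    for i, x in enumerate(order):
        f[x] = 1 + max((f[y] for y in order[:i] if precedes(y, x)), default=0)

    # g[x]: length of the longest common subsequence starting at x.
    g = {}
    for i in range(len(order) - 1, -1, -1):
        x = order[i]
        g[x] = 1 + max((g[y] for y in order[i + 1:] if precedes(x, y)), default=0)

    length = max(f.values(), default=0)

    # Group numbers that lie on at least one LCS by their position f[x].
    levels = defaultdict(list)
    for x in order:
        if f[x] + g[x] - 1 == length:
            levels[f[x]].append(x)

    labels = {x: "none" for s in seqs for x in s}
    for members in levels.values():
        for x in members:
            labels[x] = "all" if len(members) == 1 else "some"

    # The LCS is unique exactly when every position has a single candidate.
    is_unique = all(len(m) == 1 for m in levels.values())
    return length, is_unique, labels


if __name__ == "__main__":
    print(classify_lcs_members([[1, 2, 3, 4, 5], [1, 3, 2, 4, 5]]))
```

For the two sequences in the usage line, the longest common subsequences are 1, 2, 4, 5 and 1, 3, 4, 5, so 1, 4, and 5 are labeled "all" while 2 and 3 are labeled "some". The sketch runs in time quadratic in the sequence length for each input sequence and does not cover the circular or duplicated-number scenarios.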