Basu K, Sriraam N, Richard R J A
Faculty of Information Technology, Multimedia University, 63100 Cyberjaya, Malaysia.
J Med Syst. 2007 Aug;31(4):247-53. doi: 10.1007/s10916-007-9062-3.
Pairwise alignment schemes are widely used to determine the similarity between a given DNA sequence and the DNA sequences available in databanks. The efficiency of the alignment determines the types of amino acids and their corresponding proteins. To evaluate a given DNA sequence for its proteomic identity, this paper proposes a pattern matching approach. A block-based semi-global alignment scheme is introduced to determine the similarity between two DNA sequences (known and given). The two sequences are divided into blocks of equal length and alignment is performed on the blocks, which reduces the computational complexity. The efficiency of the alignment scheme is evaluated using the percentage of similarity (POS). Four essential DNA versions of the amino acids that emphasize the importance of proteomic functionalities are chosen as patterns, and matching is performed against the known and given DNA sequences to determine the similarity between them. The ratio of amino acid counts between the two sequences is estimated and the results are compared with the POS value. The experimental results show that the higher the POS value and the degree of pattern matching, the higher the similarity between the two DNA sequences. The optimal block is also identified based on the POS value and the amino acid counts.
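The block-based scheme described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption, since the abstract does not give the details: the scoring values (match +1, mismatch -1, gap -1), the POS formula (matched positions divided by aligned columns, times 100), the function names, and the counting of patterns at any offset rather than in a fixed reading frame. It is not the authors' implementation.

```python
def semiglobal_align(a, b, match=1, mismatch=-1, gap=-1):
    """Semi-global alignment of a and b (leading and trailing gaps are free).

    Returns (matches, aligned_columns). Scoring values are assumed.
    """
    n, m = len(a), len(b)
    # score[i][j]: best score for a[:i] vs b[:j]; row 0 and column 0 stay 0
    # so that leading gaps cost nothing.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Free trailing gaps: the alignment may end anywhere in the last row or column.
    ends = [(score[n][j], n, j) for j in range(m + 1)]
    ends += [(score[i][m], i, m) for i in range(n + 1)]
    _, i, j = max(ends)

    # Trace back to count matched positions and aligned columns.
    matches = columns = 0
    while i > 0 and j > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            if a[i - 1] == b[j - 1]:
                matches += 1
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return matches, columns


def block_pos(known, given, block_len):
    """Split both sequences into equal-length blocks and return one POS per block.

    POS is assumed to be 100 * matches / aligned columns.
    """
    results = []
    for start in range(0, min(len(known), len(given)), block_len):
        matches, columns = semiglobal_align(
            known[start:start + block_len], given[start:start + block_len]
        )
        results.append(100.0 * matches / columns if columns else 0.0)
    return results


def pattern_count(seq, pattern):
    """Count occurrences of a DNA pattern in seq, overlaps included (assumed)."""
    return sum(seq.startswith(pattern, i) for i in range(len(seq) - len(pattern) + 1))


if __name__ == "__main__":
    known = "ACGTACGTTTTT"
    given = "ACGTACGTAAAA"
    print(block_pos(known, given, block_len=4))
    # "ATG" is only a placeholder pattern; the abstract does not list the four used.
    print(pattern_count(known, "ATG"), pattern_count(given, "ATG"))
```

Aligning each pair of blocks costs time proportional to the square of the block length, so the total work grows linearly with sequence length for a fixed block size, rather than quadratically as in a single full-length alignment. The per-block POS values can then be compared with the pattern counts of the two sequences to pick a block, in the spirit of the optimal-block selection mentioned in the abstract.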