Department of Chemistry, Sri Sarada Niketan College for Women, Karur-639005, Tamil Nadu, India.
Department of Computer Science, Sri Sarada Niketan College for Women, Karur-639005, Tamil Nadu, India.
Comb Chem High Throughput Screen. 2022;25(3):365-380. doi: 10.2174/1386207324666210811101437.
Biological macromolecules, namely DNA, RNA, and protein, have their building blocks organized in a particular sequence, and this sequential arrangement encodes the evolutionary history of the organism (species). Hence, biological sequences have been used to study evolutionary relationships among species. This is usually carried out by multiple sequence alignment (MSA). Because of certain limitations of MSA, alignment-free sequence comparison methods were developed. The present review covers alignment-free sequence comparison methods based on the numerical characterization of DNA sequences.
The graphical representation of DNA sequences by chaos game representation and by other 2-dimensional and 3-dimensional methods is discussed. The evolution of numerical characterization from the various graphical representations, and the application of the DNA invariants thus computed in phylogenetic analysis, are presented. The extension of molecular descriptor computation from chemometrics to the calculation of a new set of DNA invariants, and the use of these invariants in alignment-free sequence comparison in an N-dimensional space and in the construction of phylogenetic trees, are also reviewed.
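As an illustration of how a graphical representation can lead to a numerical characterization, the following is a minimal sketch of a chaos game representation in Python. The corner assignment for A, C, G, and T is an assumption (one common convention; other layouts exist), and `simple_invariant` is a hypothetical, deliberately simple descriptor chosen for illustration. It is not one of the DNA invariants covered by the review.

```python
# Assumed corner layout of the unit square; other conventions are in use.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the chaos game representation points of a DNA sequence."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        # Move halfway from the current point towards the base's corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def simple_invariant(points):
    """Hypothetical descriptor: the mean x and mean y of the CGR points."""
    n = len(points)
    if n == 0:
        return (0.0, 0.0)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


points = cgr_points("AG")
descriptor = simple_invariant(points)
```

Each sequence is reduced to a fixed-length numerical vector, so sequences of different lengths can be compared without aligning them.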
The phylogenetic trees constructed by the alignment-free sequence comparison methods using DNA invariants were found to be better than those constructed using alignment-based tools such as PHYLIP and ClustalW. One of the graphical representation methods has now been extended to the study of viral sequences of infectious diseases, combining numerical characterization and graphical representation to identify conserved regions for the design of peptide-based vaccines.
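Alignment-free comparison in an N-dimensional descriptor space can be sketched as a pairwise distance computation. The example below is a minimal sketch that assumes each sequence has already been reduced to a vector of N invariants. Euclidean distance is an assumed choice of metric, and the sequence names and numbers are made up for illustration; they are not taken from the review.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two descriptor vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("descriptor vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(descriptors):
    """Return sorted sequence names and their pairwise distance matrix."""
    names = sorted(descriptors)
    matrix = [[euclidean(descriptors[a], descriptors[b]) for b in names]
              for a in names]
    return names, matrix


# Hypothetical 3-dimensional descriptor vectors (illustrative values only).
example = {
    "seq1": [0.0, 0.0, 0.0],
    "seq2": [3.0, 4.0, 0.0],
    "seq3": [3.0, 4.0, 12.0],
}
names, matrix = distance_matrix(example)
```

A distance matrix of this kind could then be passed to a distance-based tree-building method such as UPGMA or neighbor joining to obtain a phylogenetic tree.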