Department of Biotechnology, School of Life Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, India.
SAR QSAR Environ Res. 2011 Mar;22(1-2):21-34. doi: 10.1080/1062936X.2010.528255.
Several alignment-free sequence comparison methods are available that measure similarity on the basis of a particular numerical descriptor of biological sequences. Any loss of information incurred when a sequence is transformed into a numerical descriptor affects the results. A pool of descriptors computed with different algorithms is expected to minimize this loss of information, and an attempt is made in this direction to study the similarity of DNA sequences. A number of descriptors based on information theory and connectivity were computed for DNA sequences. Principal component analysis (PCA) was used to extract the minimum number (N) of orthogonal descriptors, the principal components (PCs). Similarity/dissimilarity clustering of DNA sequences was carried out in the N-dimensional similarity space constructed from the PCs extracted from the DNA descriptors. The paper explains the extension of quantitative molecular similarity analysis (QMSA) from the prediction of physicochemical properties and toxicity of chemicals to bioinformatics, for the classification of DNA sequences.
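The pipeline outlined above (descriptor pool, then PCA, then an N-dimensional similarity space) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation. The descriptor set is hypothetical: Shannon entropies of k-mer frequency distributions stand in for the information-theoretic indices, and `transition_connectivity` is an invented Randić-style index on a nucleotide-adjacency graph, not the paper's connectivity descriptors. The 90% cumulative-variance rule for choosing N and the Euclidean distance metric are also assumptions, and the toy sequences are made up.

```python
import math
from collections import Counter

import numpy as np


def kmer_entropy(seq: str, k: int) -> float:
    """Shannon entropy (bits) of the overlapping k-mer frequency distribution."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def transition_connectivity(seq: str) -> float:
    """Hypothetical Randic-style index on a weighted nucleotide-adjacency graph.

    Vertices are the nucleotides; edge {x, y} (x != y) is weighted by how often
    x and y are adjacent in the sequence, and a vertex's degree is the sum of its
    edge weights. The index sums w_xy / sqrt(deg_x * deg_y) over all edges.
    """
    weights = Counter(frozenset(pair) for pair in zip(seq, seq[1:]) if pair[0] != pair[1])
    degree = Counter()
    for edge, w in weights.items():
        for v in edge:
            degree[v] += w
    return sum(w / math.sqrt(math.prod(degree[v] for v in edge))
               for edge, w in weights.items())


def descriptors(seq: str) -> list[float]:
    """Hypothetical descriptor vector: k-mer entropies (k = 1..3), GC fraction, connectivity."""
    seq = seq.upper()
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    return [kmer_entropy(seq, 1), kmer_entropy(seq, 2), kmer_entropy(seq, 3),
            gc, transition_connectivity(seq)]


def pc_scores(X: np.ndarray, var_threshold: float = 0.9) -> np.ndarray:
    """Standardize descriptors, then keep the fewest PCs whose cumulative
    explained variance reaches var_threshold (an assumed rule for choosing N)."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # a constant descriptor carries no information; avoid division by zero
    Z = (X - X.mean(axis=0)) / std
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)  # rows of Vt are the PC loadings
    explained = s**2 / np.sum(s**2)
    n = min(int(np.searchsorted(np.cumsum(explained), var_threshold)) + 1, len(s))
    return Z @ Vt[:n].T  # sequence scores in the N-dimensional similarity space


def distance_matrix(P: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distances (dissimilarities) between sequences in PC space."""
    diff = P[:, None, :] - P[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Usage with toy sequences (made up for illustration; "s2" duplicates "s1").
seqs = {
    "s1": "ATGCGTACGTTAGCCGATAGCTAGGCTTACG",
    "s2": "ATGCGTACGTTAGCCGATAGCTAGGCTTACG",
    "s3": "GGGCCCGGGCCCGCGCGGCCGCGGCCCGGGC",
    "s4": "ATATATTTAAATATTAATATATAAATTTATA",
    "s5": "ACGTACGTACGTACGTACGTACGTACGTACGT",
}
X = np.array([descriptors(s) for s in seqs.values()])
P = pc_scores(X)
D = distance_matrix(P)
```

Standardizing before PCA keeps descriptors with large numeric ranges (such as the connectivity index) from dominating the components. The resulting dissimilarity matrix `D` is what a hierarchical or other clustering method would then consume to group the sequences.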