Intercorrelation of Major DNA/RNA Sequence Descriptors - A Preliminary Study.

Authors

Sen Dwaipayan, Dasgupta Subhadeep, Pal Indrajit, Manna Smarajit, Basak Subhash C, Nandy Ashesh, Grunwald Gregory D

Affiliation

Centre for Interdisciplinary Research and Education, Jodhpur Park, Kolkata 700068, India.

Publication

Curr Comput Aided Drug Des. 2016;12(3):216-228. doi: 10.2174/1573409912666160525111918.

Abstract

BACKGROUND

A large number of alignment-free techniques for the graphical representation and numerical characterization (GRANCH) of bio-molecular sequences have been proposed in recent years, but the relative efficacy of these methods in determining the degree of similarity and dissimilarity of such sequences has not been ascertained.

OBJECTIVE

Our objective is to assess the relative efficacy of these methods in determining the degree of similarity and dissimilarity of bio-molecular sequences.

METHOD

We chose 7 published/communicated methods that represent various classes of GRANCH techniques and computed the descriptors that are expected to characterize similarities and dissimilarities in several sets of gene sequences. We critically appraise the different methods and determine which of them yield non-redundant structural information that could be used to compute different properties of the sequences, and which are sufficiently correlated with one another that using the simplest representative of the group would suffice. We also perform a principal component analysis (PCA) to determine how much of the variance in the calculated sequence descriptors is explained by the computed principal components (PCs).
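The two analyses named above, descriptor intercorrelation and PCA, can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the data are synthetic, the function names `correlation_matrix` and `pca_explained_variance` are hypothetical, and standardizing the descriptors before PCA is an assumption the abstract does not confirm.

```python
import numpy as np

def correlation_matrix(X):
    """Pearson correlation between descriptor columns of X (rows = sequences)."""
    return np.corrcoef(X, rowvar=False)

def pca_explained_variance(X):
    """Fraction of total variance explained by each principal component."""
    # Assumption: standardize each descriptor so that scale differences
    # between descriptors do not dominate the components.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Squared singular values of Z are proportional to the PC variances;
    # NumPy returns them in descending order.
    s = np.linalg.svd(Z, compute_uv=False)
    var = s ** 2
    return var / var.sum()

# Synthetic stand-in data: 20 hypothetical sequences, 4 hypothetical descriptors.
rng = np.random.default_rng(0)
base = rng.normal(size=(20, 2))
X = np.column_stack([
    base[:, 0],
    2.0 * base[:, 0] + 0.01 * rng.normal(size=20),  # nearly redundant with column 0
    base[:, 1],
    rng.normal(size=20),
])

R = correlation_matrix(X)           # 4 x 4 intercorrelation matrix
ratios = pca_explained_variance(X)  # variance fraction per PC
```

In this toy table the second column is almost a rescaled copy of the first, so `R[0, 1]` is close to 1 and the last entry of `ratios` is close to 0, which mirrors the kind of redundancy the study looks for among real descriptors.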

RESULTS

We found that some of the descriptors are strongly correlated, implying a commonality in the structural information they encode, while others are distinctly separate. The PCA results show that the first three PCs explain >97% of the variance.

CONCLUSION

We found that some mathematical DNA descriptors calculated by a few of these techniques correlate strongly with one another, implying redundancy in the structural information quantified by those descriptors; others are not strongly correlated with one another, suggesting that they encode non-redundant sequence information. Based on this and our PCA results, we recommend using a minimally correlated set of descriptors, or orthogonal descriptors such as PCs derived from the descriptor set, for the characterization of nucleic acid structure and function.
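One simple way to arrive at a minimally correlated descriptor set is a greedy filter on the absolute correlation matrix, sketched below. The abstract does not say how such a set should be chosen, so the procedure, the function name `select_weakly_correlated`, the 0.9 cutoff, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def select_weakly_correlated(X, threshold=0.9):
    """Greedily keep columns whose |r| with every kept column is below threshold.

    The threshold of 0.9 is an arbitrary illustrative cutoff, not a value
    taken from the study.
    """
    R = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(R.shape[0]):
        # Keep descriptor j only if it is not strongly correlated
        # with any descriptor already retained.
        if all(R[j, k] < threshold for k in kept):
            kept.append(j)
    return kept

# Toy descriptor table: column 1 is an exact linear function of column 0.
t = np.arange(10, dtype=float)
X = np.column_stack([t, 3.0 * t + 1.0, (t - 4.5) ** 2, np.cos(t)])

kept = select_weakly_correlated(X, threshold=0.9)
print(kept)  # prints [0, 2, 3]
```

The result depends on column order, since earlier columns are retained first; ordering descriptors from simplest to most complex would keep the simplest representative of each correlated group.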
