Information theoretic perspective on genome clustering.

Authors

Veluchamy Alaguraj, Mehta Preeti, Srividhya K V, Vikram Hirendra, Govind M K, Gupta Ramneek, Aziz Bin Dukhyil Abdul, Abdullah Alharbi Raed, Abdullah Aloyuni Saleh, Hassan Mohamed M, Krishnaswamy S

Affiliations

Centre of Excellence in Bioinformatics, School of Biotechnology, Madurai Kamaraj University, Madurai 625021, India.

Department of Computational Biology, St. Jude Children's Research Hospital, Danny Thomas Place, Memphis, Tennessee 38105, United States of America.

Publication information

Saudi J Biol Sci. 2021 Mar;28(3):1867-1889. doi: 10.1016/j.sjbs.2020.12.039. Epub 2020 Dec 31.

Abstract

Shannon's information-theoretic perspective on communication helps in understanding how information is stored and processed in one-dimensional sequences. An information-theoretic analysis of 937 completely sequenced prokaryotic genomes and 238 eukaryotic chromosomes is presented. Information content (Id) values were used to cluster these chromosomes. Chargaff's second parity rule, i.e., compositional self-complementarity, an empirical fact, is observed in all the genomes except that of the proteobacterium Hodgkinia cicadicola. High information content, arising out of biased base composition in all the 14 chromosomes of is found among two other genomes of prokaryotes viz. str. Cc () and Carsonella ruddii PV. Despite variations in size and composition, neither prokaryotic nor eukaryotic genomes deviate significantly from an equiprobable and random situation. The chromosomes of a given eukaryotic organism tend to have similar informational restraints, as seen when a simple distance-based method is used to cluster them. In certain eukaryotes, Id values are also similar for the two arms (p and q) of a chromosome. The results of this study confirm that information content can provide insights into the clustering of genomes and the evolution of messaging strategies of the genomes. An efficient and robust standalone Perl CGI tool for whole-genome analysis, based on this information theory algorithm, has been created and is available at https://github.com/AlagurajVeluchamy/InformationTheory.
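
The abstract does not define how Id is computed, so the following is only a minimal sketch of two quantities it alludes to, under stated assumptions. It assumes that deviation from the equiprobable situation is measured as log2(4) minus the Shannon entropy of the single-base frequencies (a common formulation, not confirmed by the abstract), and that Chargaff's second parity rule is checked by comparing A with T and G with C counts within a single strand. The function names are hypothetical and are not taken from the authors' Perl tool.

```python
import math
from collections import Counter


def divergence_from_equiprobability(seq: str) -> float:
    """Return log2(4) - H1 in bits, where H1 is the Shannon entropy
    of the A/C/G/T frequencies. Assumed formulation, not the paper's."""
    bases = [b for b in seq.upper() if b in "ACGT"]  # ignore N and other symbols
    if not bases:
        raise ValueError("sequence contains no A/C/G/T bases")
    total = len(bases)
    counts = Counter(bases)
    h1 = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(4) - h1


def chargaff_skews(seq: str) -> tuple[float, float]:
    """Return (|A-T|/(A+T), |G-C|/(G+C)) for one strand.
    Values near 0 are consistent with the second parity rule."""
    counts = Counter(seq.upper())

    def skew(x: str, y: str) -> float:
        pair_total = counts[x] + counts[y]
        return abs(counts[x] - counts[y]) / pair_total if pair_total else 0.0

    return skew("A", "T"), skew("G", "C")


if __name__ == "__main__":
    toy = "AATTGGCC"  # toy input, not real genome data
    print(divergence_from_equiprobability(toy))
    print(chargaff_skews(toy))
```

A uniform base composition gives a divergence of 0 bits, and a single-base sequence gives the maximum of 2 bits. Real genomes would need to be read from FASTA files and handled in a streaming fashion, which this sketch omits.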
