Multifractal and correlation analyses of protein sequences from complete genomes.

Author information

Yu Zu-Guo, Anh Vo, Lau Ka-Sing

Affiliation

Program in Statistics and Operations Research, Queensland University of Technology, GPO Box 2434, Brisbane Q4001, Australia.

Publication information

Phys Rev E Stat Nonlin Soft Matter Phys. 2003 Aug;68(2 Pt 1):021913. doi: 10.1103/PhysRevE.68.021913. Epub 2003 Aug 22.

DOI: 10.1103/PhysRevE.68.021913
PMID: 14525012

Abstract

A measure representation of protein sequences similar to the measure representation of DNA sequences proposed in our previous paper [Yu et al., Phys. Rev. E 64, 031903 (2001)] and another induced measure are introduced. Multifractal analysis is then performed on these two kinds of measures of a large number of protein sequences derived from corresponding complete genomes. From the values of the D(q) (generalized dimensions) spectra and related C(q) (analogous specific heat) curves, it is concluded that these protein sequences are not completely random sequences. For substrings with length K=5, the D(q) spectra of all organisms studied are multifractal-like and sufficiently smooth for the C(q) curves to be meaningful. The C(q) curves of all bacteria resemble a classical phase transition at a critical point. But the "analogous" phase transitions of higher organisms studied exhibit the shape of double-peaked specific heat function. But for the classification problem, the multifractal property is not sufficient. When the measure representations of protein sequences from complete genomes are considered as time series, a method based on correlation analysis after removing some memory from the time series is proposed to construct a phylogenetic tree. This construction is shown to be reasonably satisfactory.
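
The abstract gives no formulas, so the following is only a minimal sketch of how a K-string measure and its D(q) spectrum could be computed. It rests on several assumptions that are not stated in the abstract: the measure of each K-string is taken to be its relative frequency across all protein sequences, each of the 20^K possible K-strings is assigned a subinterval of [0, 1) of width 20^(-K), D(q) is estimated at that single scale rather than by a limit or regression over scales, and C(q) is approximated by the finite difference 2τ(q) − τ(q+1) − τ(q−1) with τ(q) = (q − 1)D(q). The function names and the residue ordering are hypothetical.

```python
from collections import Counter
import math

# 20 standard amino-acid letters; the ordering is an arbitrary assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kstring_measure(sequences, k):
    """Relative frequency of every observed K-string (assumed measure)."""
    valid = set(AMINO_ACIDS)
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            s = seq[i:i + k]
            # Skip substrings containing non-standard residue codes.
            if set(s) <= valid:
                counts[s] += 1
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}


def generalized_dimension(measure, k, q):
    """Single-scale estimate of D(q) with box size eps = 20**(-k)."""
    if not measure:
        raise ValueError("measure is empty")
    log_eps = -k * math.log(len(AMINO_ACIDS))
    probs = [p for p in measure.values() if p > 0]
    if abs(q - 1.0) < 1e-12:
        # q = 1 is the information dimension (limit of the general formula).
        return sum(p * math.log(p) for p in probs) / log_eps
    z = sum(p ** q for p in probs)  # partition sum Z(q)
    return math.log(z) / ((q - 1) * log_eps)


def analogous_specific_heat(measure, k, q):
    """Finite-difference approximation of C(q) (assumed form)."""
    def tau(t):
        return (t - 1) * generalized_dimension(measure, k, t)
    return 2 * tau(q) - tau(q + 1) - tau(q - 1)
```

A perfectly uniform measure gives D(q) = 1 for every q under this estimate, so a D(q) spectrum that varies with q is what signals departure from a uniform random sequence. For K = 5 there are 20^5 = 3,200,000 possible substrings, which is why only the observed K-strings are stored here.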

Similar articles

1
Multifractal and correlation analyses of protein sequences from complete genomes.
Phys Rev E Stat Nonlin Soft Matter Phys. 2003 Aug;68(2 Pt 1):021913. doi: 10.1103/PhysRevE.68.021913. Epub 2003 Aug 22.
2
Measure representation and multifractal analysis of complete genomes.
Phys Rev E Stat Nonlin Soft Matter Phys. 2001 Sep;64(3 Pt 1):031903. doi: 10.1103/PhysRevE.64.031903. Epub 2001 Aug 24.
3
Chaos game representation of protein sequences based on the detailed HP model and their multifractal and correlation analyses.
J Theor Biol. 2004 Feb 7;226(3):341-8. doi: 10.1016/j.jtbi.2003.09.009.
4
Investigation on series of length of coding and non-coding DNA sequences of bacteria using multifractal detrended cross-correlation analysis.
J Theor Biol. 2013 Mar 21;321:54-62. doi: 10.1016/j.jtbi.2012.12.027. Epub 2013 Jan 10.
5
Multifractal statistics of the local order parameter at random critical points: application to wetting transitions with disorder.
Phys Rev E Stat Nonlin Soft Matter Phys. 2007 Aug;76(2 Pt 1):021114. doi: 10.1103/PhysRevE.76.021114. Epub 2007 Aug 21.
6
Hierarchical multifractal representation of symbolic sequences and application to human chromosomes.
Phys Rev E Stat Nonlin Soft Matter Phys. 2010 Feb;81(2 Pt 2):026102. doi: 10.1103/PhysRevE.81.026102. Epub 2010 Feb 8.
7
Relationships of exponents in two-dimensional multifractal detrended fluctuation analysis.
Phys Rev E Stat Nonlin Soft Matter Phys. 2013 Jan;87(1):012921. doi: 10.1103/PhysRevE.87.012921. Epub 2013 Jan 31.
8
A fractal method to distinguish coding and non-coding sequences in a complete genome based on a number sequence representation.
J Theor Biol. 2005 Feb 21;232(4):559-67. doi: 10.1016/j.jtbi.2004.09.002.
9
Cross-correlation detection and analysis for California's electricity market based on analogous multifractal analysis.
Chaos. 2013 Mar;23(1):013129. doi: 10.1063/1.4793355.
10
Hardware Accelerator for the Multifractal Analysis of DNA Sequences.
IEEE/ACM Trans Comput Biol Bioinform. 2018 Sep-Oct;15(5):1611-1624. doi: 10.1109/TCBB.2017.2731339. Epub 2017 Jul 24.

Cited by

1
Enhancing Taxonomic Categorization of DNA Sequences with Deep Learning: A Multi-Label Approach.
Bioengineering (Basel). 2023 Nov 8;10(11):1293. doi: 10.3390/bioengineering10111293.
2
Multifractality of Complex Networks Is Also Due to Geometry: A Geometric Sandbox Algorithm.
Entropy (Basel). 2023 Sep 11;25(9):1324. doi: 10.3390/e25091324.
3
Phylogenetic Analysis of HIV-1 Genomes Based on the Position-Weighted K-mers Method.
Entropy (Basel). 2020 Feb 23;22(2):255. doi: 10.3390/e22020255.
4
minating the Path of Atherosclerosis Progression: Chaos Theory Suggests a Role for Repeats in the Development of Atherosclerotic Vascular Disease.
Int J Mol Sci. 2018 Jun 12;19(6):1734. doi: 10.3390/ijms19061734.
5
Multifractal analysis of weighted networks by a modified sandbox algorithm.
Sci Rep. 2015 Dec 4;5:17628. doi: 10.1038/srep17628.
6
Sequence Complexity of Chromosome 3 in Caenorhabditis elegans.
Adv Bioinformatics. 2012;2012:287486. doi: 10.1155/2012/287486. Epub 2012 Jul 20.
7
The human genome: a multifractal analysis.
BMC Genomics. 2011 Oct 14;12:506. doi: 10.1186/1471-2164-12-506.
8
Proper distance metrics for phylogenetic analysis using complete genomes without sequence alignment.
Int J Mol Sci. 2010 Mar 18;11(3):1141-54. doi: 10.3390/ijms11031141.
9
Phylogeny of prokaryotes and chloroplasts revealed by a simple composition approach on all protein sequences from complete genomes without sequence alignment.
J Mol Evol. 2005 Apr;60(4):538-45. doi: 10.1007/s00239-004-0255-9.
10
Protein sequences as a "literary" text.
Dokl Biochem Biophys. 2004 Jul-Aug;397:235-8. doi: 10.1023/b:dobi.0000039472.72939.80.