FFP: joint Fast Fourier transform and fractal dimension in amino acid property-aware phylogenetic analysis.

Affiliations

School of Computer, Electronics and Information, Guangxi University, Nanning, China.

Guangxi Normal University for Nationalities, Chongzuo, China.

Publication information

BMC Bioinformatics. 2022 Aug 19;23(1):347. doi: 10.1186/s12859-022-04889-3.

DOI:10.1186/s12859-022-04889-3
PMID:35986255
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9392226/
Abstract

BACKGROUND

Amino acid property-aware phylogenetic analysis (APPA) is phylogenetic analysis based on amino acid property encoding; it is used to understand and infer evolutionary relationships between species from a molecular perspective. Fast Fourier transform (FFT) and Higuchi's fractal dimension (HFD) perform well at describing the structural and complexity information of sequences for APPA. With protein sequence data growing exponentially, a reliable APPA method for protein sequence analysis is needed.

RESULTS

We therefore propose a new method, FFP, which combines FFT and HFD. First, FFP encodes protein sequences using an important physicochemical property of amino acids, the dissociation constant, which determines the acidity and basicity of protein molecules. Second, FFT and HFD generate feature vectors from the encoded sequences, and a distance matrix is then computed with the cosine function to describe the degree of similarity between species: the smaller the distance, the more similar they are. Finally, the phylogenetic tree is constructed. Tested on four groups of protein sequences, FFP gave clearly better phylogenetic results than the compared methods, with a highest accuracy above 97%.
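The steps described above (property encoding, FFT and HFD features, cosine distance) can be sketched in Python with NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which dissociation-constant table is used, how the FFT spectrum is reduced to a fixed-length vector, or how the FFT and HFD features are combined. The pKa table (approximate textbook alpha-carboxyl values), the function names, the `n_bins` and `k_max` parameters, and the toy sequences are all hypothetical.

```python
import numpy as np

# Approximate textbook pKa values of the alpha-carboxyl group.
# Assumption: the abstract does not specify which table FFP uses.
PKA = {
    "A": 2.34, "R": 2.17, "N": 2.02, "D": 1.88, "C": 1.96,
    "Q": 2.17, "E": 2.19, "G": 2.34, "H": 1.82, "I": 2.36,
    "L": 2.36, "K": 2.18, "M": 2.28, "F": 1.83, "P": 1.99,
    "S": 2.21, "T": 2.11, "W": 2.38, "Y": 2.20, "V": 2.32,
}

def encode(seq, table=PKA):
    """Map each residue to its property value; unknown symbols are skipped."""
    return np.array([table[a] for a in seq.upper() if a in table], dtype=float)

def fft_features(signal, n_bins=8):
    """Reduce the FFT power spectrum to n_bins mean-power values.

    Binning is an assumption made here so that sequences of different
    lengths yield vectors of the same size.
    """
    centered = signal - signal.mean()
    power = np.abs(np.fft.rfft(centered)) ** 2 / signal.size
    bins = np.array_split(power[1:], n_bins)  # drop the zero-frequency term
    return np.array([b.mean() if b.size else 0.0 for b in bins])

def higuchi_fd(signal, k_max=8):
    """Higuchi's fractal dimension: slope of log L(k) against log(1/k)."""
    x = np.asarray(signal, dtype=float)
    n = x.size
    log_inv_k, log_len = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):            # start offsets 0 .. k-1
            sub = x[m::k]             # every k-th sample starting at m
            steps = sub.size - 1
            if steps < 1:
                continue
            raw = np.abs(np.diff(sub)).sum()
            # Higuchi's normalisation of the curve length at scale k
            lengths.append(raw * (n - 1) / (steps * k) / k)
        if lengths and np.mean(lengths) > 0:
            log_inv_k.append(np.log(1.0 / k))
            log_len.append(np.log(np.mean(lengths)))
    slope, _ = np.polyfit(log_inv_k, log_len, 1)
    return float(slope)

def ffp_like_vector(seq, n_bins=8, k_max=8):
    """Concatenate FFT and HFD features (simple concatenation is an assumption)."""
    signal = encode(seq)
    return np.append(fft_features(signal, n_bins), higuchi_fd(signal, k_max))

def cosine_distance_matrix(features):
    """Pairwise cosine distance, 1 - cos(u, v); smaller means more similar."""
    x = np.asarray(features, dtype=float)
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    return np.clip(1.0 - unit @ unit.T, 0.0, 2.0)

# Made-up toy sequences, for illustration only.
toy = {
    "seq_a": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
    "seq_b": "MKTAYIAKQRQISFVKAHFSRQLEDRLGLIEVQ",
    "seq_c": "GSHMLEDPVAGRWLKTNNCQYFPEWDASTHIKM",
}
vectors = [ffp_like_vector(s) for s in toy.values()]
dist = cosine_distance_matrix(vectors)
print(np.round(dist, 4))
```

The resulting matrix is the kind of input a distance-based tree-building method (for example UPGMA or neighbor joining) would take; the abstract does not name the method FFP uses for that final step.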

CONCLUSION

FFP achieves higher accuracy in APPA and multi-sequence alignment, and it measures protein sequence similarity effectively. We hope it will be useful in APPA-related research.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d0a6/9392226/6ab821b62119/12859_2022_4889_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d0a6/9392226/b88be82582d8/12859_2022_4889_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d0a6/9392226/74d261447ba2/12859_2022_4889_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d0a6/9392226/a014f069a087/12859_2022_4889_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d0a6/9392226/51d295510ace/12859_2022_4889_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d0a6/9392226/32e05d843f0a/12859_2022_4889_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d0a6/9392226/5ce54ef8f107/12859_2022_4889_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d0a6/9392226/127453a6e96e/12859_2022_4889_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d0a6/9392226/3806db8ec8cb/12859_2022_4889_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d0a6/9392226/8e5213001b20/12859_2022_4889_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d0a6/9392226/b40c09dcc390/12859_2022_4889_Fig11_HTML.jpg

Similar articles

1
FFP: joint Fast Fourier transform and fractal dimension in amino acid property-aware phylogenetic analysis.
BMC Bioinformatics. 2022 Aug 19;23(1):347. doi: 10.1186/s12859-022-04889-3.
2
Modeling the relationship between Higuchi's fractal dimension and Fourier spectra of physiological signals.
Med Biol Eng Comput. 2012 Jul;50(7):689-99. doi: 10.1007/s11517-012-0913-9. Epub 2012 May 17.
3
A Fractal Dimension and Wavelet Transform Based Method for Protein Sequence Similarity Analysis.
IEEE/ACM Trans Comput Biol Bioinform. 2015 Mar-Apr;12(2):348-59. doi: 10.1109/TCBB.2014.2363480.
4
Mapping sequence to feature vector using numerical representation of codons targeted to amino acids for alignment-free sequence analysis.
Gene. 2021 Jan 15;766:145096. doi: 10.1016/j.gene.2020.145096. Epub 2020 Sep 9.
5
An alignment-free method to find similarity among protein sequences via the general form of Chou's pseudo amino acid composition.
SAR QSAR Environ Res. 2013;24(7):597-609. doi: 10.1080/1062936X.2013.773378. Epub 2013 May 28.
6
An improved model for whole genome phylogenetic analysis by Fourier transform.
J Theor Biol. 2015 Oct 7;382:99-110. doi: 10.1016/j.jtbi.2015.06.033. Epub 2015 Jul 4.
7
Graphical Representation and Similarity Analysis of Protein Sequences Based on Fractal Interpolation.
IEEE/ACM Trans Comput Biol Bioinform. 2017 Jan-Feb;14(1):182-192. doi: 10.1109/TCBB.2015.2511731. Epub 2015 Dec 29.
8
On the quality of tree-based protein classification.
Bioinformatics. 2005 May 1;21(9):1876-90. doi: 10.1093/bioinformatics/bti244. Epub 2005 Jan 12.
9
Application of Higuchi's fractal dimension from basic to clinical neurophysiology: A review.
Comput Methods Programs Biomed. 2016 Sep;133:55-70. doi: 10.1016/j.cmpb.2016.05.014. Epub 2016 May 30.
10
Construction of protein distance matrix based on amino acid indices and Discrete Fourier Transform.
Annu Int Conf IEEE Eng Med Biol Soc. 2013;2013:4066-9. doi: 10.1109/EMBC.2013.6610438.

Cited by

1
CGRWDL: alignment-free phylogeny reconstruction method for viruses based on chaos game representation weighted by dynamical language model.
Front Microbiol. 2024 Mar 20;15:1339156. doi: 10.3389/fmicb.2024.1339156. eCollection 2024.
