ProbPFP: a multiple sequence alignment algorithm combining hidden Markov model optimized by particle swarm optimization with partition function.

Affiliations

School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China.

Department of Mathematics, Harbin Institute of Technology, Harbin, 150001, China.

Publication information

BMC Bioinformatics. 2019 Nov 25;20(Suppl 18):573. doi: 10.1186/s12859-019-3132-7.

DOI: 10.1186/s12859-019-3132-7
PMID: 31760933
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6876095/
Abstract

BACKGROUND

In multiple sequence alignment (MSA), the substitution scores of pairwise alignments are essential. To compute adaptive scores for alignment, researchers usually use a hidden Markov model (HMM) or probabilistic consistency methods such as the partition function. Recent studies show that optimizing the parameters of the HMM, as well as integrating the HMM with the partition function, can raise alignment accuracy. However, these studies did not consider combining the partition function with an optimized HMM, which could further improve alignment accuracy.

RESULTS

This paper presents ProbPFP, a novel MSA algorithm that integrates an HMM optimized by particle swarm optimization (PSO) with the partition function. PSO is first applied to optimize the HMM's parameters. The posterior probability obtained from the HMM is then combined with the one obtained from the partition function to calculate an integrated substitution score for alignment. To evaluate ProbPFP, we compared it with 13 outstanding or classic MSA methods. Alignments produced by ProbPFP achieved the highest mean TC and mean SP scores on two benchmark datasets, SABmark and OXBench, and the second-highest mean TC and mean SP scores on the benchmark dataset BAliBASE. ProbPFP was also compared with 4 other outstanding methods by reconstructing phylogenetic trees for six protein families extracted from the TreeFam database, based on the alignments produced by these 5 methods. The trees reconstructed from ProbPFP alignments were closer to the reference trees than those reconstructed from the other methods' alignments.
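The step of combining the two posterior probabilities can be illustrated with a small sketch. The abstract does not state the combination rule, so the root-mean-square average used here is an assumption, and the function names `combine_posteriors` and `max_expected_accuracy` are hypothetical. The second function shows one common way such a matrix could serve as a substitution score: a dynamic program that finds the pairwise alignment with the largest summed posterior.

```python
import math


def combine_posteriors(p_hmm, p_pf):
    """Combine two posterior matrices element-wise.

    ASSUMPTION: a root-mean-square average; the abstract does not
    specify how the HMM and partition function posteriors are merged.
    """
    if len(p_hmm) != len(p_pf) or any(
        len(a) != len(b) for a, b in zip(p_hmm, p_pf)
    ):
        raise ValueError("posterior matrices must have the same shape")
    return [
        [math.sqrt((a * a + b * b) / 2.0) for a, b in zip(row_h, row_p)]
        for row_h, row_p in zip(p_hmm, p_pf)
    ]


def max_expected_accuracy(posterior):
    """Largest sum of posteriors over any monotone pairwise alignment.

    posterior[i][j] is the probability that residue i of the first
    sequence is aligned with residue j of the second. Gaps score zero.
    """
    m = len(posterior)
    n = len(posterior[0]) if m else 0
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + posterior[i - 1][j - 1],  # align i with j
                dp[i - 1][j],  # gap in the second sequence
                dp[i][j - 1],  # gap in the first sequence
            )
    return dp[m][n]
```

Calling `max_expected_accuracy(combine_posteriors(p_hmm, p_pf))` on two same-shaped matrices returns the best achievable score under the combined posterior. Pure Python lists keep the sketch dependency-free; a real implementation would likely use sparse matrices.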

CONCLUSIONS

This paper proposes a new multiple sequence alignment method that combines an optimized HMM with the partition function. The benchmark results indicate that the method can substantially improve alignment accuracy.
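Alignment accuracy in the results is reported as SP (sum-of-pairs) and TC (total column) scores. The sketch below shows how these two scores are commonly defined against a reference alignment; it is not the benchmark suites' own scoring code. It assumes alignments are given as equal-length gapped strings in the same sequence order, and that every reference column with at least two residues counts toward TC (real benchmarks often restrict scoring to core regions). All function names are hypothetical.

```python
from itertools import combinations


def columns(alignment, gap="-"):
    """Convert gapped rows into columns of residue indices (None = gap)."""
    counters = [0] * len(alignment)
    cols = []
    for pos in range(len(alignment[0])):
        col = []
        for s, row in enumerate(alignment):
            if row[pos] == gap:
                col.append(None)
            else:
                col.append(counters[s])
                counters[s] += 1
        cols.append(tuple(col))
    return cols


def residue_pairs(cols):
    """Set of aligned residue pairs (seq_a, idx_a, seq_b, idx_b)."""
    pairs = set()
    for col in cols:
        for a, b in combinations(range(len(col)), 2):
            if col[a] is not None and col[b] is not None:
                pairs.add((a, col[a], b, col[b]))
    return pairs


def sp_score(test, reference):
    """Fraction of reference residue pairs reproduced in the test alignment."""
    ref_pairs = residue_pairs(columns(reference))
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & residue_pairs(columns(test))) / len(ref_pairs)


def tc_score(test, reference):
    """Fraction of reference columns reproduced exactly in the test alignment."""
    ref_cols = [
        c for c in columns(reference) if sum(x is not None for x in c) >= 2
    ]
    if not ref_cols:
        return 0.0
    test_cols = set(columns(test))
    return sum(c in test_cols for c in ref_cols) / len(ref_cols)
```

For example, with the reference `["ACG-", "A-GT"]` and the test alignment `["ACG-", "AG-T"]`, one of the two reference residue pairs and one of the two scored reference columns are recovered, so both scores are 0.5.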

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/20fb/6876095/9b0c1003f634/12859_2019_3132_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/20fb/6876095/1705fa918628/12859_2019_3132_Fig2_HTML.jpg

Similar articles

1
ProbPFP: a multiple sequence alignment algorithm combining hidden Markov model optimized by particle swarm optimization with partition function.
BMC Bioinformatics. 2019 Nov 25;20(Suppl 18):573. doi: 10.1186/s12859-019-3132-7.
2
Multiple Sequence Alignment with Hidden Markov Models Learned by Random Drift Particle Swarm Optimization.
IEEE/ACM Trans Comput Biol Bioinform. 2014 Jan-Feb;11(1):243-57. doi: 10.1109/TCBB.2013.148.
3
MSAProbs: multiple sequence alignment based on pair hidden Markov models and partition function posterior probabilities.
Bioinformatics. 2010 Aug 15;26(16):1958-64. doi: 10.1093/bioinformatics/btq338. Epub 2010 Jun 23.
4
Multiple protein sequence alignment with MSAProbs.
Methods Mol Biol. 2014;1079:211-8. doi: 10.1007/978-1-62703-646-7_14.
5
Improved Hidden Markov Model training for multiple sequence alignment by a particle swarm optimization-evolutionary algorithm hybrid.
Biosystems. 2003 Nov;72(1-2):5-17. doi: 10.1016/s0303-2647(03)00131-x.
6
Fast multiple sequence alignment via multi-armed bandits.
Bioinformatics. 2024 Jun 28;40(Suppl 1):i328-i336. doi: 10.1093/bioinformatics/btae225.
7
An expectation maximization algorithm for training hidden substitution models.
J Mol Biol. 2002 Apr 12;317(5):753-64. doi: 10.1006/jmbi.2002.5405.
8
Enhancing the quality of phylogenetic analysis using fuzzy hidden Markov model alignments.
Stud Health Technol Inform. 2007;129(Pt 2):1245-9.
9
MRFalign: protein homology detection through alignment of Markov random fields.
PLoS Comput Biol. 2014 Mar 27;10(3):e1003500. doi: 10.1371/journal.pcbi.1003500. eCollection 2014 Mar.
10
Multiple sequence alignment using Probcons and Probalign.
Methods Mol Biol. 2014;1079:147-53. doi: 10.1007/978-1-62703-646-7_9.

Cited by

1
Large scale sequence alignment via efficient inference in generative models.
Sci Rep. 2023 May 4;13(1):7285. doi: 10.1038/s41598-023-34257-x.
2
SaAlign: Multiple DNA/RNA sequence alignment and phylogenetic tree construction tool for ultra-large datasets and ultra-long sequences based on suffix array.
Comput Struct Biotechnol J. 2022 Mar 21;20:1487-1493. doi: 10.1016/j.csbj.2022.03.018. eCollection 2022.
3
A particle swarm optimization improved BP neural network intelligent model for electrocardiogram classification.
BMC Med Inform Decis Mak. 2021 Jul 30;21(Suppl 2):99. doi: 10.1186/s12911-021-01453-6.
4
Efficient Multiple Sequences Alignment Algorithm Generation Components Assembly Under PAR Framework.
Front Genet. 2021 Feb 4;11:628175. doi: 10.3389/fgene.2020.628175. eCollection 2020.
5
Research on Components Assembly Platform of Biological Sequences Alignment Algorithm.
Front Genet. 2021 Jan 21;11:630923. doi: 10.3389/fgene.2020.630923. eCollection 2020.