Fast statistical alignment.

Authors

Bradley Robert K, Roberts Adam, Smoot Michael, Juvekar Sudeep, Do Jaeyoung, Dewey Colin, Holmes Ian, Pachter Lior

Affiliation

Department of Mathematics, University of California Berkeley, Berkeley, California, United States of America.

Publication

PLoS Comput Biol. 2009 May;5(5):e1000392. doi: 10.1371/journal.pcbi.1000392. Epub 2009 May 29.

PMID: 19478997
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2684580/
Abstract

We describe a new program for the alignment of multiple biological sequences that is both statistically motivated and fast enough for problem sizes that arise in practice. Our Fast Statistical Alignment program is based on pair hidden Markov models which approximate an insertion/deletion process on a tree and uses a sequence annealing algorithm to combine the posterior probabilities estimated from these models into a multiple alignment. FSA uses its explicit statistical model to produce multiple alignments which are accompanied by estimates of the alignment accuracy and uncertainty for every column and character of the alignment--previously available only with alignment programs which use computationally-expensive Markov Chain Monte Carlo approaches--yet can align thousands of long sequences. Moreover, FSA utilizes an unsupervised query-specific learning procedure for parameter estimation which leads to improved accuracy on benchmark reference alignments in comparison to existing programs. The centroid alignment approach taken by FSA, in combination with its learning procedure, drastically reduces the amount of false-positive alignment on biological data in comparison to that given by other methods. The FSA program and a companion visualization tool for exploring uncertainty in alignments can be used via a web interface at http://orangutan.math.berkeley.edu/fsa/, and the source code is available at http://fsa.sourceforge.net/.
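The abstract's core idea, computing posterior match probabilities with pair HMMs and then assembling only mutually consistent matches, can be illustrated on two sequences. The Python below is a minimal sketch under stated assumptions, not FSA's implementation. It uses a hypothetical 3-state pair HMM (Match, X-gap, Y-gap) with hand-picked DNA parameters `delta`, `eps` and `mismatch` in place of FSA's learned, tree-motivated models. The function names `pair_hmm_posteriors` and `greedy_consistent_alignment` are illustrative. Posteriors come from the standard forward-backward algorithm. The greedy step accepts matches in decreasing posterior order and skips any that reuse a residue or cross an accepted match. It is a simplified two-sequence stand-in for sequence annealing, which works on many sequences and uses its own weighting.

```python
def pair_hmm_posteriors(x, y, delta=0.1, eps=0.4, mismatch=0.1):
    """Posterior P[i][j] that x[i] is aligned (matched) to y[j].

    Hypothetical 3-state pair HMM (M = match, X = x residue vs gap,
    Y = gap vs y residue) with hand-picked DNA parameters, not FSA's model.
    Termination probability is taken as 1, so posteriors are conditioned
    on the sequence lengths.
    """
    n, m = len(x), len(y)
    states = ("M", "X", "Y")
    # Transition probabilities; the begin state is treated like M.
    t = {
        ("M", "M"): 1 - 2 * delta, ("M", "X"): delta, ("M", "Y"): delta,
        ("X", "M"): 1 - eps, ("X", "X"): eps, ("X", "Y"): 0.0,
        ("Y", "M"): 1 - eps, ("Y", "X"): 0.0, ("Y", "Y"): eps,
    }
    q = 0.25  # emission probability of a residue aligned to a gap

    def p_match(a, b):
        # Joint emission of an aligned nucleotide pair (sums to 1 over 16 pairs).
        return (1 - mismatch) / 4 if a == b else mismatch / 12

    # Forward: f[s][i][j] = P(emit x[:i], y[:j] and end in state s).
    f = {s: [[0.0] * (m + 1) for _ in range(n + 1)] for s in states}
    f["M"][0][0] = 1.0  # begin state
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                f["M"][i][j] = p_match(x[i - 1], y[j - 1]) * sum(
                    t[(s, "M")] * f[s][i - 1][j - 1] for s in states)
            if i > 0:
                f["X"][i][j] = q * sum(t[(s, "X")] * f[s][i - 1][j] for s in states)
            if j > 0:
                f["Y"][i][j] = q * sum(t[(s, "Y")] * f[s][i][j - 1] for s in states)
    total = sum(f[s][n][m] for s in states)

    # Backward: b[s][i][j] = P(emit x[i:], y[j:] | in state s at (i, j)).
    b = {s: [[0.0] * (m + 1) for _ in range(n + 1)] for s in states}
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            for s in states:
                if i == n and j == m:
                    b[s][i][j] = 1.0
                    continue
                v = 0.0
                if i < n and j < m:
                    v += t[(s, "M")] * p_match(x[i], y[j]) * b["M"][i + 1][j + 1]
                if i < n:
                    v += t[(s, "X")] * q * b["X"][i + 1][j]
                if j < m:
                    v += t[(s, "Y")] * q * b["Y"][i][j + 1]
                b[s][i][j] = v

    # Match column at residues (i, j): forward * backward / total.
    return [[f["M"][i + 1][j + 1] * b["M"][i + 1][j + 1] / total
             for j in range(m)] for i in range(n)]


def greedy_consistent_alignment(post, min_posterior=0.0):
    """Accept matches in decreasing posterior order, skipping conflicts.

    Simplified two-sequence stand-in for sequence annealing: a candidate
    (i, j) is rejected if it reuses a residue or crosses an accepted match.
    Residues with no accepted match are left unaligned.
    """
    candidates = sorted(
        ((p, i, j) for i, row in enumerate(post)
         for j, p in enumerate(row) if p > min_posterior),
        reverse=True)
    accepted = []
    for _, i, j in candidates:
        # Consistent iff strictly before or strictly after every accepted pair.
        if all((a < i and b < j) or (a > i and b > j) for a, b in accepted):
            accepted.append((i, j))
    return sorted(accepted)


if __name__ == "__main__":
    post = pair_hmm_posteriors("ACGTTGCA", "ACGTGCA")
    for i, j in greedy_consistent_alignment(post, min_posterior=0.5):
        print(f"x[{i}] ~ y[{j}]  posterior={post[i][j]:.3f}")
```

Raising `min_posterior` leaves low-confidence residues unaligned, which loosely mirrors the abstract's point about reducing false-positive alignment. Under a pair HMM, two matches that share a residue or cross are mutually exclusive events, so their posteriors sum to at most 1. Consequently, matches above 0.5 never conflict with each other. The sketch works in plain probabilities and will underflow on long sequences; a practical version would work in log space or rescale.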

Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4ef6/2684580/761782504c77/pcbi.1000392.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4ef6/2684580/d375efe178ad/pcbi.1000392.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4ef6/2684580/4d5766a3ca14/pcbi.1000392.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4ef6/2684580/2bc5798cc676/pcbi.1000392.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4ef6/2684580/95dcece3fd63/pcbi.1000392.g005.jpg

Similar articles

1
Bayesian coestimation of phylogeny and sequence alignment.
BMC Bioinformatics. 2005 Apr 1;6:83. doi: 10.1186/1471-2105-6-83.
2
Statistical alignment based on fragment insertion and deletion models.
Bioinformatics. 2003 Mar 1;19(4):490-9. doi: 10.1093/bioinformatics/btg026.
3
MSAProbs: multiple sequence alignment based on pair hidden Markov models and partition function posterior probabilities.
Bioinformatics. 2010 Aug 15;26(16):1958-64. doi: 10.1093/bioinformatics/btq338. Epub 2010 Jun 23.
4
COACH: profile-profile alignment of protein families using hidden Markov models.
Bioinformatics. 2004 May 22;20(8):1309-18. doi: 10.1093/bioinformatics/bth091. Epub 2004 Feb 12.
5
Gapped alignment of protein sequence motifs through Monte Carlo optimization of a hidden Markov model.
BMC Bioinformatics. 2004 Oct 25;5:157. doi: 10.1186/1471-2105-5-157.
6
Joint Bayesian estimation of alignment and phylogeny.
Syst Biol. 2005 Jun;54(3):401-18. doi: 10.1080/10635150590947041.
7
HMM-ModE--improved classification using profile hidden Markov models by optimising the discrimination threshold and modifying emission probabilities with negative training sequences.
BMC Bioinformatics. 2007 Mar 27;8:104. doi: 10.1186/1471-2105-8-104.
8
SATCHMO: sequence alignment and tree construction using hidden Markov models.
Bioinformatics. 2003 Jul 22;19(11):1404-11. doi: 10.1093/bioinformatics/btg158.
9
Pair hidden Markov models on tree structures.
Bioinformatics. 2003;19 Suppl 1:i232-40. doi: 10.1093/bioinformatics/btg1032.

Cited by

1
Metagenome-assembled genomes reveal novel diversity and atypical sources of a superbug.
Microbiol Spectr. 2025 Mar 18;13(5):e0010625. doi: 10.1128/spectrum.00106-25.
2
The Genomic Landscape, Causes, and Consequences of Extensive Phylogenomic Discordance in Murine Rodents.
Genome Biol Evol. 2025 Feb 3;17(2). doi: 10.1093/gbe/evaf017.
3
Phylogenomics supports a single origin of terrestriality in isopods.
Proc Biol Sci. 2024 Oct;291(2033):20241042. doi: 10.1098/rspb.2024.1042. Epub 2024 Oct 30.
4
Total substitution and partial modification of the set of non-ribosomal peptide synthetases clusters lead to pyoverdine diversity in the complex.
Front Microbiol. 2024 Aug 19;15:1421749. doi: 10.3389/fmicb.2024.1421749. eCollection 2024.
5
Genome and life-history evolution link bird diversification to the end-Cretaceous mass extinction.
Sci Adv. 2024 Aug 2;10(31):eadp0114. doi: 10.1126/sciadv.adp0114. Epub 2024 Jul 31.
6
Adaptations to nitrogen availability drive ecological divergence of chemosynthetic symbionts.
PLoS Genet. 2024 May 31;20(5):e1011295. doi: 10.1371/journal.pgen.1011295. eCollection 2024 May.
7
The genome of Litomosoides sigmodontis illuminates the origins of Y chromosomes in filarial nematodes.
PLoS Genet. 2024 Jan 16;20(1):e1011116. doi: 10.1371/journal.pgen.1011116. eCollection 2024 Jan.
8
Ancient diversity in host-parasite interaction genes in a model parasitic nematode.
Nat Commun. 2023 Nov 27;14(1):7776. doi: 10.1038/s41467-023-43556-w.
9
Biochemical, functional and genomic characterization of a new probiotic Ligilactobacillus salivarius F14 from the gut of tribes of Odisha.
World J Microbiol Biotechnol. 2023 Apr 27;39(7):171. doi: 10.1007/s11274-023-03626-z.
10
Convergent and complementary selection shaped gains and losses of eusociality in sweat bees.
Nat Ecol Evol. 2023 Apr;7(4):557-569. doi: 10.1038/s41559-023-02001-3. Epub 2023 Mar 20.