BMGE (Block Mapping and Gathering with Entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments.

Affiliation

Institut Pasteur, Unité de Biologie Moléculaire du Gène chez Extrêmophiles, Département de Microbiologie, 25 rue du Dr Roux, 75015 Paris, France.

Publication information

BMC Evol Biol. 2010 Jul 13;10:210. doi: 10.1186/1471-2148-10-210.

DOI:10.1186/1471-2148-10-210

PMID:20626897

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3017758/

Abstract

BACKGROUND

The quality of multiple sequence alignments plays an important role in the accuracy of phylogenetic inference. It has been shown that removing ambiguously aligned regions, but also other sources of bias such as highly variable (saturated) characters, can improve the overall performance of many phylogenetic reconstruction methods. A current scientific trend is to build phylogenetic trees from a large number of sequence datasets (semi-)automatically extracted from numerous complete genomes. Because these approaches do not allow a precise manual curation of each dataset, there exists a real need for efficient bioinformatic tools dedicated to this alignment character trimming step.

RESULTS

This paper presents new software, named BMGE (Block Mapping and Gathering with Entropy), designed to select the regions of a multiple sequence alignment that are suited for phylogenetic inference. For each character, BMGE computes a score closely related to an entropy value. These entropy-like scores are weighted with BLOSUM or PAM similarity matrices in order to distinguish between biologically expected and unexpected variability for each aligned character. Sets of contiguous characters with a score above a given threshold are considered unsuited for phylogenetic inference and are removed. Simulation analyses show that the character trimming performed by BMGE produces datasets leading to accurate trees, especially with alignments including distantly related sequences. BMGE also implements trimming and recoding methods aimed at minimizing phylogeny reconstruction artefacts due to compositional heterogeneity.

CONCLUSIONS

BMGE is able to perform biologically relevant trimming on a multiple alignment of DNA, codon or amino acid sequences. Java source code and executable are freely available at ftp://ftp.pasteur.fr/pub/GenSoft/projects/BMGE/.
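The trimming idea described in the Results paragraph can be illustrated with a minimal Python sketch. It is not BMGE's implementation: it uses plain normalized Shannon entropy rather than the similarity-matrix-weighted score the abstract describes, and it drops columns one at a time instead of working on sets of contiguous characters. The function names, the threshold of 0.5, the gap handling, and the toy alignment are all assumptions made for illustration.

```python
import math
from collections import Counter


def column_entropy(column, alphabet_size):
    """Shannon entropy of one alignment column, normalized to [0, 1].

    Gap characters ('-') are ignored. This is an unweighted stand-in for
    the BLOSUM/PAM-weighted entropy-like score mentioned in the abstract.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        # Assumption: an all-gap column is treated as maximally uninformative.
        return 1.0
    total = len(residues)
    counts = Counter(residues)
    h = -sum((n / total) * math.log(n / total) for n in counts.values())
    return h / math.log(alphabet_size)


def trim_alignment(sequences, threshold=0.5, alphabet_size=20):
    """Drop every column whose normalized entropy exceeds the threshold.

    Returns the trimmed sequences and the per-column scores.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all aligned sequences must have the same length")
    scores = [
        column_entropy([s[i] for s in sequences], alphabet_size)
        for i in range(length)
    ]
    keep = [i for i, score in enumerate(scores) if score <= threshold]
    trimmed = ["".join(s[i] for i in keep) for s in sequences]
    return trimmed, scores


# Toy DNA alignment (hypothetical data): the third column is fully variable.
alignment = ["ACGTA", "ACATA", "ACCTA", "ACTTA"]
trimmed, scores = trim_alignment(alignment, threshold=0.5, alphabet_size=4)
print(trimmed)
```

In this toy case only the third column, where all four nucleotides occur equally often, scores above the threshold and is dropped; the conserved columns are kept. A weighted score of the kind the abstract describes would additionally let biologically expected substitutions count as less variable than unexpected ones.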

Similar articles

BMGE (Block Mapping and Gathering with Entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments.

BMC Evol Biol. 2010 Jul 13;10:210. doi: 10.1186/1471-2148-10-210.

trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses.

Bioinformatics. 2009 Aug 1;25(15):1972-3. doi: 10.1093/bioinformatics/btp348. Epub 2009 Jun 8.

ClipKIT: A multiple sequence alignment trimming software for accurate phylogenomic inference.

PLoS Biol. 2020 Dec 2;18(12):e3001007. doi: 10.1371/journal.pbio.3001007. eCollection 2020 Dec.

Using ESTs for phylogenomics: can one accurately infer a phylogenetic tree from a gappy alignment?

BMC Evol Biol. 2008 Mar 26;8:95. doi: 10.1186/1471-2148-8-95.

SATe-II: very fast and accurate simultaneous estimation of multiple sequence alignments and phylogenetic trees.

Syst Biol. 2012 Jan;61(1):90-106. doi: 10.1093/sysbio/syr095. Epub 2011 Dec 1.

PICS-Ord: unlimited coding of ambiguous regions by pairwise identity and cost scores ordination.

BMC Bioinformatics. 2011 Jan 7;12:10. doi: 10.1186/1471-2105-12-10.

Bayesian coestimation of phylogeny and sequence alignment.

BMC Bioinformatics. 2005 Apr 1;6:83. doi: 10.1186/1471-2105-6-83.

transAlign: using amino acids to facilitate the multiple alignment of protein-coding DNA sequences.

BMC Bioinformatics. 2005 Jun 22;6:156. doi: 10.1186/1471-2105-6-156.

Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments.

Syst Biol. 2007 Aug;56(4):564-77. doi: 10.1080/10635150701472164.

CSA: an efficient algorithm to improve circular DNA multiple alignment.

BMC Bioinformatics. 2009 Jul 23;10:230. doi: 10.1186/1471-2105-10-230.

Articles citing this article

Mobilome-mediated transcriptional activation of biosynthetic gene clusters and its impact on strain competitiveness in food fermentation microbiomes.

Microbiome. 2025 Aug 28;13(1):191. doi: 10.1186/s40168-025-02180-0.

Impacts of immune checkpoint inhibitors use on the HIV reservoir are linked to provirus sequences but not integration sites.

Sci Rep. 2025 Aug 28;15(1):31726. doi: 10.1038/s41598-025-15349-2.

An activator regulates the DNA damage response and anti-phage defense networks in Moraxellaceae.

Nucleic Acids Res. 2025 Aug 27;53(16). doi: 10.1093/nar/gkaf828.

Protists with Uncertain Phylogenetic Affiliations for Resolving the Deep Tree of Eukaryotes.

Microorganisms. 2025 Aug 18;13(8):1926. doi: 10.3390/microorganisms13081926.

Bringing the uncultivated microbial majority of freshwater ecosystems into culture.

Nat Commun. 2025 Aug 26;16(1):7971. doi: 10.1038/s41467-025-63266-9.

Genetic determinants of pOXA-48 plasmid maintenance and propagation in Escherichia coli.

Nat Commun. 2025 Aug 19;16(1):7734. doi: 10.1038/s41467-025-62404-7.

Unraveling the Chloroplast Genome of Stellaria media: Comprehensive Analysis, Taxonomic Implications, and Evolutionary Perspectives.

Biochem Genet. 2025 Aug 19. doi: 10.1007/s10528-025-11229-6.

Phylogenomic Analyses Reveal that Panguiarchaeum Is a Clade of Genome-Reduced Asgard Archaea Within the Njordarchaeia.

Mol Biol Evol. 2025 Sep 1;42(9). doi: 10.1093/molbev/msaf201.

Hidden Markov Model-Based Prokaryotic Genome Space Mining Reveals the Widespread Pervasiveness of Complex I and Its Potential Evolutionary Scheme.

Genome Biol Evol. 2025 Jul 30;17(8). doi: 10.1093/gbe/evaf154.

Mass development of a filamentous and likely nitrophilous aerophytic green alga on tree bark: sp. nov. (Chlorophyta, Trebouxiophyceae).

Front Microbiol. 2025 Jul 23;16:1633308. doi: 10.3389/fmicb.2025.1633308. eCollection 2025.
