Identifying Clusters of High Confidence Homologies in Multiple Sequence Alignments.

Affiliations

Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden.

Faculty of Computer Science and Engineering, Ghulam Ishaq Khan Institute of Engineering Sciences and Technology, Topi, Pakistan.

Publication information

Mol Biol Evol. 2019 Oct 1;36(10):2340-2351. doi: 10.1093/molbev/msz142.

PMID: 31209473

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6933875/

Abstract

Multiple sequence alignment (MSA) is ubiquitous in evolution and bioinformatics. MSAs are usually treated as a known, fixed quantity on which to perform downstream analysis, despite extensive evidence that MSA accuracy and uncertainty affect results. Alignment errors are known to cause a wide range of problems for downstream evolutionary inference, ranging from false inference of positive selection to long branch attraction artifacts. The most popular approach to this problem is to remove (filter) specific columns of the MSA that are thought to be prone to error. Although popular, this approach has had mixed success, and several studies have even suggested that filtering might be detrimental to phylogenetic studies. We present Divvier (available at https://github.com/simonwhelan/Divvier), software implementing a graph-based clustering method that addresses MSA uncertainty and error. Divvier uses a probabilistic model to identify clusters of characters that have strong statistical evidence of shared homology. These clusters can then be used either to filter characters from the MSA (partial filtering) or to represent each cluster in a new column (divvying). We validate Divvier on real and simulated benchmarks and find that it substantially outperforms existing filtering software, retaining more true pairwise homology calls and removing more false positive pairwise homologies. We also find that Divvier, in contrast to other filtering tools, can alleviate long branch attraction artifacts induced by MSA error and reduce the variation in tree estimates caused by MSA uncertainty.
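
The idea of clustering the characters of a column and then either filtering or divvying can be illustrated with a minimal Python sketch. It assumes that every pair of residues in a column already has a posterior probability of homology (in Divvier these come from a probabilistic model; here they are simply passed in as a dictionary). Pairs at or above a hypothetical `threshold` are linked, and the connected components of that graph are taken as clusters. This single-linkage rule, the choice to keep only the largest cluster in partial filtering, and dropping singletons when divvying are all simplifying assumptions that may differ from Divvier's actual behaviour. The names `cluster_column`, `partial_filter` and `divvy` are illustrative only.

```python
from typing import Dict, List, Sequence, Tuple

GAP = "-"


def cluster_column(
    column: Sequence[str],
    pair_prob: Dict[Tuple[int, int], float],
    threshold: float = 0.8,
) -> List[List[int]]:
    """Cluster the non-gap characters (row indices) of one MSA column.

    An edge joins rows i and j when pair_prob[(i, j)] >= threshold; clusters
    are the connected components of that graph. This single-linkage rule is a
    simplifying assumption and may differ from Divvier's actual criterion.
    """
    nodes = [i for i, ch in enumerate(column) if ch != GAP]
    parent = {i: i for i in nodes}

    def find(x: int) -> int:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), p in pair_prob.items():
        if i in parent and j in parent and p >= threshold:
            parent[find(i)] = find(j)  # merge the two components

    groups: Dict[int, List[int]] = {}
    for i in nodes:
        groups.setdefault(find(i), []).append(i)
    # Largest cluster first; ties broken by the first row index.
    return sorted(groups.values(), key=lambda g: (-len(g), g[0]))


def partial_filter(
    column: Sequence[str], clusters: List[List[int]], min_size: int = 2
) -> List[str]:
    """Keep only the largest cluster (if big enough) and gap out the rest."""
    keep = set(clusters[0]) if clusters and len(clusters[0]) >= min_size else set()
    return [ch if i in keep else GAP for i, ch in enumerate(column)]


def divvy(
    column: Sequence[str], clusters: List[List[int]], min_size: int = 2
) -> List[List[str]]:
    """Split one column into a new column per cluster of at least min_size."""
    new_columns = []
    for cluster in clusters:
        if len(cluster) < min_size:
            continue  # a singleton asserts no homology, so it gets no column here
        members = set(cluster)
        new_columns.append([ch if i in members else GAP for i, ch in enumerate(column)])
    return new_columns


if __name__ == "__main__":
    column = ["A", "A", "S", "T", "-"]
    # Hypothetical posterior probabilities of homology for row pairs (i < j).
    pair_prob = {(0, 1): 0.95, (2, 3): 0.90, (0, 2): 0.10, (1, 3): 0.05}
    clusters = cluster_column(column, pair_prob)
    print(clusters)                          # [[0, 1], [2, 3]]
    print(partial_filter(column, clusters))  # ['A', 'A', '-', '-', '-']
    # prints [['A', 'A', '-', '-', '-'], ['-', '-', 'S', 'T', '-']]
    print(divvy(column, clusters))
```

In this toy column the two well-supported pairs form two clusters: partial filtering keeps only one of them, whereas divvying keeps both, each in its own column. Union-find keeps the component computation close to linear in the number of candidate pairs.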

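The benchmark claim about retaining true pairwise homology calls and removing false positive ones can be made concrete with a small Python sketch that scores a test alignment against a reference. It is a generic pairwise-homology comparison written for illustration, not the paper's benchmarking code. The `*` marker for a residue removed by a filter and the function names `homology_pairs` and `compare` are assumptions of this sketch.

```python
from itertools import combinations
from typing import List, Set, Tuple

Residue = Tuple[int, int]  # (sequence index, position in the ungapped sequence)
Pair = Tuple[Residue, Residue]


def homology_pairs(msa: List[str], gap: str = "-", masked: str = "*") -> Set[Pair]:
    """Return every pair of residues that the MSA places in the same column.

    `masked` marks a residue removed by a filter (a hypothetical convention):
    it still advances its sequence's residue counter, so numbering matches the
    unfiltered sequences, but it contributes no homology statements.
    """
    if len({len(row) for row in msa}) > 1:
        raise ValueError("all rows of the MSA must have the same length")
    counters = [0] * len(msa)
    pairs: Set[Pair] = set()
    for col in range(len(msa[0]) if msa else 0):
        present: List[Residue] = []
        for s, row in enumerate(msa):
            ch = row[col]
            if ch == gap:
                continue
            if ch != masked:
                present.append((s, counters[s]))
            counters[s] += 1
        # Rows are visited in order, so each pair is stored as (lower row, higher row).
        pairs.update(combinations(present, 2))
    return pairs


def compare(test: List[str], reference: List[str]) -> Tuple[int, int]:
    """Count (true pairs retained, false pairs retained) in `test` vs `reference`.

    Assumes both MSAs list the same sequences in the same order.
    """
    ref = homology_pairs(reference)
    got = homology_pairs(test)
    return len(got & ref), len(got - ref)


if __name__ == "__main__":
    reference = ["ACG", "ACG"]
    misaligned = ["ACG-", "A-CG"]  # G of row 0 wrongly paired with C of row 1
    filtered = ["ACG-", "A-*G"]    # the misaligned C of row 1 masked out
    print(compare(misaligned, reference))  # (1, 1)
    print(compare(filtered, reference))    # (1, 0)
```

Masking the misaligned residue removes the false positive pair without losing the correct A/A homology, which is the trade-off a good filter should make. A filter that removed whole columns would instead also discard correct pairs sharing those columns.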