Evolutionary Sparse Learning for Phylogenomics.

Affiliations

Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA.

Department of Biology, Temple University, Philadelphia, PA, USA.

Publication information

Mol Biol Evol. 2021 Oct 27;38(11):4674-4682. doi: 10.1093/molbev/msab227.

DOI:10.1093/molbev/msab227

PMID:34343318

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8557465/

Abstract

We introduce a supervised machine learning approach with sparsity constraints for phylogenomics, referred to as evolutionary sparse learning (ESL). ESL builds models with genomic loci (such as genes, proteins, genomic segments, and positions) as parameters. Using the Least Absolute Shrinkage and Selection Operator (LASSO), ESL selects only the most important genomic loci to explain a given phylogenetic hypothesis or presence/absence of a trait. ESL models do not directly involve conventional parameters such as rates of substitutions between nucleotides, rate variation among positions, and phylogeny branch lengths. Instead, ESL directly employs the concordance of variation across sequences in an alignment with the evolutionary hypothesis of interest. ESL provides a natural way to combine different molecular and nonmolecular data types and incorporate biological and functional annotations of genomic loci in model building. We propose positional, gene, function, and hypothesis sparsity scores, illustrate their use through an example, and suggest several applications of ESL. The ESL framework has the potential to drive the development of a new class of computational methods that will complement traditional approaches in evolutionary genomics, particularly for identifying influential loci and sequences given a phylogeny and building models to test hypotheses. ESL's fast computational times and small memory footprint will also help democratize big data analytics and improve scientific rigor in phylogenomics.
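The abstract describes the core ESL idea only in words. The sketch below illustrates it under simplifying assumptions of our own. Each variable alignment position is one-hot encoded into 0/1 (position, base) features. The hypothesis (membership in a clade) is coded as +1 or -1 for each species. A plain LASSO with squared-error loss is then fit by cyclic coordinate descent, and positions whose coefficients survive the L1 penalty are the ones retained to explain the hypothesis. The function names (`one_hot`, `lasso`, `esl_sketch`), the penalty value `lam`, and the per-position weight (summed absolute coefficients) are hypothetical choices for illustration. They are not the paper's implementation or its definition of sparsity scores, and the published method may use a different loss, group penalties over genes, or another encoding.

```python
# Minimal sketch of the ESL idea under stated assumptions (hypothetical names):
# one-hot encode variable alignment positions, code a clade hypothesis as +1/-1,
# and fit a plain LASSO (squared-error loss) by cyclic coordinate descent.
# The published ESL method may differ (loss, gene-group penalties, encoding).


def one_hot(alignment):
    """Turn {species: sequence} into 0/1 features, one per (position, base) state.

    Invariant positions are skipped because they carry no variation.
    """
    species = sorted(alignment)
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    features = []
    for pos in range(lengths.pop()):
        states = sorted({alignment[s][pos] for s in species})
        if len(states) < 2:
            continue
        features.extend((pos, base) for base in states)
    X = [[1.0 if alignment[s][pos] == base else 0.0 for pos, base in features]
         for s in species]
    return species, features, X


def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values inside [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - b0 - X b||^2 + lam*||b||_1 (b0 unpenalised)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    b0 = sum(y) / n
    residual = [yi - b0 for yi in y]
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            # Every encoded state occurs in at least one species, so z > 0.
            z = sum(v * v for v in col) / n
            # Correlation of column j with the partial residual (b[j] added back).
            rho = sum(c * (r + c * b[j]) for c, r in zip(col, residual)) / n
            new = soft_threshold(rho, lam) / z
            if new != b[j]:
                residual = [r - c * (new - b[j]) for c, r in zip(col, residual)]
                b[j] = new
        # Re-centre: move any mean residual into the unpenalised intercept.
        shift = sum(residual) / n
        b0 += shift
        residual = [r - shift for r in residual]
    return b0, b


def esl_sketch(alignment, clade, lam=0.1):
    """Return {position: summed |coefficient|} for positions the LASSO keeps.

    Summing absolute coefficients per position is an illustrative choice,
    not necessarily the paper's positional sparsity score.
    """
    species, features, X = one_hot(alignment)
    y = [1.0 if s in clade else -1.0 for s in species]
    _, b = lasso(X, y, lam)
    weights = {}
    for (pos, _base), coef in zip(features, b):
        if coef != 0.0:
            weights[pos] = weights.get(pos, 0.0) + abs(coef)
    return weights


if __name__ == "__main__":
    toy = {"A": "ACGTA", "B": "ACGAA", "C": "ATGTA", "D": "ATGAA"}
    # Hypothesis: species A and B form a clade.
    print(sorted(esl_sketch(toy, {"A", "B"})))  # → [1]
```

In this toy alignment (positions are 0-based), position 1 carries C in the hypothesized clade and T elsewhere, so it is perfectly concordant with the hypothesis, while position 3 varies independently of it and its coefficients are shrunk to exactly zero. Invariant positions are never encoded. Raising `lam` removes more positions; lowering it lets weakly concordant positions back into the model.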

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd7d/8557465/e7f48e518c54/msab227f1.jpg

Similar articles

Discovering Fragile Clades and Causal Sequences in Phylogenomics by Evolutionary Sparse Learning.

Mol Biol Evol. 2024 Jul 3;41(7). doi: 10.1093/molbev/msae131.

Embracing Green Computing in Molecular Phylogenetics.

Mol Biol Evol. 2022 Mar 2;39(3). doi: 10.1093/molbev/msac043.

Linking Branch Lengths across Sets of Loci Provides the Highest Statistical Support for Phylogenetic Inference.

Mol Biol Evol. 2020 Apr 1;37(4):1202-1210. doi: 10.1093/molbev/msz291.

Hierarchical Hybrid Enrichment: Multitiered Genomic Data Collection Across Evolutionary Scales, With Application to Chorus Frogs (Pseudacris).

Syst Biol. 2020 Jul 1;69(4):756-773. doi: 10.1093/sysbio/syz074.

Comparison of 19 Strains in Genomics, Phylogenetics, Phylogenomics and Functional Genomics.

Front Cell Infect Microbiol. 2017 Feb 14;7:28. doi: 10.3389/fcimb.2017.00028. eCollection 2017.

ClockstaRX: Testing Molecular Clock Hypotheses With Genomic Data.

Genome Biol Evol. 2024 Apr 2;16(4). doi: 10.1093/gbe/evae064.

Evolutionary Rate Variation among Lineages in Gene Trees has a Negative Impact on Species-Tree Inference.

Syst Biol. 2022 Feb 10;71(2):490-500. doi: 10.1093/sysbio/syab051.

Primate phylogenomics: developing numerous nuclear non-coding, non-repetitive markers for ecological and phylogenetic applications and analysis of evolutionary rate variation.

BMC Genomics. 2009 May 26;10:247. doi: 10.1186/1471-2164-10-247.

Statistics and truth in phylogenomics.

Mol Biol Evol. 2012 Feb;29(2):457-72. doi: 10.1093/molbev/msr202. Epub 2011 Aug 26.

Cited by

From Trees to Traits: A Review of Advances in PhyloG2P Methods and Future Directions.

Genome Biol Evol. 2025 Sep 2;17(9). doi: 10.1093/gbe/evaf150.

Enabling data-driven discoveries in evolutionary genetics and genomics.

Genetics. 2025 Jul 9;230(3). doi: 10.1093/genetics/iyaf084.

Evolutionary sparse learning reveals the shared genetic basis of convergent traits.

Nat Commun. 2025 Apr 4;16(1):3217. doi: 10.1038/s41467-025-58428-8.

MyESL: Sparse learning in molecular evolution and phylogenetic analysis.

ArXiv. 2025 Jan 9:arXiv:2501.04941v1.

Evolutionary sparse learning with paired species contrast reveals the shared genetic basis of convergent traits.

bioRxiv. 2025 Jan 8:2025.01.08.631987. doi: 10.1101/2025.01.08.631987.

MEGA12: Molecular Evolutionary Genetic Analysis Version 12 for Adaptive and Green Computing.

Mol Biol Evol. 2024 Dec 6;41(12). doi: 10.1093/molbev/msae263.

A machine-learning-based alternative to phylogenetic bootstrap.

Bioinformatics. 2024 Jun 28;40(Suppl 1):i208-i217. doi: 10.1093/bioinformatics/btae255.

Discovering Fragile Clades and Causal Sequences in Phylogenomics by Evolutionary Sparse Learning.

Mol Biol Evol. 2024 Jul 3;41(7). doi: 10.1093/molbev/msae131.

Constructing phylogenetic networks via cherry picking and machine learning.

Algorithms Mol Biol. 2023 Sep 16;18(1):13. doi: 10.1186/s13015-023-00233-3.

A LASSO-based approach to sample sites for phylogenetic tree search.

Bioinformatics. 2022 Jun 24;38(Suppl 1):i118-i124. doi: 10.1093/bioinformatics/btac252.

