Quartet Based Gene Tree Imputation Using Deep Learning Improves Phylogenomic Analyses Despite Missing Data.

Authors

Mahbub Sazan, Sawmya Shashata, Saha Arpita, Reaz Rezwana, Rahman M Sohel, Bayzid Md Shamsuzzoha

Affiliations

Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh.

Department of Computer Science, University of Maryland, College Park, Maryland, USA.

Publication information

J Comput Biol. 2022 Nov;29(11):1156-1172. doi: 10.1089/cmb.2022.0212. Epub 2022 Sep 1.

DOI:10.1089/cmb.2022.0212
PMID:36048555
Abstract

Species tree estimation is frequently based on phylogenomic approaches that use multiple genes from throughout the genome. However, for a combination of reasons (ranging from sampling biases to more biological causes, as in gene birth and loss), gene trees are often incomplete, meaning that not all species of interest have a common set of genes. Incomplete gene trees can potentially impact the accuracy of phylogenomic inference. We, for the first time, introduce the problem of imputing the quartet distribution induced by a set of incomplete gene trees, which involves adding the missing quartets back to the quartet distribution. We present Quartet based Gene tree Imputation using Deep Learning (QT-GILD), an automated and specially tailored unsupervised deep learning technique, accompanied by cues from natural language processing, which learns the quartet distribution in a given set of incomplete gene trees and generates a complete set of quartets accordingly. QT-GILD is a general-purpose technique needing no explicit modeling of the subject system or reasons for missing data or gene tree heterogeneity. Experimental studies on a collection of simulated and empirical datasets suggest that QT-GILD can effectively impute the quartet distribution, which results in a dramatic improvement in the species tree accuracy. Remarkably, QT-GILD not only imputes the missing quartets but can also account for gene tree estimation error. Therefore, QT-GILD advances the state-of-the-art in species tree estimation from gene trees in the face of missing data.
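To make the notion of a quartet distribution concrete, here is a minimal Python sketch. It is not QT-GILD and involves no deep learning: it only counts the quartets induced by a small set of gene trees and reports, for each four-taxon set, how many gene trees cannot contribute a quartet because at least one of the four taxa is missing. The nested-tuple tree encoding, the toy trees, and all function names are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def leaves(tree):
    """Return the set of taxon names in a nested-tuple tree (leaves are strings)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Yield the leaf set below every internal node of the tree."""
    if isinstance(tree, str):
        return
    yield leaves(tree)
    for child in tree:
        yield from clades(child)


def quartet(a, b, c, d):
    """Encode the quartet topology ab|cd as an unordered pair of unordered pairs."""
    return frozenset({frozenset({a, b}), frozenset({c, d})})


def quartets(tree):
    """Return the quartet topologies induced by one gene tree."""
    splits = list(clades(tree))
    found = set()
    for four in combinations(sorted(leaves(tree)), 4):
        four_set = frozenset(four)
        for clade in splits:
            inside = four_set & clade
            # A clade holding exactly two of the four taxa separates them
            # from the other two, which fixes the quartet topology.
            if len(inside) == 2:
                found.add(frozenset({inside, four_set - inside}))
                break
    return found


def quartet_distribution(gene_trees):
    """Count how many gene trees induce each quartet topology."""
    counts = Counter()
    for tree in gene_trees:
        counts.update(quartets(tree))
    return counts


def missing_counts(gene_trees):
    """For each four-taxon set, count the gene trees lacking at least one of its taxa."""
    all_taxa = frozenset().union(*(leaves(t) for t in gene_trees))
    result = {}
    for four in combinations(sorted(all_taxa), 4):
        four_set = frozenset(four)
        present = sum(1 for t in gene_trees if four_set <= leaves(t))
        result[four_set] = len(gene_trees) - present
    return result


# Toy input (hypothetical): one complete gene tree and two that lack taxon E.
gene_trees = [
    ((("A", "B"), "C"), ("D", "E")),
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
]

distribution = quartet_distribution(gene_trees)
gaps = missing_counts(gene_trees)
for four_set, n_missing in sorted(gaps.items(), key=lambda kv: sorted(kv[0])):
    print("".join(sorted(four_set)), "missing from", n_missing, "gene trees")
```

In this toy input, the topology AB|CD is induced by two gene trees and AC|BD by one, while every four-taxon set containing E is covered by only one of the three trees. Those gaps are the quartets that an imputation method would need to add back to the distribution.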

Similar articles

1
Quartet Based Gene Tree Imputation Using Deep Learning Improves Phylogenomic Analyses Despite Missing Data.
J Comput Biol. 2022 Nov;29(11):1156-1172. doi: 10.1089/cmb.2022.0212. Epub 2022 Sep 1.
2
Species Tree Estimation from Gene Trees by Minimizing Deep Coalescence and Maximizing Quartet Consistency: A Comparative Study and the Presence of Pseudo Species Tree Terraces.
Syst Biol. 2021 Oct 13;70(6):1213-1231. doi: 10.1093/sysbio/syab026.
3
wQFM: highly accurate genome-scale species tree estimation from weighted quartets.
Bioinformatics. 2021 Nov 5;37(21):3734-3743. doi: 10.1093/bioinformatics/btab428.
4
Phylogenomic species tree estimation in the presence of incomplete lineage sorting and horizontal gene transfer.
BMC Genomics. 2015;16 Suppl 10(Suppl 10):S1. doi: 10.1186/1471-2164-16-S10-S1. Epub 2015 Oct 2.
5
To Include or Not to Include: The Impact of Gene Filtering on Species Tree Estimation Methods.
Syst Biol. 2018 Mar 1;67(2):285-303. doi: 10.1093/sysbio/syx077.
6
Improving quartet graph construction for scalable and accurate species tree estimation from gene trees.
Genome Res. 2023 Jul;33(7):1042-1052. doi: 10.1101/gr.277629.122. Epub 2023 May 17.
7
SVDquest: Improving SVDquartets species tree estimation using exact optimization within a constrained search space.
Mol Phylogenet Evol. 2018 Jul;124:122-136. doi: 10.1016/j.ympev.2018.03.006. Epub 2018 Mar 9.
8
The performance of coalescent-based species tree estimation methods under models of missing data.
BMC Genomics. 2018 May 8;19(Suppl 5):286. doi: 10.1186/s12864-018-4619-8.
9
Weighting by Gene Tree Uncertainty Improves Accuracy of Quartet-based Species Trees.
Mol Biol Evol. 2022 Dec 5;39(12). doi: 10.1093/molbev/msac215.
10
QuCo: quartet-based co-estimation of species trees and gene trees.
Bioinformatics. 2022 Jun 24;38(Suppl 1):i413-i421. doi: 10.1093/bioinformatics/btac265.

Cited by

1
Leveraging Weighted Quartet Distributions for Enhanced Species Tree Inference from Genome-Wide Data.
Genome Biol Evol. 2025 Sep 2;17(9). doi: 10.1093/gbe/evaf159.
2
wQFM-TREE: highly accurate and scalable quartet-based species tree inference from gene trees.
Bioinform Adv. 2025 Mar 13;5(1):vbaf053. doi: 10.1093/bioadv/vbaf053. eCollection 2025.
3
wQFM-DISCO: DISCO-enabled wQFM improves phylogenomic analyses despite the presence of paralogs.
Bioinform Adv. 2024 Nov 27;4(1):vbae189. doi: 10.1093/bioadv/vbae189. eCollection 2024.
4
Comparison of phylogenetic trees defined on different but mutually overlapping sets of taxa: A review.
Ecol Evol. 2024 Aug 8;14(8):e70054. doi: 10.1002/ece3.70054. eCollection 2024 Aug.
5
Quartet Fiduccia-Mattheyses revisited for larger phylogenetic studies.
Bioinformatics. 2023 Jun 1;39(6). doi: 10.1093/bioinformatics/btad332.