Department of Mathematics, University of Bergen, Bergen 5007, Norway.
School of Life Sciences, University of Warwick, Coventry CV4 7AL, UK.
Syst Biol. 2024 Jul 27;73(2):419-433. doi: 10.1093/sysbio/syae009.
Comparative analysis of variables across phylogenetically linked observations can reveal mechanisms and insights in evolutionary biology. As the taxonomic breadth of the sample of interest increases, challenges of data sparsity, poor phylogenetic resolution, and complicated evolutionary dynamics emerge. Here, we investigate a cross-eukaryotic question where all these problems exist: which organismal ecology features are correlated with gene retention in mitochondrial and chloroplast DNA (organelle DNA or oDNA). Through a wide palette of synthetic control studies, we first characterize the specificity and sensitivity of a collection of parametric and non-parametric phylogenetic comparative approaches to identify relationships in the face of such sparse and awkward datasets. This analysis is not directly focused on oDNA, and so provides generalizable insights into comparative approaches with challenging data. We then combine and curate ecological data coupled to oDNA genome information across eukaryotes, including a new semi-automated approach for gathering data on organismal traits from less systematized open-access resources including encyclopedia articles on species and taxa. The curation process also involved resolving several issues with existing datasets, including enforcing the clade-specificity of several ecological features and fixing incorrect annotations. Combining this unique dataset with our benchmarked comparative approaches, we confirm support for several known links between organismal ecology and organelle gene retention, identify several previously unidentified relationships constituting possible ecological contributors to oDNA genome evolution, and provide support for a recently hypothesized link between environmental demand and oDNA retention. We cautiously discuss the implications of these findings for organelle evolution and of this pipeline for broad comparative analyses in other fields.
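As an illustration of how sensitivity and specificity can be tallied from synthetic control studies of the kind described above, the following is a minimal sketch and not the authors' pipeline. The `Trial` record and the `sensitivity_specificity` function are hypothetical names introduced here; the sketch assumes each synthetic dataset is labeled by whether a relationship was simulated and whether a given comparative method reported one.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Trial:
    """One synthetic control run (hypothetical record, for illustration only)."""
    has_true_link: bool  # a relationship was built into the simulated data
    detected: bool       # the comparative method reported a relationship


def sensitivity_specificity(trials: Iterable[Trial]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) over a set of synthetic trials.

    sensitivity = TP / (TP + FN): fraction of simulated links that were found.
    specificity = TN / (TN + FP): fraction of null datasets correctly left unflagged.
    """
    trials = list(trials)
    tp = sum(t.has_true_link and t.detected for t in trials)
    fn = sum(t.has_true_link and not t.detected for t in trials)
    tn = sum((not t.has_true_link) and (not t.detected) for t in trials)
    fp = sum((not t.has_true_link) and t.detected for t in trials)
    # Return NaN when a class is absent, rather than dividing by zero.
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec
```

In practice one such pair would be computed per method and per simulation scenario, so that methods can be compared under the same degree of sparsity or phylogenetic uncertainty.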