Departamento de Genética, Universidade Federal do Rio de Janeiro, Caixa Postal 68011, Rio de Janeiro, 21941-971, Brazil.
BMC Evol Biol. 2020 Nov 2;20(1):141. doi: 10.1186/s12862-020-01703-7.
The family Drosophilidae is traditionally divided into two subfamilies: Drosophilinae and Steganinae. This division is based on morphological characters, and the two subfamilies have been treated as monophyletic in most of the literature, but some molecular phylogenies have suggested that Steganinae is paraphyletic. To test the paraphyletic-Steganinae hypothesis, we used genomic sequences of eight Drosophilidae (three Steganinae and five Drosophilinae) and two Ephydridae (outgroup) species and inferred the phylogeny of the group from a dataset of 1,028 orthologous genes present in all species (>1,000,000 bp). This dataset includes three genera that broke the monophyly of the subfamilies in previous studies. To investigate possible biases introduced by small sample sizes and automatic gene annotation, we used the same methods to infer species trees from a set of 10 manually annotated genes that are commonly used in phylogenetics.
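The requirement that every gene be "present in all species" amounts to a set intersection across species. The following is a minimal sketch of that filtering step only, assuming each species has already been reduced to a set of ortholog identifiers; the function name, species labels, and gene IDs are hypothetical and are not taken from the study or from any particular tool's output format.

```python
from typing import Dict, List, Set


def shared_orthologs(genes_by_species: Dict[str, Set[str]]) -> List[str]:
    """Return the sorted ortholog IDs found in every species.

    `genes_by_species` maps a species label to the set of ortholog IDs
    recovered for it (a hypothetical, already-parsed input).
    """
    if not genes_by_species:
        return []
    # A gene is kept only if it appears in the ID set of every species.
    common = set.intersection(*genes_by_species.values())
    return sorted(common)


if __name__ == "__main__":
    # Toy data with made-up species labels and gene IDs.
    toy = {
        "species_A": {"g1", "g2", "g3"},
        "species_B": {"g2", "g3", "g4"},
        "outgroup": {"g3", "g2", "g5"},
    }
    print(shared_orthologs(toy))
```

Requiring presence in all species avoids missing data in the alignment at the cost of discarding genes that any single genome lacks.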
Most of the 1,028 gene trees depicted Steganinae as paraphyletic, although under many distinct topologies; the single most common topology (43.7% of the gene trees) depicted it as monophyletic. Despite the high level of gene tree heterogeneity, species tree inference with ASTRAL, with PhyloNet, and with the concatenation approach strongly supported the monophyly of both subfamilies for the 1,028-gene dataset. However, when we used the concatenation approach to infer a species tree from the smaller set of 10 genes, we recovered Steganinae as a paraphyletic group. The pattern of gene tree heterogeneity was asymmetrical and thus could not be explained solely by incomplete lineage sorting (ILS).
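The reasoning behind the asymmetry argument can be illustrated for a rooted three-taxon case: under ILS alone, the two discordant (minor) topologies are expected at equal frequency, so a significant imbalance between them points to an additional process such as introgression. The sketch below applies an exact two-sided binomial test to two minor-topology counts. It is an illustration under that assumption, not the test used in the study; the function name and the example counts are hypothetical.

```python
from math import comb


def minor_topology_asymmetry_p(count_a: int, count_b: int) -> float:
    """Exact two-sided binomial p-value for H0: both minor topologies
    are equally frequent (the ILS-only expectation for a rooted triplet).
    """
    if count_a < 0 or count_b < 0:
        raise ValueError("counts must be non-negative")
    n = count_a + count_b
    if n == 0:
        raise ValueError("need at least one gene tree with a minor topology")
    # Under H0 (p = 0.5) the outcome k has probability C(n, k) / 2**n,
    # so outcomes can be compared through their binomial coefficients.
    observed = comb(n, count_a)
    # Add up every outcome that is no more likely than the observed one.
    tail = sum(comb(n, k) for k in range(n + 1) if comb(n, k) <= observed)
    return tail / 2 ** n


if __name__ == "__main__":
    # Hypothetical counts of gene trees supporting each minor topology.
    p_value = minor_topology_asymmetry_p(120, 60)
    print(f"two-sided p-value: {p_value:.3g}")
```

A small p-value indicates that the two discordant topologies are not equally frequent. With more than three taxa the same idea is usually applied per triplet or per internal branch, which this sketch does not attempt.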
Steganinae was clearly a monophyletic group in the dataset that we analyzed. In addition to ILS, gene tree discordance was possibly the result of introgression, suggesting complex branching processes during the early evolution of Drosophilidae with short speciation intervals and gene flow. Our study highlights the importance of genomic data in elucidating contentious phylogenetic relationships and suggests that phylogenetic inference for drosophilids based on small molecular datasets should be performed cautiously. Finally, we suggest an approach for the correction and cleaning of BUSCO-derived genomic datasets that will be useful to other researchers planning to use this tool for phylogenomic studies.