Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA.
Instituto de Ecología, Unidad Hermosillo, Universidad Nacional Autónoma de México, Hermosillo, Sonora, Mexico.
Syst Biol. 2022 Aug 10;71(5):1178-1194. doi: 10.1093/sysbio/syac017.
Reconstructing accurate historical relationships within a species poses numerous challenges, not least in many plant groups in which gene flow is high enough to extend well beyond species boundaries. Nonetheless, the extent of tree-like history within a species is an empirical question on which it is now possible to bring large amounts of genome sequence to bear. We assess phylogenetic structure across the geographic range of the saguaro cactus, an emblematic member of Cactaceae, a clade known for extensive hybridization and porous species boundaries. Using 200 Gb of whole genome resequencing data from 20 individuals sampled from 10 localities, we assembled two data sets comprising 150,000 biallelic single nucleotide polymorphisms (SNPs) from protein coding sequences. From these, we inferred within-species trees and evaluated their significance and robustness using five qualitatively different inference methods. Despite low sequence diversity, large census population sizes, and the presence of wide-ranging pollen and seed dispersal agents, phylogenetic trees were well resolved and highly consistent across both data sets and all methods. We inferred that the most likely root, based on marginal likelihood comparisons, is to the east and south of the region of highest genetic diversity, which lies along the coast of the Gulf of California in Sonora, Mexico. Together with the striking decreases in marginal likelihood found for roots to the north, this supports hypotheses that saguaro's current range reflects postglacial expansion from refugia in the south of its range. We conclude with observations about practical and theoretical issues raised by phylogenomic data sets within species, in which SNP-based methods must be used instead of the gene tree methods widely used when sequence divergence is higher. These include computational scalability, inference of gene flow, and proper assessment of statistical support in the presence of linkage effects.
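The rooting argument rests on comparing marginal likelihoods across candidate root placements. The following is a minimal sketch of that comparison, not the authors' pipeline. It assumes each candidate root already has an estimated log marginal likelihood (for example from stepping-stone or path sampling; the abstract does not name the estimator). It also assumes a uniform prior over candidate roots. The function name `rank_roots`, the root labels, and the numbers are hypothetical placeholders, not values from the study.

```python
import math


def rank_roots(log_ml: dict[str, float]) -> list[tuple[str, float, float]]:
    """Rank candidate root placements by log marginal likelihood.

    Returns (root, log Bayes factor relative to the best root,
    posterior probability), assuming a uniform prior over roots.
    Hypothetical helper for illustration only.
    """
    best = max(log_ml.values())
    # Log-sum-exp normalizer, shifted by the best value for numerical stability.
    log_norm = best + math.log(sum(math.exp(v - best) for v in log_ml.values()))
    ranked = sorted(log_ml.items(), key=lambda kv: kv[1], reverse=True)
    # lml - best is the log Bayes factor of this root vs. the best root (<= 0).
    return [(root, lml - best, math.exp(lml - log_norm)) for root, lml in ranked]


# Hypothetical log marginal likelihoods (NOT values from the study).
example = {
    "root_A": -10250.4,
    "root_B": -10253.1,
    "root_C": -10270.9,
}
for root, log_bf, post in rank_roots(example):
    print(f"{root}: log BF vs best = {log_bf:.1f}, P(root | data) = {post:.3f}")
```

Working on the log scale avoids underflow, since marginal likelihoods of genome-scale SNP data are far too small to exponentiate directly. A root whose log Bayes factor falls several units below the best is strongly disfavored. That is the kind of "striking decrease" the abstract reports for roots to the north.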
Keywords: Phylogenomics; phylogeography; rooting; Sonoran Desert.