Institute of Evolution & Marine Biodiversity, Ocean University of China, Qingdao, China.
Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao, China.
Mol Ecol Resour. 2018 May;18(3):700-713. doi: 10.1111/1755-0998.12750. Epub 2018 Feb 8.
Microeukaryotes are among the most important components of the microbial food web in almost all aquatic and terrestrial ecosystems worldwide. To gain a better understanding of their roles and functions in ecosystems, sequencing coupled with phylogenomic analyses of entire genomes or transcriptomes is increasingly used to reconstruct the evolutionary history and classification of these microeukaryotes and thus provide a more robust framework for determining their systematics and diversity. However, phylogenomic research usually requires high levels of hands-on bioinformatics experience. Here, we propose an efficient automated method, "Guided Phylogenomic Search in trees" (GPSit), which starts from predicted protein sequences of newly sequenced species and a well-defined customized orthologous database. Compared with previous protocols, our method streamlines the entire workflow by integrating all essential operations along with optional ones. In so doing, the manual operation time for reconstructing phylogenetic relationships is reduced from days to several hours. Furthermore, GPSit supports user-defined parameters in most steps and thus allows users to adapt it to their studies. The effectiveness of GPSit is demonstrated by incorporating available online data and new single-cell data of three nonculturable marine ciliates (Anteholosticha monilata, Deviata sp. and Diophrys scutum) under moderate sequencing coverage (~5×). Our results indicate that the available online data can reconstruct robust "deep" phylogenetic relationships, while the new single-cell data reveal the presence of intermediate taxa in shallow relationships. Based on empirical phylogenomic data, we also used GPSit to evaluate the impact of different levels of missing data on two commonly used methods of phylogenetic analysis, maximum likelihood (ML) and Bayesian inference (BI). We found that BI is less sensitive to missing data when fast-evolving sites are removed.
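The missing-data evaluation mentioned above can be pictured with a small sketch. This is not GPSit code, and the abstract does not say how missing data were generated or measured. It is a minimal illustration that assumes a concatenated protein alignment held as a dictionary of equal-length strings, with `-` and `?` marking positions that carry no data. The names `missing_fraction` and `mask_to_level`, and the toy sequences, are hypothetical.

```python
import math
import random

# Assumption: '-' (gap) and '?' (unknown) are the symbols that mark missing data.
MISSING = set("-?")


def missing_fraction(seq: str) -> float:
    """Fraction of positions in one aligned sequence that carry no data."""
    if not seq:
        return 0.0
    return sum(ch in MISSING for ch in seq) / len(seq)


def mask_to_level(alignment: dict[str, str], level: float, seed: int = 0) -> dict[str, str]:
    """Return a copy in which every taxon has at least `level` missing data.

    Positions that currently hold a residue are picked at random and
    replaced by '?'. Positions that are already missing are left alone.
    """
    if not 0.0 <= level <= 1.0:
        raise ValueError("level must be between 0 and 1")
    rng = random.Random(seed)
    masked = {}
    for taxon, seq in alignment.items():
        chars = list(seq)
        present = [i for i, ch in enumerate(chars) if ch not in MISSING]
        already_missing = len(chars) - len(present)
        # Small tolerance guards against floating-point overshoot in ceil().
        target_missing = math.ceil(level * len(chars) - 1e-9)
        n_to_mask = max(0, target_missing - already_missing)
        for i in rng.sample(present, n_to_mask):
            chars[i] = "?"
        masked[taxon] = "".join(chars)
    return masked


# Toy alignment (hypothetical taxa and sequences, for illustration only).
toy_alignment = {
    "taxon_A": "MKV-LLT?AG",
    "taxon_B": "MKVALLTSAG",
}

for level in (0.0, 0.3, 0.5):
    masked = mask_to_level(toy_alignment, level, seed=1)
    for taxon, seq in masked.items():
        print(level, taxon, seq, missing_fraction(seq))
```

Each masked copy could then be passed to an ML or BI program of the user's choice to compare the resulting trees. The tree-building step is left out here because the abstract does not name the programs involved.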