PhylteR: Efficient Identification of Outlier Sequences in Phylogenomic Datasets.

Affiliations

French Institute of Bioinformatics (IFB)-South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, Montpellier, France.

IRD, CIRAD, INRAE, Institut Agro, PHIM Plant Health Institute, Montpellier University, Montpellier, France.

Publication information

Mol Biol Evol. 2023 Nov 3;40(11). doi: 10.1093/molbev/msad234.

Abstract

In phylogenomics, incongruences between gene trees, arising from both artifactual and biological causes, can decrease the signal-to-noise ratio and complicate species tree inference. The amount of data handled today in classical phylogenomic analyses precludes manual error detection and removal. However, a simple and efficient way to automate the identification of outliers from a collection of gene trees is still missing. Here, we present PhylteR, a method that allows rapid and accurate detection of outlier sequences in phylogenomic datasets, i.e. species from individual gene trees that do not follow the general trend. PhylteR relies on DISTATIS, an extension of multidimensional scaling to 3 dimensions that compares multiple distance matrices at once. In PhylteR, these distance matrices, extracted from individual gene phylogenies, represent evolutionary distances between species according to each gene. On simulated datasets, we show that PhylteR identifies outliers with more sensitivity and precision than a comparable existing method. We also show that PhylteR is not sensitive to incongruences induced by incomplete lineage sorting (ILS), which is a desirable feature. On a biological dataset of 14,463 genes for 53 species previously assembled for Carnivora phylogenomics, we show (i) that PhylteR identifies as outliers sequences that can be considered as such by other means, and (ii) that the removal of these sequences improves the concordance between the gene trees and the species tree. Thanks to the generation of numerous graphical outputs, PhylteR also allows for the rapid and easy visual characterization of the dataset at hand, thus aiding in the precise identification of errors. PhylteR is distributed as an R package on CRAN and as containerized versions (Docker and Singularity).
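The idea of comparing several per-gene distance matrices at once can be illustrated with a small sketch. The Python code below is not PhylteR's implementation (PhylteR is an R package) and does not reproduce its outlier detection procedure. It only shows, under simplifying assumptions, the first steps of a DISTATIS-style analysis: each distance matrix is double-centered as in classical multidimensional scaling, the matrices are compared through RV coefficients, and the first eigenvector of the RV matrix gives one weight per gene. All function names and the toy distances for four hypothetical species are assumptions made for illustration.

```python
import math

def double_center(dist):
    """Classical MDS step: squared distances -> cross-product matrix."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # The matrix is symmetric, so column means equal row means.
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]

def inner(a, b):
    """Element-wise (Frobenius) inner product of two matrices."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def rv_matrix(cross_products):
    """RV coefficient (matrix cosine) between every pair of genes."""
    k = len(cross_products)
    norms = [math.sqrt(inner(s, s)) for s in cross_products]
    return [[inner(cross_products[i], cross_products[j]) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]

def first_eigenvector(mat, iterations=200):
    """Leading eigenvector by power iteration (enough for a small example)."""
    vec = [1.0] * len(mat)
    for _ in range(iterations):
        nxt = [sum(m * v for m, v in zip(row, vec)) for row in mat]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec

def compromise_weights(dist_matrices):
    """One weight per gene, rescaled to sum to 1."""
    cross_products = [double_center(d) for d in dist_matrices]
    vec = first_eigenvector(rv_matrix(cross_products))
    total = sum(vec)
    return [v / total for v in vec]

# Toy data (hypothetical): pairwise distances between species A, B, C, D.
# Genes 1 and 2 tell nearly the same story; in gene 3, species A sits
# next to D instead of B.
genes = [
    [[0, 1, 4, 5], [1, 0, 4, 5], [4, 4, 0, 2], [5, 5, 2, 0]],
    [[0, 1.2, 3.8, 5.1], [1.2, 0, 4.1, 4.9], [3.8, 4.1, 0, 2.1], [5.1, 4.9, 2.1, 0]],
    [[0, 5, 4, 1], [5, 0, 4, 5], [4, 4, 0, 2], [1, 5, 2, 0]],
]

weights = compromise_weights(genes)
for name, w in zip(["gene1", "gene2", "gene3"], weights):
    print(name, round(w, 3))
```

In this toy example the third gene, whose distances disagree with the other two, receives the smallest weight. A full analysis would go further, for instance by building a weighted compromise matrix and locating each species of each gene relative to it, which this sketch deliberately leaves out.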

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/de64/10655845/014ff5d48fb5/msad234f1.jpg
