SWAMP: Sliding Window Alignment Masker for PAML.

Affiliations

Department of Genetics, Evolution and Environment, University College London, London, United Kingdom.

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom.

Publication information

Evol Bioinform Online. 2014 Dec 1;10:197-204. doi: 10.4137/EBO.S18193. eCollection 2014.

Abstract

With the greater availability of genetic data, large genome-wide scans for positive selection increasingly incorporate data from a range of sources. These datasets may be derived from different sequencing methods, each of which has its own potential sources of error. Sequencing errors, compounded by alignment errors, greatly increase the number of false positives in tests for adaptive evolution. Genome-wide analyses often fail to fully address these issues or to provide sufficient detail on postalignment masking/filtering. Here, we introduce a Sliding Window Alignment Masker for Phylogenetic Analysis by Maximum Likelihood (SWAMP) that scans multiple-sequence alignments for short regions enriched with unreasonably high rates of nonsynonymous substitutions caused, for example, by sequencing or alignment errors. SWAMP masks these regions, preventing their inclusion in downstream evolutionary analyses and thereby increasing the reliability of those analyses. It can effectively mask short stretches of erroneous sequence, which are particularly prevalent in low-coverage genomes and may not be detected by existing methods that filter by sitewise conservation or alignment confidence. SWAMP offers a flexible masking approach: the user can apply different masking regimens to specific branches or sequences in the phylogeny, allowing the stringency of masking to vary according to branch length, expected divergence levels, or assembly quality. We demonstrate SWAMP's effectiveness on a dataset of 6,379 protein-coding genes from primate species, including data of variable quality. Full reporting of the software parameters will further improve the reproducibility of genome-wide analyses, as well as reduce false-positive rates.
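The sliding-window idea described in the abstract can be illustrated with a minimal Python sketch. It is not SWAMP's implementation: SWAMP derives per-branch substitutions from PAML output, whereas this sketch assumes a simplified setting of two gap-free, in-frame codon sequences (an "ancestor" and a "descendant") and counts a codon as one nonsynonymous change whenever its amino acid differs. The function names and the `window`, `threshold`, and `mask` defaults are hypothetical, chosen only for illustration.

```python
"""Minimal sketch of sliding-window masking of nonsynonymous-rich regions.

Hypothetical simplification, NOT SWAMP's actual code or interface:
- ancestor/descendant are equal-length, gap-free, in-frame codon strings
  (SWAMP itself works from PAML's per-branch substitution output);
- a codon counts as one nonsynonymous change if its amino acid differs.
"""
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def codons(seq):
    """Split a nucleotide string into consecutive triplets."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def nonsynonymous_flags(ancestor, descendant):
    """Return one bool per codon: True if the encoded amino acid changed."""
    flags = []
    for a, d in zip(codons(ancestor.upper()), codons(descendant.upper())):
        aa_a, aa_d = CODON_TABLE.get(a), CODON_TABLE.get(d)
        # Codons containing ambiguous bases (e.g. N) are never counted.
        flags.append(aa_a is not None and aa_d is not None and aa_a != aa_d)
    return flags


def mask_windows(ancestor, descendant, window=5, threshold=2, mask="NNN"):
    """Mask every codon lying in any window of `window` codons that contains
    more than `threshold` nonsynonymous changes (hypothetical parameters)."""
    if len(ancestor) != len(descendant) or len(ancestor) % 3:
        raise ValueError("sequences must be equal-length, in-frame codon strings")
    flags = nonsynonymous_flags(ancestor, descendant)
    n = len(flags)
    masked = [False] * n
    # Slide the window one codon at a time; sequences shorter than the
    # window are left unmasked in this simplified sketch.
    for start in range(max(n - window + 1, 0)):
        if sum(flags[start:start + window]) > threshold:
            for i in range(start, start + window):
                masked[i] = True
    return "".join(mask if m else c for c, m in zip(codons(descendant), masked))


if __name__ == "__main__":
    anc = "GCT" * 10                         # ten alanine codons
    desc = "GCT" * 3 + "TGG" * 3 + "GCT" * 4  # three adjacent Ala->Trp changes
    print(mask_windows(anc, desc))
```

With `window=5` and `threshold=2`, the three clustered changes push several overlapping windows over the threshold, so codons 2 through 8 of the descendant are replaced by `NNN`, while purely synonymous differences (for example `GCT` to `GCC`) are never masked. Supporting the per-branch regimens mentioned above would, under the same assumptions, amount to choosing `window` and `threshold` separately for each branch or sequence.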

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4f69/4251194/39cbadadffc3/ebo-10-2014-197f1.jpg

Similar articles

LMAP: Lightweight Multigene Analyses in PAML.
BMC Bioinformatics. 2016 Sep 6;17(1):354. doi: 10.1186/s12859-016-1204-5.

Vestige: maximum likelihood phylogenetic footprinting.
BMC Bioinformatics. 2005 May 29;6:130. doi: 10.1186/1471-2105-6-130.
