Multiple hypothesis testing to detect lineages under positive selection that affects only a few sites.

Authors

Anisimova Maria, Yang Ziheng

Affiliation

Department of Biology, University College London, London, UK.

Publication

Mol Biol Evol. 2007 May;24(5):1219-28. doi: 10.1093/molbev/msm042. Epub 2007 Mar 5.

Abstract

Detection of positive Darwinian selection has become ever more important with the rapid growth of genomic data sets. Recent branch-site models of codon substitution account for variation of selective pressure over branches on the tree and across sites in the sequence and provide a means to detect short episodes of molecular adaptation affecting just a few sites. In likelihood ratio tests based on such models, the branches to be tested for positive selection have to be specified a priori. In the absence of a biological hypothesis to designate so-called foreground branches, one may test many branches, but a correction for multiple testing becomes necessary. In this paper, we employ computer simulation to evaluate the performance of 6 multiple test correction procedures when the branch-site models are used to test every branch on the phylogeny for positive selection. Four of the methods control the familywise error rates (FWERs), whereas the other 2 control the false discovery rate (FDR). We found that all correction procedures achieved acceptable FWER except for extremely divergent sequences and serious model violations, when the test may become unreliable. The power of the test to detect positive selection is influenced by the strength of selection and the sequence divergence, with the highest power observed at intermediate divergences. The 4 correction procedures that control the FWER had similar power. We recommend Rom's procedure for its slightly higher power, but the simple Bonferroni correction is useable as well. The 2 correction procedures that control the FDR had slightly more power and also higher FWER. We demonstrate the multiple test procedures by analyzing gene sequences from the extracellular domain of the cluster of differentiation 2 (CD2) gene from 10 mammalian species. Both our simulation and real data analysis suggest that the multiple test procedures are useful when multiple branches have to be tested on the same data set.
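
To make the FWER versus FDR distinction concrete, here is a minimal Python sketch of three standard textbook corrections applied to one p-value per tested branch. It is not the authors' implementation. Bonferroni is named in the abstract; Holm and Benjamini-Hochberg are included only as familiar examples of an FWER-controlling and an FDR-controlling procedure, and the abstract does not say whether they are among the 6 evaluated. Rom's procedure is omitted. The p-values are made-up illustrative inputs, not results from the paper.

```python
def bonferroni(pvalues, alpha=0.05):
    """FWER control: reject H0 for a branch if p <= alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def holm(pvalues, alpha=0.05):
    """FWER control (step-down): compare the k-th smallest p-value
    (k = 0, 1, ...) with alpha / (m - k) and stop at the first failure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvalues[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break
    return reject


def benjamini_hochberg(pvalues, alpha=0.05):
    """FDR control (step-up): find the largest 1-based rank k with
    p_(k) <= k * alpha / m and reject the k smallest p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject


# Hypothetical p-values, one per branch tested as foreground (assumed data).
branch_pvalues = [0.02, 0.001, 0.3, 0.012, 0.035]

bonf_result = bonferroni(branch_pvalues)
holm_result = holm(branch_pvalues)
bh_result = benjamini_hochberg(branch_pvalues)

print("Bonferroni:", bonf_result)
print("Holm:      ", holm_result)
print("BH (FDR):  ", bh_result)
```

On these made-up inputs the FDR procedure flags more branches than either FWER procedure. That matches the trade-off the abstract reports: slightly more power, at the cost of a higher familywise error rate.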
