Matthew W. Hahn
Department of Biology and School of Informatics, E. 3rd Street, Indiana University, Bloomington, IN 47405, USA.
Genome Biol. 2007;8(7):R141. doi: 10.1186/gb-2007-8-7-r141.
Comparative genomic studies are revealing frequent gains and losses of whole genes via duplication and pseudogenization. One commonly used method for inferring the number and timing of gene gains and losses reconciles the gene tree for each gene family with the species tree of the taxa considered. Recent studies using this approach have found a large number of ancient duplications and recent losses among vertebrate genomes.
I show that tree reconciliation methods are biased when the inferred gene tree is not correct. This bias places duplicates towards the root of the tree and losses towards the tips of the tree. I demonstrate that this bias is present when tree reconciliation is conducted on multiple mammal genomes and on multiple Drosophila genomes, and that lower bootstrap cut-off values on gene trees lead to more extreme bias. I also suggest a method for dealing with reconciliation bias, although this method only corrects for the number of gene gains on some branches of the species tree.
Based on the results presented, it is likely that most tree reconciliation analyses show biases, unless the gene trees used are exceptionally well-resolved and well-supported. These results cast doubt upon previous conclusions that vertebrate genome history has been marked by many ancient duplications and many recent gene losses.
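The bias described above can be illustrated with a toy example. The following is a minimal sketch of the standard last-common-ancestor (LCA) mapping approach to reconciliation; it is not necessarily the software used in the study. The nested-tuple tree encoding, the three-species tree, and the function names (`leaf_set`, `species_clades`, `reconcile`) are assumptions made for illustration, and the sketch handles only rooted binary trees whose gene leaves are labeled by species name.

```python
def leaf_set(tree):
    """Return the set of species names found at the leaves of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    left, right = tree
    return leaf_set(left) | leaf_set(right)


def species_clades(species_tree):
    """List every clade of the species tree as a frozenset of species names."""
    if isinstance(species_tree, str):
        return [frozenset([species_tree])]
    left, right = species_tree
    return [leaf_set(species_tree)] + species_clades(left) + species_clades(right)


def reconcile(gene_tree, species_tree):
    """Infer duplications and count losses by LCA mapping (hypothetical helper)."""
    clades = species_clades(species_tree)

    def lca(node):
        # Smallest species-tree clade that contains every species in the gene subtree.
        species = leaf_set(node)
        return min((c for c in clades if species <= c), key=len)

    def depth(clade):
        # Number of species-tree clades that strictly contain this clade.
        return sum(1 for c in clades if clade < c)

    dup_nodes, losses = [], 0

    def visit(node):
        nonlocal losses
        if isinstance(node, str):
            return
        here = lca(node)
        child_maps = [lca(child) for child in node]
        # A gene node is a duplication if it maps to the same clade as a child.
        is_dup = here in child_maps
        if is_dup:
            dup_nodes.append(sorted(here))
        for mapped in child_maps:
            gap = depth(mapped) - depth(here)
            # Each species-tree node skipped along a gene lineage implies one loss.
            losses += gap if is_dup else gap - 1
        for child in node:
            visit(child)

    visit(gene_tree)
    return dup_nodes, losses


species_tree = (("human", "chimp"), "mouse")
true_tree = (("human", "chimp"), "mouse")    # gene tree matching the species tree
wrong_tree = (("human", "mouse"), "chimp")   # same single-copy family, wrong topology

print(reconcile(true_tree, species_tree))    # ([], 0)
print(reconcile(wrong_tree, species_tree))   # ([['chimp', 'human', 'mouse']], 3)
```

In this sketch a single-copy gene family with one misplaced branch is explained by a duplication at the root of the species tree plus three losses on descendant lineages, even though no duplication or loss occurred. This is the direction of bias described above: spurious duplications towards the root and spurious losses towards the tips.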