Zhang Yu, Song Giltae, Hsu Chih-Hao, Miller Webb
Department of Statistics, 326 Thomas Building, Penn State University, University Park, PA 16802, USA.
Pac Symp Biocomput. 2009:162-73.
Genomic intervals that contain a cluster of similar genes are of extreme biological interest, but difficult to sequence and analyze. One goal of interspecies comparisons of such intervals is to reconstruct a parsimonious series of duplications, deletions, and speciation events (a putative evolutionary history) that could have created the contemporary clusters from their last common ancestor. We describe a new method for reconstructing such an evolutionary scenario for a given set of intervals from present-day genomes, based on the statistical technique of Sequential Importance Sampling. An implementation of the method is evaluated (1) on artificial datasets generated by simulating the operations of duplication, deletion, and speciation starting with featureless "ancestral" sequences, and (2) by comparing the evolutionary history inferred for the CYP2 gene family from human chromosome 19, chimpanzee, orangutan, rhesus macaque, and dog with the history of the amino-acid sequences computed by a standard phylogenetic-tree reconstruction method.
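To make the first kind of evaluation data concrete, the following is a minimal sketch of how an artificial gene cluster might be generated by applying random tandem duplications and deletions to a single featureless ancestral gene. It is an illustration only, not the authors' simulator: the function name `simulate_cluster`, the parameters `p_dup` and `max_span`, and the choice to place each copy directly after its source are all assumptions, and speciation is left out for brevity.

```python
import random


def simulate_cluster(n_events, seed=0, p_dup=0.7, max_span=3):
    """Apply random duplications and deletions to a one-gene ancestor.

    Hypothetical sketch: returns the final cluster (a list of gene labels),
    the event history, and a map from each copied gene to its source gene.
    """
    rng = random.Random(seed)
    cluster = ["g0"]   # featureless ancestral cluster: a single gene
    history = []       # (event, start, end) tuples, in the order applied
    parent = {}        # copied gene label -> label of the gene it was copied from
    next_id = 1

    for _ in range(n_events):
        # Pick a contiguous run of genes [start, end) to act on.
        span = rng.randint(1, min(max_span, len(cluster)))
        start = rng.randint(0, len(cluster) - span)
        end = start + span

        # Force a duplication if a deletion would empty the cluster.
        if rng.random() < p_dup or len(cluster) - span < 1:
            source = cluster[start:end]
            copy = []
            for gene in source:
                label = f"g{next_id}"
                next_id += 1
                parent[label] = gene
                copy.append(label)
            cluster[end:end] = copy   # tandem: copy sits right after its source
            history.append(("dup", start, end))
        else:
            del cluster[start:end]
            history.append(("del", start, end))

    return cluster, history, parent


if __name__ == "__main__":
    final_cluster, events, parent = simulate_cluster(10, seed=1)
    print(final_cluster)
    print(events)
```

A reconstruction method can then be scored by how closely the history it infers from `final_cluster` (plus the similarity information implied by `parent`) matches the recorded `events`. Fixing `seed` makes each artificial dataset reproducible.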