Karen M. Wong, Marc A. Suchard, John P. Huelsenbeck
Section of Ecology, Behavior and Evolution, University of California, San Diego, La Jolla, CA 92093, USA.
Science. 2008 Jan 25;319(5862):473-6. doi: 10.1126/science.1151532.
The statistical methods applied to the analysis of genomic data do not account for uncertainty in the sequence alignment. Indeed, the alignment is treated as an observation, and all of the subsequent inferences depend on the alignment being correct. This may not have been too problematic for many phylogenetic studies, in which the gene is carefully chosen for, among other things, ease of alignment. However, in a comparative genomics study, the same statistical methods are applied repeatedly on thousands of genes, many of which will be difficult to align. Using genomic data from seven yeast species, we show that uncertainty in the alignment can lead to several problems, including different alignment methods resulting in different conclusions.
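To make the dependence on the alignment concrete, here is a minimal toy sketch. It is not the analysis performed in the study. The two sequences, both alignments, and the helper `p_distance` are hypothetical and chosen only for illustration. The same pair of sequences is aligned in two ways that differ only in gap placement, and a simple downstream statistic (the proportion of mismatched sites over ungapped columns) changes as a result.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of mismatches over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences have a residue.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


# Hypothetical data: two alignments of the same sequences
# (ACGTACGT and ACTACGT); only the position of the gap differs.
alignment_a = ("ACGTACGT", "AC-TACGT")
alignment_b = ("ACGTACGT", "ACT-ACGT")

for name, (s1, s2) in (("A", alignment_a), ("B", alignment_b)):
    print(f"alignment {name}: p-distance = {p_distance(s1, s2):.4f}")
```

Treating either alignment as a fixed observation would give a different distance estimate, and anything computed from that distance would inherit the difference.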