Han Mira V, Hahn Matthew W
Department of Biology and School of Informatics, Indiana University, Bloomington, IN 47405, USA.
Pac Symp Biocomput. 2009:114-25.
In this paper, we use the length of the shared synteny between genes to identify "parent" orthologs among multiple lineage-specific duplicated genes. Genes in the region around each duplicated paralog are compared with the genes flanking an outgroup ortholog to estimate the probability of observing homologs in syntenic vs. non-syntenic regions. The length of the shared synteny is introduced as a hidden variable and is estimated for each lineage-specific paralog using Expectation-Maximization. Assuming that the original, parental gene preserves the longest synteny with the outgroup gene, and that any daughter gene has a shorter syntenic block, we are able to determine parent-daughter relationships. We apply this method to lineage-specific duplications in the human genome and show that we can determine the direction and size of the duplication events that have created hundreds of genes.
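The abstract does not spell out the model, so the following is only a minimal sketch of how synteny length could be treated as a hidden variable and estimated with Expectation-Maximization. It assumes, purely for illustration, that each paralog is summarized by a 0/1 vector over flanking positions ordered by distance from the gene (1 means the flanking gene has a homolog near the outgroup ortholog), that the first L positions are syntenic, that L has a uniform prior, and that homologs appear with probability `p_syn` inside the block and `p_non` outside it. The function name `em_synteny_length`, its parameters, and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

def em_synteny_length(obs, n_iter=50, p_syn=0.7, p_non=0.1):
    """Illustrative EM for a hidden synteny length L per paralog (assumed model).

    obs: 0/1 array of shape (n_paralogs, n_positions); columns are flanking
         positions ordered by distance from the duplicated gene.
    Returns (posterior over L for each paralog, p_syn, p_non).
    """
    obs = np.asarray(obs, dtype=float)
    n_par, _ = obs.shape
    eps = 1e-9
    zeros = np.zeros((n_par, 1))
    for _ in range(n_iter):
        # E-step: log-likelihood of every candidate length L = 0..n_positions.
        ll_syn = obs * np.log(p_syn) + (1 - obs) * np.log(1 - p_syn)
        ll_non = obs * np.log(p_non) + (1 - obs) * np.log(1 - p_non)
        cum_syn = np.hstack([zeros, np.cumsum(ll_syn, axis=1)])
        cum_non = np.hstack([zeros, np.cumsum(ll_non, axis=1)])
        # First L positions are syntenic, the remaining ones are not.
        loglik = cum_syn + (cum_non[:, -1:] - cum_non)
        post = np.exp(loglik - loglik.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)
        # P(position i is syntenic) = P(L > i) under the posterior.
        tail = np.cumsum(post[:, ::-1], axis=1)[:, ::-1]
        w = tail[:, 1:]
        # M-step: re-estimate homolog probabilities from expected counts.
        p_syn = np.clip((w * obs).sum() / (w.sum() + eps), eps, 1 - eps)
        p_non = np.clip(((1 - w) * obs).sum() / ((1 - w).sum() + eps), eps, 1 - eps)
    return post, float(p_syn), float(p_non)

def pick_parent(post):
    """Return the index of the paralog with the longest MAP synteny length."""
    map_len = post.argmax(axis=1)
    return int(map_len.argmax()), map_len

# Toy data (made up): two paralogs, ten flanking positions each.
toy_obs = [
    [1, 1, 1, 1, 1, 1, 0, 0, 0, 0],  # long block shared with the outgroup
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],  # short block shared with the outgroup
]
toy_post, toy_p_syn, toy_p_non = em_synteny_length(toy_obs)
toy_parent, toy_lengths = pick_parent(toy_post)
print(toy_parent, toy_lengths, toy_p_syn, toy_p_non)
```

Under these assumptions, the paralog with the longest inferred block is labeled the parent and the others are labeled daughters, mirroring the rule stated in the abstract. A real analysis would also need to handle both flanks of each gene, gene loss, and ties between paralogs.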