Kato Mamoru, Sekine Akihiro, Ohnishi Yozo, Johnson Todd A, Tanaka Toshihiro, Nakamura Yusuke, Tsunoda Tatsuhiko
SNP Research Center, RIKEN, Yokohama, Japan.
BMC Genomics. 2006 Dec 28;7:326. doi: 10.1186/1471-2164-7-326.
The strong linkage disequilibrium (LD) recently found in genic or exonic regions of the human genome demonstrated that LD can be increased by evolutionary mechanisms that select for functionally important loci. This suggests that LD might be stronger in regions conserved among species than in non-conserved regions, since regions exposed to natural selection tend to be conserved. To assess this hypothesis, we used genome-wide polymorphism data from the HapMap project and investigated LD within DNA sequences conserved between the human and mouse genomes.
Unexpectedly, we observed that LD was significantly weaker in conserved regions than in non-conserved regions. To investigate why, we examined sequence features that may distort the relationship between LD and conserved regions. We found that interspersed repeats, and not other sequence features, were associated with the tendency toward weak LD in conserved regions. To characterize the relationship between LD and conservation without this confounder, we removed the effect of repetitive elements. A high degree of sequence conservation was then strongly associated with strong LD in coding regions, but not in non-coding regions.
Our work demonstrates that the degree of sequence conservation does not simply increase LD as predicted by the hypothesis. Rather, it implies that purifying selection changes the polymorphic patterns of coding sequences but has little influence on the patterns of functional units, such as regulatory elements, present in non-coding regions. Coding sequences are generally constrained by the need to maintain a functional protein product across multiple exons, whereas non-coding functional units may exist more as individually isolated units.
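For readers unfamiliar with how the strength of LD between two biallelic loci is quantified, the following is a minimal sketch in Python. The abstract does not state which LD statistic the authors used; the two computed here, r² and |D'|, are standard measures and are shown only as an illustration. The function name `ld_from_haplotype_counts` and the example counts are hypothetical and do not come from the paper or from HapMap data.

```python
def ld_from_haplotype_counts(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r2, d_prime) for two biallelic loci.

    Inputs are counts of the four phased haplotypes (alleles A/a at the
    first locus, B/b at the second). Illustrative helper, not the
    authors' pipeline.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("no haplotypes supplied")

    p_AB = n_AB / n              # frequency of haplotype AB
    p_A = (n_AB + n_Ab) / n      # frequency of allele A
    p_B = (n_AB + n_aB) / n      # frequency of allele B

    # D: deviation of the observed haplotype frequency from the value
    # expected if the two loci were independent.
    D = p_AB - p_A * p_B

    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("at least one locus is monomorphic; LD is undefined")

    # r^2: squared correlation between the alleles at the two loci.
    r2 = D * D / denom

    # |D'|: |D| scaled by its maximum possible magnitude given the
    # allele frequencies.
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(D) / d_max

    return D, r2, d_prime


# Hypothetical counts: only AB and ab haplotypes observed (complete LD).
print(ld_from_haplotype_counts(50, 0, 0, 50))
# Hypothetical counts: all four haplotypes equally frequent (no LD).
print(ld_from_haplotype_counts(25, 25, 25, 25))
```

Under these definitions, both statistics range from 0 (no association between the loci) to 1 (complete association), so "stronger LD" in a set of regions corresponds to pairwise values closer to 1.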