Kim Jaebum, He Xin, Sinha Saurabh
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America.
PLoS Genet. 2009 Jan;5(1):e1000330. doi: 10.1371/journal.pgen.1000330. Epub 2009 Jan 9.
Characterization of the evolutionary constraints acting on cis-regulatory sequences is crucial to comparative genomics and provides key insights into the evolution of organismal diversity. We study the relationships among orthologous cis-regulatory modules (CRMs) in 12 Drosophila species, especially with respect to the evolution of transcription factor binding sites, and report statistical evidence in favor of key evolutionary hypotheses. Binding sites are found to have position-specific substitution rates. However, the selective forces at different positions of a site do not act independently, and the evidence suggests that constraints on sites are often based on their exact binding affinities. Binding site loss is seen to conform to a molecular clock hypothesis. The rate of site loss is transcription factor-specific and depends on the strength of binding and, in some cases, the presence of other binding sites in close proximity. Our analysis is based on a novel computational method for aligning orthologous CRMs on a tree, which rigorously accounts for alignment uncertainties and exploits binding site predictions through a unified probabilistic framework. Finally, we report weak purifying selection on short deletions, providing important clues about overall spatial constraints on CRMs. Our results present a complex picture of regulatory sequence evolution, with substantial plasticity that depends on a number of factors. The insights gained in this study will help us to understand the combinatorial control of gene regulation and how it evolves. They will pave the way for theoretical models that are cognizant of the important determinants of regulatory sequence evolution and will be critical in genome-wide identification of non-coding sequences under purifying or positive selection.
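As an illustration only, one common way to read a molecular clock statement about binding site loss is as a constant-rate decay: if sites are lost at a fixed rate per unit of evolutionary divergence, the fraction of ancestral sites still present is expected to fall off exponentially with divergence. The Python sketch below assumes that reading. The function names, the exponential form, and the numbers are hypothetical and are not taken from the study, whose actual analysis rests on a probabilistic alignment framework.

```python
import math


def retention_probability(loss_rate, divergence):
    """Probability that a site is still present after `divergence`,
    assuming a constant (clock-like) loss rate. Hypothetical model."""
    if loss_rate < 0 or divergence < 0:
        raise ValueError("loss_rate and divergence must be non-negative")
    return math.exp(-loss_rate * divergence)


def fit_loss_rate(divergences, retained_fractions):
    """Estimate the loss rate by least squares through the origin on
    -ln(fraction retained) = loss_rate * divergence."""
    numerator = 0.0
    denominator = 0.0
    for t, fraction in zip(divergences, retained_fractions):
        if not 0.0 < fraction <= 1.0:
            raise ValueError("retained fractions must lie in (0, 1]")
        numerator += t * -math.log(fraction)
        denominator += t * t
    if denominator == 0.0:
        raise ValueError("at least one divergence must be non-zero")
    return numerator / denominator


if __name__ == "__main__":
    # Illustrative divergences (arbitrary units), not values from the paper.
    divergences = [0.1, 0.5, 1.0, 2.0]
    # Synthetic retained fractions generated with an assumed rate of 0.4.
    fractions = [retention_probability(0.4, t) for t in divergences]
    print(fit_loss_rate(divergences, fractions))
```

Under this sketch, a transcription factor-specific loss rate would simply correspond to fitting a separate `loss_rate` for each factor's sites.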