Selvakumar Pavitra, Siddharthan Rahul
The Institute of Mathematical Sciences, Chennai, India.
Homi Bhabha National Institute, Mumbai, India.
R Soc Open Sci. 2024 Jan 24;11(1):231088. doi: 10.1098/rsos.231088. eCollection 2024 Jan.
Transcription factor binding sites (TFBS), like other DNA sequences, evolve via mutation and selection related to their function. Models of nucleotide evolution describe DNA evolution via single-nucleotide mutation. A stationary vector of such a model is the long-term distribution of nucleotides, which remains unchanged under the model. Neutrally evolving sites may have uniform stationary vectors, but one expects that sites within a TFBS instead have stationary vectors that reflect the fitness of the various nucleotides at those positions. We introduce 'position-specific stationary vectors' (PSSVs), the collection of stationary vectors at each site in a TFBS locus, analogous to the position weight matrix (PWM) commonly used to describe TFBS. We infer PSSVs for human TFs using two evolutionary models (Felsenstein 1981 and Hasegawa-Kishino-Yano 1985). We find that PSSVs reflect the nucleotide distribution from PWMs, but with reduced specificity. We infer ancestral nucleotide distributions at individual positions and calculate 'conditional PSSVs' conditioned on specific choices of majority ancestral nucleotide. We find that certain ancestral nucleotides exert a strong evolutionary pressure on neighbouring sequence while others have a negligible effect. Finally, we present a fast likelihood calculation for the F81 model on moderate-sized trees, which makes large-scale studies of this kind feasible.
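As an illustration of what a stationary vector means, the following minimal Python sketch uses the standard closed form of the F81 transition probabilities, P_ij(t) = e^(-mu t) delta_ij + (1 - e^(-mu t)) pi_j. It is not the authors' implementation or their fast likelihood method. The function names, the rate scale `mu`, and the example vector `pi` are assumptions chosen for illustration only.

```python
import math

NUCS = "ACGT"  # order of entries in every vector below


def f81_transition(pi, t, mu=1.0):
    """Return the 4x4 matrix P with P[i][j] = Pr(i -> j after time t) under F81.

    pi is the stationary vector; mu is an assumed overall rate scale.
    """
    decay = math.exp(-mu * t)  # probability that no substitution event occurred
    return [
        [decay * (1.0 if i == j else 0.0) + (1.0 - decay) * pi[j] for j in range(4)]
        for i in range(4)
    ]


def evolve(dist, P):
    """Propagate a nucleotide distribution through the transition matrix P."""
    return [sum(dist[i] * P[i][j] for i in range(4)) for j in range(4)]


# Hypothetical site that strongly prefers A (illustrative values, not inferred data).
pi = [0.7, 0.1, 0.1, 0.1]

# Starting from pi, the distribution does not change: pi is stationary.
P = f81_transition(pi, t=0.5)
after = evolve(pi, P)

# Starting from a site fixed at G, the distribution relaxes towards pi over long times.
start = [0.0, 0.0, 1.0, 0.0]
far = evolve(start, f81_transition(pi, t=50.0))

print(dict(zip(NUCS, after)))
print(dict(zip(NUCS, far)))
```

In this picture, a PSSV would simply be one such `pi` per position of the binding site; a neutrally evolving position would have `pi` close to uniform.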