School of Medicine, Northwest University, Xi'an, China.
Ludwig Institute for Cancer Research, La Jolla, CA, USA.
Nature. 2021 Mar;591(7848):147-151. doi: 10.1038/s41586-021-03211-0. Epub 2021 Jan 27.
Many sequence variants have been linked to complex human traits and diseases, but deciphering their biological functions remains challenging, as most of them reside in noncoding DNA. Here we have systematically assessed the binding of 270 human transcription factors to 95,886 noncoding variants in the human genome using an ultra-high-throughput multiplex protein-DNA binding assay, termed single-nucleotide polymorphism evaluation by systematic evolution of ligands by exponential enrichment (SNP-SELEX). The resulting 828 million measurements of transcription factor-DNA interactions enable estimation of the relative affinity of these transcription factors to each variant in vitro and evaluation of the current methods to predict the effects of noncoding variants on transcription factor binding. We show that the position weight matrices of most transcription factors lack sufficient predictive power, whereas a support vector machine combined with the gapped k-mer representation shows much improved performance when assessed on results from independent SNP-SELEX experiments involving a new set of 61,020 sequence variants. We report highly predictive models for 94 human transcription factors and demonstrate their utility in genome-wide association studies and in understanding the molecular pathways involved in diverse human traits and diseases.
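To make concrete what predicting a variant's effect from a position weight matrix involves, the following is a minimal Python sketch, not the authors' pipeline. It builds a log-odds matrix from a hypothetical count matrix, scores the best-matching window in the reference and alternate sequences, and takes the difference. The counts, the sequences, the pseudocount, the uniform background, and all function names are assumptions made for illustration; only the forward strand is scanned.

```python
import math

BASES = "ACGT"

# Hypothetical 4-position count matrix (rows: motif positions; columns: A, C, G, T).
# These counts are invented for illustration and do not come from the study.
COUNTS = [
    [8, 1, 1, 0],
    [0, 9, 0, 1],
    [1, 0, 9, 0],
    [0, 1, 0, 9],
]


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert counts to log2 odds against a uniform background (an assumption)."""
    pwm = []
    for row in counts:
        total = sum(row) + 4 * pseudocount
        pwm.append(
            [math.log2((c + pseudocount) / total / background) for c in row]
        )
    return pwm


def best_window_score(pwm, seq):
    """Return the highest PWM score over all forward-strand windows of seq."""
    width = len(pwm)
    return max(
        sum(pwm[i][BASES.index(seq[start + i])] for i in range(width))
        for start in range(len(seq) - width + 1)
    )


def allele_delta(pwm, ref_seq, alt_seq):
    """Score difference (alt minus ref); negative suggests weaker predicted binding."""
    return best_window_score(pwm, alt_seq) - best_window_score(pwm, ref_seq)


if __name__ == "__main__":
    pwm = log_odds_pwm(COUNTS)
    # Hypothetical sequences differing at a single position (G -> T).
    ref = "TTACGTTT"
    alt = "TTACTTTT"
    print(f"delta score (alt - ref): {allele_delta(pwm, ref, alt):.2f}")
```

A score difference of this kind is the quantity that can be compared against measured allelic binding. The abstract reports that, for most transcription factors, such matrix-based predictions fall short of gapped k-mer support vector machine models.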