Doğan Berat, Najafabadi Hamed S
Department of Human Genetics, McGill University, Montreal, QC, Canada.
McGill University and Genome Quebec Innovation Centre, Montreal, QC, Canada.
Methods Mol Biol. 2018;1867:15-28. doi: 10.1007/978-1-4939-8799-3_2.
Cys2His2 zinc-finger proteins (C2H2-ZFPs) constitute the largest class of human transcription factors (TFs) and also the least characterized one. Determining the DNA sequence preferences of C2H2-ZFPs is an important first step toward elucidating their roles in transcriptional regulation. Among the most promising approaches for obtaining the sequence preferences of C2H2-ZFPs are those that combine machine-learning predictions with in vivo binding maps of these proteins. Here, we provide a protocol and guidelines for predicting the DNA-binding preferences of C2H2-ZFPs from their amino acid sequences using a machine learning-based recognition code. This protocol also describes the tools and steps to combine these predictions with ChIP-seq data to remove inaccuracies, identify the zinc-finger domains within each C2H2-ZFP that engage with DNA in vivo, and pinpoint the genomic binding sites of the C2H2-ZFPs.