Gadiraju Sashidhar, Vyhlidal Carrie A, Leeder J Steven, Rogan Peter K
Laboratory of Human Molecular Genetics, Children's Mercy Hospital and Clinics, School of Medicine, and School of Interdisciplinary Computer Science and Engineering, University of Missouri-Kansas City, Kansas City, MO 64108, USA.
BMC Bioinformatics. 2003 Sep 8;4:38. doi: 10.1186/1471-2105-4-38.
We present Delila-Genome, a software system for identification, visualization and analysis of protein binding sites in complete genome sequences. Binding sites are predicted by scanning genomic sequences with information theory-based (or user-defined) weight matrices. Matrices are refined by adding experimentally defined binding sites to published binding sites. Delila-Genome was used to examine how accurately the individual information contents of binding sites detected with refined matrices measure the strengths of the corresponding protein-nucleic acid interactions. The software can then be used to predict novel sites by rescanning the genome with the refined matrices.
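The scanning and refinement steps described above can be illustrated with a minimal sketch. It assumes the standard individual-information weight Ri(b, l) = 2 + log2 f(b, l) for DNA and uses a simple pseudocount in place of a small-sample correction; the function names, the pseudocount, and the threshold are illustrative assumptions, not the Delila-Genome implementation.

```python
import math

BASES = "ACGT"

def build_ri_matrix(sites):
    """Build weights Ri(b, l) = 2 + log2 f(b, l) from aligned, equal-length sites."""
    n = len(sites)
    width = len(sites[0])
    matrix = []
    for pos in range(width):
        column = {}
        for base in BASES:
            count = sum(1 for s in sites if s[pos] == base)
            # Pseudocount of 0.25 per base (an assumption) avoids log2(0).
            freq = (count + 0.25) / (n + 1.0)
            column[base] = 2.0 + math.log2(freq)
        matrix.append(column)
    return matrix

def scan(sequence, matrix, threshold=0.0):
    """Yield (start, window, Ri in bits) for each window scoring above threshold."""
    width = len(matrix)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        ri = sum(matrix[i][b] for i, b in enumerate(window))
        if ri > threshold:
            yield start, window, ri

def refine(published_sites, new_sites):
    """Rebuild the matrix after adding experimentally defined sites."""
    return build_ri_matrix(list(published_sites) + list(new_sites))
```

Calling `list(scan(genome_fragment, build_ri_matrix(known_sites)))` returns candidate sites with their strengths in bits; rescanning with the output of `refine` mirrors the refinement loop described in the abstract. A full scan would also cover the reverse-complement strand, which this sketch omits.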
Parameters for genome scans are entered using a Java-based GUI and backend scripts in Perl. Multi-processor CPU load-sharing minimized the average response time for scans of different chromosomes. Scans of human genome assemblies required 4-6 hours for transcription factor binding sites and 10-19 hours for splice sites on 24-node Mosix and 3-node Beowulf clusters. Individual binding sites are displayed either as high-resolution sequence walkers or in low-resolution custom tracks in the UCSC genome browser. For large datasets, we applied a data reduction strategy that limited displays of binding sites exceeding a threshold information content to specific chromosomal regions within or adjacent to genes. An HTML document is produced that lists binding sites ranked by binding site strength or chromosomal location, with hyperlinks to the UCSC custom track, other annotation databases and binding site sequences. Post-genome scan tools parse binding site annotations of selected chromosome intervals and compare the results of genome scans using different weight matrices. Comparisons of multiple genome scans can display binding sites that are unique to each scan and identify sites with significantly altered binding strengths.
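The data reduction and custom track steps can be sketched as follows. This is a hedged illustration only: the site and gene dictionaries, the `flank` distance, and the score scaling are hypothetical choices, while the output follows the general UCSC custom track convention of a `track` line followed by tab-separated BED records.

```python
def near_gene(site, genes, flank=10000):
    """True if the site lies within a gene, or within `flank` bp of one."""
    return any(
        g["chrom"] == site["chrom"]
        and site["start"] < g["end"] + flank
        and site["end"] > g["start"] - flank
        for g in genes
    )

def reduce_sites(sites, genes, min_bits, flank=10000):
    """Keep sites stronger than min_bits that fall in or adjacent to genes."""
    return [s for s in sites if s["ri"] > min_bits and near_gene(s, genes, flank)]

def to_custom_track(sites, name, max_bits):
    """Format sites as a BED custom track, strongest sites first."""
    lines = [f'track name="{name}" description="{name} binding sites" useScore=1']
    for s in sorted(sites, key=lambda s: s["ri"], reverse=True):
        # BED scores run from 0 to 1000; scaling by max_bits is an assumption.
        score = max(0, min(1000, round(1000 * s["ri"] / max_bits)))
        fields = [s["chrom"], str(s["start"]), str(s["end"]),
                  f'{s["ri"]:.1f}_bits', str(score), s["strand"]]
        lines.append("\t".join(fields))
    return "\n".join(lines)
```

Filtering before formatting keeps the track small enough to browse, which is the purpose of the reduction strategy; sorting by `(chrom, start)` instead of strength would give the location-ordered listing.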
Delila-Genome was used to scan the human genome sequence with information weight matrices of transcription factor binding sites, including PXR/RXRalpha, AHR and NF-kappaB p50/p65, and matrices for RNA binding sites including splice donor, acceptor, and SC35 recognition sites. Comparisons of genome scans with the original and refined PXR/RXRalpha information weight matrices indicate that the refined model more accurately predicts the strengths of known binding sites and is more sensitive for detection of novel binding sites.
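A comparison of two genome scans, such as the scans with the original and refined PXR/RXRalpha matrices, might be sketched as below. The dictionary layout and the fixed `min_delta_bits` cutoff are assumptions made for illustration; the abstract does not state the criterion used to call a change in binding strength significant.

```python
def compare_scans(original, refined, min_delta_bits=1.0):
    """Compare two scans given as dicts mapping (chrom, start, strand) to Ri in bits.

    Returns the sites unique to each scan and, for shared sites, the change
    in strength when it is at least min_delta_bits in magnitude.
    """
    only_original = sorted(set(original) - set(refined))
    only_refined = sorted(set(refined) - set(original))
    changed = {
        key: refined[key] - original[key]
        for key in original.keys() & refined.keys()
        if abs(refined[key] - original[key]) >= min_delta_bits
    }
    return only_original, only_refined, changed
```

Sites appearing only in the refined scan correspond to candidate novel sites, while entries in `changed` flag known positions whose predicted strength shifts between the two models.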