Schmid Christoph D, Sengstag Thierry, Bucher Philipp, Delorenzi Mauro
Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland.
Nucleic Acids Res. 2007 Jul;35(Web Server issue):W201-5. doi: 10.1093/nar/gkm343. Epub 2007 May 25.
A recurring task in the analysis of mass genome annotation data from high-throughput technologies is the identification of peaks or clusters in a noisy signal profile. Examples of such applications are the definition of promoters on the basis of transcription start site profiles, the mapping of transcription factor binding sites based on ChIP-chip data and the identification of quantitative trait loci (QTL) from whole-genome SNP profiles. The input to such an analysis is a set of genome coordinates associated with counts or intensities. The output consists of a discrete number of peaks, each with its own volume, extension and center position. For this purpose, we have developed a flexible one-dimensional clustering tool, called MADAP, which we make available as a web server and as a standalone program. A set of parameters enables the user to customize the procedure to a specific problem. The web server, which returns results in textual and graphical form, is useful for small- to medium-scale applications, as well as for evaluation and parameter tuning in preparation for large-scale applications, which require a local installation. The program, written in C++, can be freely downloaded from ftp://ftp.epd.unil.ch/pub/software/unix/madap. The MADAP web server can be accessed at http://www.isrec.isb-sib.ch/madap/.
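The input/output contract described in the abstract (genome coordinates with counts in, peaks with a volume, an extension and a center out) can be illustrated with a minimal Python sketch. It does not reproduce MADAP's clustering procedure, which the abstract does not describe; it substitutes a simple gap-merging heuristic. The `Peak` record and the parameters `max_gap` and `min_volume` are hypothetical names chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Peak:
    start: int      # leftmost genome coordinate in the cluster (extension start)
    end: int        # rightmost genome coordinate in the cluster (extension end)
    volume: float   # total count/intensity summed over the cluster
    center: float   # count-weighted mean position of the cluster


def cluster_profile(signal: Iterable[Tuple[int, float]],
                    max_gap: int = 10,
                    min_volume: float = 0.0) -> List[Peak]:
    """Hypothetical gap-merging clustering (NOT MADAP's algorithm).

    Consecutive positions no more than `max_gap` apart are merged into one
    cluster; clusters whose total volume is below `min_volume` are dropped
    as noise.
    """
    points = sorted(signal)  # order (position, count) pairs by coordinate

    # Build groups of neighbouring positions.
    groups: List[List[Tuple[int, float]]] = []
    for pos, count in points:
        if groups and pos - groups[-1][-1][0] <= max_gap:
            groups[-1].append((pos, count))
        else:
            groups.append([(pos, count)])

    # Summarise each group as a peak: volume, extension and center.
    peaks: List[Peak] = []
    for group in groups:
        volume = sum(c for _, c in group)
        if volume <= 0 or volume < min_volume:
            continue
        center = sum(p * c for p, c in group) / volume
        peaks.append(Peak(group[0][0], group[-1][0], volume, center))
    return peaks


if __name__ == "__main__":
    # Toy transcription-start-site-like profile: (position, tag count).
    signal = [(100, 5), (103, 10), (105, 5), (500, 2), (900, 1), (904, 3)]
    for peak in cluster_profile(signal, max_gap=10, min_volume=3):
        print(peak)
    # Peak(start=100, end=105, volume=20, center=102.75)
    # Peak(start=900, end=904, volume=4, center=903.0)
```

In this toy run the isolated signal at position 500 (volume 2) falls below `min_volume` and is discarded, which mimics how a tunable parameter can separate genuine peaks from background noise. A real tool would need a more robust model of cluster shape than a fixed gap threshold.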