Laboratory of Microbiology and Genetics, Ospedale di Circolo e Fondazione Macchi, University of Insubria, Viale Borri 57, 21100 Varese, Italy.
Bioinformatics. 2010 Jul 15;26(14):1777-8. doi: 10.1093/bioinformatics/btq258. Epub 2010 May 25.
The complete sequencing of the human genome shows that only 1% of the genome encodes proteins; the rest consists of non-coding DNA, regulatory elements and junk DNA. Transcriptional regulation plays a central role in many critical cellular processes and responses, and it is a driving force in the development and differentiation of multicellular organisms. Identifying regulatory elements is one of the major tasks in understanding this regulation. To accomplish this task, we developed a robust and simple suite that allows direct access to genomic databases and immediate checking of results. We introduce COMPASSS (COMplex PAttern of Sequence Search Software), a simple and effective tool for motif search in entire genomes. Motifs can be partially degenerate and interrupted by spacers of variable length.
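As an illustration of what a degenerate motif with variable-length spacers means in practice, the following minimal Python sketch translates such a motif into a regular expression. It is not the COMPASSS algorithm, which is not described here. It assumes that degeneracy is written with IUPAC nucleotide codes, and the function names (`block_to_regex`, `build_pattern`, `find_motifs`) are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
# Assumption: degenerate positions are written with these codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def block_to_regex(block):
    """Convert one conserved block (IUPAC string) to a regex fragment."""
    return "".join(IUPAC[base] for base in block.upper())


def build_pattern(blocks, spacers):
    """Join blocks with spacers given as (min_len, max_len) tuples.

    spacers[i] sits between blocks[i] and blocks[i + 1].
    """
    if len(spacers) != len(blocks) - 1:
        raise ValueError("need exactly one spacer between consecutive blocks")
    parts = [block_to_regex(blocks[0])]
    for (lo, hi), block in zip(spacers, blocks[1:]):
        parts.append("[ACGT]{%d,%d}" % (lo, hi))
        parts.append(block_to_regex(block))
    # A lookahead lets matches that start at overlapping positions be reported.
    return re.compile("(?=(" + "".join(parts) + "))")


def find_motifs(sequence, blocks, spacers):
    """Return (start, matched_text) for each start position with a match."""
    pattern = build_pattern(blocks, spacers)
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # Two conserved blocks separated by a spacer of 3 to 5 bases.
    print(find_motifs("TTGACAAAAATATAATGG", ["TTGACA", "TATAAT"], [(3, 5)]))
    # A single block with two degenerate positions (W = A/T, R = A/G).
    print(find_motifs("CCTATAAACC", ["TATAWR"], []))
```

Because the spacer quantifier is greedy, this sketch reports at most one match per start position (the one found first by the regex engine), and it scans only the given strand. A genome-scale tool would also need to handle the reverse complement and large sequence files.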
Using real biological data, we demonstrate the simplicity and robustness of this tool. The test was performed on two well-known protein domains and a highly variable cis-acting element. COMPASSS successfully identified both protein domains and the semi-conserved cis-acting element.
The COMPASSS suite is available for Windows free of charge from our web sites: compasss.sourceforge.net/; www.stefanolandi.eu/