Hyyrö Heikki, Juhola Martti, Vihinen Mauno
Department of Computer Sciences, FI-33014 University of Tampere, Finland.
Nucleic Acids Res. 2005 Jul 26;33(13):e115. doi: 10.1093/nar/gni110.
Functional genomics methods are used to investigate the vast amount of information contained in genomes. Many experimental methods rely on oligo- or polynucleotides, with nucleotide strand hybridization as their underlying principle. For all these techniques, the probes should be unique to the analyzed genes. In addition to being unique, the probes must fulfill a large number of criteria to be usable and valid. The criteria include, for example, avoidance of self-annealing, a suitable melting temperature and a suitable nucleotide composition. We developed a method for searching for unique and valid oligonucleotides, or probes, for genes so that no similar (approximate) occurrence exists at any other location in the whole genome. Using a probe size of 25, we analyzed 17 complete genomes representing a wide range of both prokaryotic and eukaryotic organisms. More than 92% of all the genes in the investigated genomes contained valid oligonucleotides. Extensive statistical tests were performed to characterize the properties of unique and valid oligonucleotides. Unique and valid oligonucleotides were relatively evenly distributed within genes, except at the beginning and end, which were somewhat overrepresented. The flanking regions in eukaryotes were clearly underrepresented among suitable oligonucleotides. In addition to distributions within genes, the effects on codon and amino acid usage were also studied.
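The validity and uniqueness criteria named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract gives no thresholds or algorithmic details, so every function name, default value and simplification here is an assumption made for illustration. Melting temperature is estimated with the Wallace rule (2 °C per A/T, 4 °C per G/C), which is only a rough approximation for 25-mers; self-annealing is approximated by looking for a short stem whose reverse complement also occurs in the probe; and "similar (approximate) occurrence" is modelled as Hamming distance on a single strand, scanned naively.

```python
# Illustrative sketch only: names, thresholds and simplifications are assumptions,
# not taken from the paper.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence over A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def gc_fraction(seq):
    """Fraction of G and C bases (a simple nucleotide composition measure)."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature in degrees Celsius using the Wallace rule."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def self_anneals(seq, stem=4):
    """True if some stem-length substring has its reverse complement in seq.

    This is a crude proxy for self-annealing; palindromic stems count too.
    """
    kmers = {seq[i:i + stem] for i in range(len(seq) - stem + 1)}
    return any(reverse_complement(k) in kmers for k in kmers)


def approx_occurrences(probe, genome, max_mismatches):
    """Count genome windows within max_mismatches (Hamming distance) of probe."""
    n = len(probe)
    hits = 0
    for i in range(len(genome) - n + 1):
        mismatches = sum(a != b for a, b in zip(probe, genome[i:i + n]))
        if mismatches <= max_mismatches:
            hits += 1
    return hits


def is_unique(probe, genome, max_mismatches=1):
    """True if the only approximate occurrence is the probe's own location."""
    return approx_occurrences(probe, genome, max_mismatches) == 1


def is_valid(probe, gc_range=(0.4, 0.6), tm_range=(60, 90), stem=4):
    """Apply the hypothetical validity thresholds to a single probe."""
    return (gc_range[0] <= gc_fraction(probe) <= gc_range[1]
            and tm_range[0] <= wallace_tm(probe) <= tm_range[1]
            and not self_anneals(probe, stem))


def candidate_probes(gene, genome, size=25, max_mismatches=1):
    """Yield (offset, probe) for windows of gene that are valid and unique."""
    for i in range(len(gene) - size + 1):
        probe = gene[i:i + size]
        if is_valid(probe) and is_unique(probe, genome, max_mismatches):
            yield i, probe
```

The naive scan costs time proportional to genome length times probe length for every candidate window, which is workable only for toy inputs; whole-genome searches of the kind described would need indexed or bit-parallel approximate matching, and a realistic uniqueness test would also examine the reverse strand.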