Hu Guangan, Llinás Manuel, Li Jingguang, Preiser Peter Rainer, Bozdech Zbynek
School of Biological Sciences, Nanyang Technological University, No. 60 Nanyang Drive, 637551, Singapore.
BMC Bioinformatics. 2007 Sep 19;8:350. doi: 10.1186/1471-2105-8-350.
The design of long oligonucleotides for spotted DNA microarrays requires careful attention to ensure optimal performance in hybridization. The main challenge is to select, for each genetic locus/gene in the genome, an oligonucleotide element that is unique, free of internal secondary structure and repetitive sequence, and has a melting temperature (Tm) uniform with all other elements on the microarray. Currently, all publicly available programs for long oligonucleotide microarray design use various combinations of cutoffs, in which each parameter (uniqueness, Tm, and secondary structure) is evaluated and filtered individually. Cutoffs can, however, lead to information loss and to the selection of suboptimal oligonucleotides, especially for genomes with an extreme GC content distribution, a large proportion of repetitive sequences, or large gene families with highly homologous members.
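To illustrate how a hard cutoff can discard every candidate, the following is a minimal sketch, not code from any of the published programs. It uses GC fraction as a crude stand-in for Tm, and the function names, the 10-nt window, and the 0.4 to 0.6 GC range are all hypothetical choices made for the example.

```python
from typing import List


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def windows(seq: str, k: int) -> List[str]:
    """All overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def filter_by_gc(seq: str, k: int, gc_min: float, gc_max: float) -> List[str]:
    """Keep only k-mers whose GC fraction lies inside the cutoff range."""
    return [w for w in windows(seq, k) if gc_min <= gc_fraction(w) <= gc_max]


# Hypothetical AT-rich stretch (illustrative only, not real genome data).
at_rich = "ATATTTAAATGATTTAATATAAAGTTATTA"
# Hypothetical GC-balanced stretch for comparison.
balanced = "ACGTACGTACGTACG"

at_rich_hits = filter_by_gc(at_rich, 10, 0.4, 0.6)
balanced_hits = filter_by_gc(balanced, 10, 0.4, 0.6)
print(len(at_rich_hits), len(balanced_hits))
```

With these inputs the AT-rich sequence yields no surviving candidates at all, while every window of the balanced sequence passes, which is the kind of information loss the abstract attributes to cutoff-based filtering.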
Here we present OligoRankPick, a program that uses a weighted rank-based strategy to select microarray oligonucleotide elements via an integer weighted linear function. This approach optimizes the selection criteria (weight score) for each gene individually, accommodating variable properties of the DNA sequence along the genome. The algorithm was tested on three microbial genomes: Escherichia coli, Saccharomyces cerevisiae, and the human malaria parasite Plasmodium falciparum. Compared with other published algorithms, OligoRankPick provides significant improvements in oligonucleotide design for all three genomes, with the largest improvements observed for P. falciparum, whose genome is characterized by large fluctuations in GC content and abundant gene duplications.
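The general idea of a weighted rank sum can be sketched as follows. This is an illustration under stated assumptions, not OligoRankPick's actual implementation: the parameter names (`cross_hyb`, `tm_dev`, `self_fold`), the candidate values, and the integer weights are hypothetical, and each parameter is treated as a penalty where lower is better.

```python
from typing import Dict, List, Sequence


def ranks(values: Sequence[float]) -> List[int]:
    """1-based ranks, smallest value gets rank 1; tied values share a rank."""
    sorted_vals = sorted(values)
    return [sorted_vals.index(v) + 1 for v in values]


def weighted_rank_sums(candidates: List[Dict[str, float]],
                       weights: Dict[str, int]) -> List[int]:
    """Sum of weight * rank over all parameters, one score per candidate."""
    scores = [0] * len(candidates)
    for param, weight in weights.items():
        param_ranks = ranks([c[param] for c in candidates])
        for i, r in enumerate(param_ranks):
            scores[i] += weight * r
    return scores


def pick_oligo(candidates: List[Dict[str, float]],
               weights: Dict[str, int]) -> int:
    """Index of the candidate with the lowest weighted rank sum."""
    scores = weighted_rank_sums(candidates, weights)
    return min(range(len(candidates)), key=scores.__getitem__)


# Hypothetical candidate oligonucleotides for one gene (made-up values).
candidates = [
    {"cross_hyb": 30, "tm_dev": 0.5, "self_fold": 12},
    {"cross_hyb": 18, "tm_dev": 2.0, "self_fold": 5},
    {"cross_hyb": 22, "tm_dev": 1.0, "self_fold": 20},
]
# Hypothetical integer weights: uniqueness counts most, then Tm, then folding.
weights = {"cross_hyb": 3, "tm_dev": 2, "self_fold": 1}

best = pick_oligo(candidates, weights)
print(best, weighted_rank_sums(candidates, weights))
```

Because candidates are ranked relative to one another within each gene, no candidate is excluded outright: a gene whose candidates all have poor absolute values still returns its best available compromise, and changing the weights shifts which trade-off wins.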
OligoRankPick is an efficient tool for the design of long oligonucleotide DNA microarrays that does not rely on direct exclusion of oligonucleotides by parameter cutoffs but instead optimizes all parameters in the context of one another. The weighted rank-sum strategy used by this algorithm provides high flexibility in oligonucleotide selection, accommodating the extreme variability of DNA sequence properties along the genomes of many organisms.