Department of Computer Engineering, Kyungpook National University, Daegu 702-701, South Korea.
BMC Genomics. 2009 Dec 3;10 Suppl 3(Suppl 3):S3. doi: 10.1186/1471-2164-10-S3-S3.
Oligonucleotide design is known to be time-consuming work in bioinformatics. To accelerate the design process and make it more efficient, one widely used approach is to prescreen unreliable regions using a hashing (or seeding) algorithm. Because seeding algorithms were originally proposed to increase sensitivity in local alignment, specificity must be considered alongside sensitivity when they are applied to the oligonucleotide design problem. However, no measure has yet been proposed for evaluating how adequate and efficient seeds are for oligo design. Here, we propose novel measures for evaluating seeding algorithms based on discriminability and efficiency.
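The seed-based prescreening idea described above can be illustrated with a minimal sketch. It is not the implementation evaluated in this study: the pattern `1101`, the toy sequences, and the function names `seed_key`, `build_index`, and `seed_hits` are all hypothetical and chosen only for illustration. A `1` in the pattern marks a position that must match and a `0` marks a position that is ignored.

```python
from collections import defaultdict


def seed_key(seq, start, pattern):
    """Collect the bases of seq that fall under the '1' positions of pattern."""
    return "".join(seq[start + i] for i, c in enumerate(pattern) if c == "1")


def build_index(seq, pattern):
    """Hash every window of seq by its seed key (hypothetical helper)."""
    index = defaultdict(list)
    for start in range(len(seq) - len(pattern) + 1):
        index[seed_key(seq, start, pattern)].append(start)
    return index


def seed_hits(query, index, pattern):
    """Return (query_pos, target_pos) pairs whose seed keys are identical."""
    hits = []
    for q in range(len(query) - len(pattern) + 1):
        for t in index.get(seed_key(query, q, pattern), []):
            hits.append((q, t))
    return hits


# Toy data (assumption): the query differs from the target's first window
# only at the position that the spaced seed ignores.
target = "ACGTACGA"
query = "ACTT"

spaced = "1101"      # weight 3, span 4
contiguous = "111"   # weight 3, span 3

spaced_hits = seed_hits(query, build_index(target, spaced), spaced)
contiguous_hits = seed_hits(query, build_index(target, contiguous), contiguous)
print(spaced_hits, contiguous_hits)
```

In this toy case the spaced seed reports a hit between the query and the first window of the target, while the contiguous seed of the same weight reports none. Regions that produce such hits against non-target sequences are the candidates a prescreening step would flag as unreliable.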
To evaluate the proposed measures, we examined five seeding algorithms in oligonucleotide design and carried out a series of experiments to compare them. The spaced seed proved to be the most efficient discriminative seed for oligo design, and the transition-constrained seed performed slightly below it. Because the BLAT and Vector seeding algorithms scored poorly in specificity and efficiency, we conclude that they are not adequate for designing oligos. Consequently, we recommend spaced seeds or transition-constrained seeds with a weight of 15 to 18 for designing oligos 50 mer in length. Empirical experiments on real biological data show that the recommended seeds perform well. We also provide a software package that enables users to obtain adequate seeds under their own experimental conditions.
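To make the notions of seed weight and transition-constrained matching concrete, the following sketch assumes the common notation in which `#` requires an exact match, `@` accepts a match or a transition (A/G or C/T), and `-` accepts any pair, with `@` counted as half a unit of weight. This notation, the toy pattern, and the function names `seed_weight` and `seed_matches` are assumptions made for illustration; the abstract does not list the specific seeds or weighting used in the study.

```python
# Transitions are purine-purine or pyrimidine-pyrimidine substitutions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def seed_weight(pattern):
    """Weight under the assumed convention: '#' counts 1, '@' counts 0.5."""
    return pattern.count("#") + 0.5 * pattern.count("@")


def seed_matches(a, b, pattern):
    """Check whether two equal-length windows satisfy the seed pattern."""
    if len(a) != len(pattern) or len(b) != len(pattern):
        raise ValueError("windows must have the same length as the pattern")
    for x, y, p in zip(a, b, pattern):
        if p == "#" and x != y:
            return False  # exact-match position violated
        if p == "@" and x != y and (x, y) not in TRANSITIONS:
            return False  # mismatch that is not a transition
    return True           # '-' positions are never checked


toy = "##@-#"  # hypothetical pattern, far shorter than a weight 15 to 18 seed
weight = seed_weight(toy)
transition_ok = seed_matches("ACGTA", "ACATA", toy)    # G/A under '@'
transversion_ok = seed_matches("ACGTA", "ACCTA", toy)  # G/C under '@'
print(weight, transition_ok, transversion_ok)
```

A helper such as `seed_weight` could be used to check whether a candidate pattern falls within the recommended weight range before it is applied to 50-mer oligo design.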
Our study is valuable in two respects. First, it can be applied to oligo design programs to improve their performance by suggesting experiment-specific seeds. Second, it can help improve the performance of mapping assembly in Next-Generation Sequencing. Although the proposed measures were originally designed for oligo design, we expect them to be helpful for other genomic tasks as well.