Dealing with repetitions in sequencing by hybridization.

Author information

Blazewicz Jacek, Glover Fred, Kasprzak Marta, Markiewicz Wojciech T, Oğuz Ceyda, Rebholz-Schuhmann Dietrich, Swiercz Aleksandra

Affiliation

Institute of Computing Science, Poznań University of Technology, Piotrowo 2, 60-965 Poznań, Poland.

Publication information

Comput Biol Chem. 2006 Oct;30(5):313-20. doi: 10.1016/j.compbiolchem.2006.05.002. Epub 2006 Aug 30.

Abstract

DNA sequencing by hybridization (SBH) induces errors in the biochemical experiment. Some of them are random and disappear when the experiment is repeated. Others are systematic, arising from repeated oligonucleotides (probes) in the target sequence. A good method for solving SBH problems must deal with both types of errors. In this work we propose a new hybrid genetic algorithm for isothermic and standard sequencing that incorporates the concept of structured combinations. The algorithm is then compared with other methods designed to handle errors that arise in standard and isothermic SBH approaches. DNA sequences used for testing are taken from GenBank. The set of test instances was divided into two groups. The first group consisted of sequences containing positive and negative errors in the spectrum, at a rate of up to 20%, excluding errors coming from repetitions. The second group consisted of sequences containing repeated oligonucleotides, with additional errors of up to 5% added to the spectra. Our new method outperforms the best alternative procedures on both data sets. Moreover, in the cases without repetitions, the method produces solutions exhibiting an extremely high degree of similarity to the target sequences, which is an important outcome for biologists. The spectra prepared from the sequences taken from GenBank are available on our website http://bio.cs.put.poznan.pl/.
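To make the two error types concrete, here is a minimal sketch, assuming standard SBH with probes of one fixed length k (isothermic SBH, which uses probes of varying length, is not modeled). It builds the ideal spectrum of a sequence, lists the repeated oligonucleotides that a set-valued spectrum cannot represent (the systematic errors), and injects random positive and negative errors. This is not the authors' hybrid genetic algorithm. The function names (`spectrum`, `repeated_kmers`, `add_random_errors`) are hypothetical, and expressing error rates as a fraction of the spectrum size is an assumption.

```python
import random


def spectrum(seq: str, k: int) -> set[str]:
    """Ideal spectrum (hypothetical helper): the set of all k-mers in seq.

    Because it is a set, a k-mer occurring several times appears once,
    so repetitions cannot be seen in the spectrum.
    """
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def repeated_kmers(seq: str, k: int) -> set[str]:
    """k-mers occurring more than once in seq (sources of systematic errors)."""
    seen: set[str] = set()
    repeats: set[str] = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in seen:
            repeats.add(kmer)
        seen.add(kmer)
    return repeats


def add_random_errors(spec: set[str], k: int, neg_rate: float,
                      pos_rate: float, rng: random.Random) -> set[str]:
    """Inject random errors (hypothetical model; rates relative to |spec|).

    Negative errors: drop round(neg_rate * |spec|) true k-mers.
    Positive errors: add round(pos_rate * |spec|) random k-mers that do
    not occur in the original spectrum.
    """
    ordered = sorted(spec)  # sorted so results are reproducible for a seeded rng
    n_neg = round(neg_rate * len(ordered))
    n_pos = round(pos_rate * len(ordered))
    kept = set(rng.sample(ordered, len(ordered) - n_neg))
    positives: set[str] = set()
    while len(positives) < n_pos:
        candidate = "".join(rng.choice("ACGT") for _ in range(k))
        if candidate not in spec:
            positives.add(candidate)
    return kept | positives


if __name__ == "__main__":
    target = "ACGTACGTTGCA"
    k = 4
    ideal = spectrum(target, k)
    # 9 probe positions but only 8 distinct k-mers: "ACGT" occurs twice.
    print(len(target) - k + 1, len(ideal), repeated_kmers(target, k))
    noisy = add_random_errors(ideal, k, 0.25, 0.25, random.Random(0))
    print("negative errors:", sorted(ideal - noisy))
    print("positive errors:", sorted(noisy - ideal))
```

The repetition of `ACGT` is a systematic error: repeating the experiment yields the same spectrum, so no amount of re-measurement recovers the missing multiplicity. The injected errors, by contrast, are drawn at random and would differ between runs.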
