Schneeberger Korbinian, Malde Ketil, Coward Eivind, Jonassen Inge
Computational Biology Unit, University of Bergen, Bergen, Norway.
Nucleic Acids Res. 2005 Apr 14;33(7):2176-80. doi: 10.1093/nar/gki511. Print 2005.
A problem in expressed sequence tag (EST) clustering is the presence of repeat sequences. To avoid false matches, repeats have to be masked. Masking can be time-consuming, and it depends on the availability of repeat libraries. We present a fast and effective method that aims to eliminate the problems repeats cause during clustering. Unlike traditional methods, our method infers repeats directly from the EST data and does not rely on any external library of known repeats. This makes it especially suitable for analysing ESTs from organisms without good repeat libraries. We demonstrate that the results are very similar to those obtained by performing standard repeat masking before clustering.