Otu Hasan H, Sayood Khalid
University of Nebraska-Lincoln, Department of Electrical Engineering, 209N WSEC, 68503, USA.
Bioinformatics. 2003 Jan;19(1):22-9. doi: 10.1093/bioinformatics/19.1.22.
One of the major problems in DNA sequencing is assembling the fragments obtained by shotgun sequencing. Most existing fragment assembly techniques follow the overlap-layout-consensus approach. This framework requires extensive computation in each phase and becomes inefficient as the number of fragments increases.
We propose a new algorithm which solves the overlap, layout, and consensus phases simultaneously. The fragments are clustered with respect to their Average Mutual Information (AMI) profiles using the k-means algorithm. This removes the unnecessary burden of considering the collection of fragments as a whole. Instead, orientation and overlap detection are solved efficiently within the clusters. The algorithm has successfully reconstructed both artificial and real data.
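The clustering step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the AMI definition, the lag range, the distance measure, or the number of clusters. The sketch assumes a profile made of the mutual information (in bits) between bases k positions apart for k = 1..max_lag, plain Lloyd-style k-means with squared Euclidean distance, and the hypothetical names `ami_profile` and `kmeans`. Fragment orientation (reverse complements) and overlap detection within clusters are not covered.

```python
import math
import random
from collections import Counter

ALPHABET = "ACGT"


def ami_profile(seq, max_lag=8):
    """Return [I(1), ..., I(max_lag)]: mutual information in bits between
    bases that are k positions apart. Assumed definition, not the paper's."""
    bases = [b for b in seq.upper() if b in ALPHABET]
    profile = []
    for k in range(1, max_lag + 1):
        pairs = Counter(zip(bases, bases[k:]))
        total = sum(pairs.values())
        if total == 0:
            # Fragment too short for this lag.
            profile.append(0.0)
            continue
        # Marginals are taken from the same pairs, so each I(k) is >= 0.
        left = Counter()
        right = Counter()
        for (x, y), count in pairs.items():
            left[x] += count
            right[y] += count
        info = 0.0
        for (x, y), count in pairs.items():
            p_xy = count / total
            p_x = left[x] / total
            p_y = right[y] / total
            info += p_xy * math.log2(p_xy / (p_x * p_y))
        profile.append(info)
    return profile


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd-style k-means on lists of floats; returns one label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Under these assumptions, fragments would be grouped with something like `labels = kmeans([ami_profile(f) for f in fragments], k=4)`, where `fragments` is a list of at least four sequences. Overlap detection would then be run only among fragments sharing a label. The value `k=4` is arbitrary here.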
Available on request from the authors.