Kostetskiĭ P V, Dobrova I E
Bioorg Khim. 1988 Apr;14(4):515-21.
An algorithm for reconstructing long DNA sequences, i.e. arranging all overlapping gel readings into contigs, is described, together with the corresponding BASIC programme for the "Iskra-226" personal computer (USSR). Contig construction begins with a search for all fragments that overlap the basic (longest) one, followed by determination of the coordinates of the 5' ends of the overlapping fragments. The gel reading with the minimal 5' end coordinate and the gel reading with the maximal 3' end coordinate are then selected and used as the basic ones in the next assembly steps. The procedure finishes when no gel reading overlapping a basic one can be found. Gel readings that have already entered a contig are ignored in subsequent assembly steps. The result is one or several contigs, each consisting of DNA fragments. The effectiveness of the algorithm was tested on a model based on repeated assembly of the nucleotide sequence encoding the alpha-subunit of pig kidney Na,K-ATPase. The programme requires no user intervention and can assemble contigs up to 10,000 nucleotides long.
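The assembly loop described above can be sketched as follows. This is a minimal illustration in Python, not the original BASIC programme. The abstract does not state how overlaps are detected, so the sketch assumes exact, error-free matches on a single strand with a hypothetical `min_overlap` threshold; the names `find_offset` and `assemble` are likewise illustrative.

```python
def find_offset(base, read, min_overlap):
    """Return the 5' end of `read` relative to the 5' end of `base` for the
    longest exact overlap of at least `min_overlap` bases, or None.
    Exact matching is an assumption of this sketch."""
    best = None  # (overlap length, offset)
    for off in range(min_overlap - len(read), len(base) - min_overlap + 1):
        lo, hi = max(0, off), min(len(base), off + len(read))
        if hi - lo < min_overlap:
            continue
        if base[lo:hi] == read[lo - off:hi - off]:
            if best is None or hi - lo > best[0]:
                best = (hi - lo, off)
    return None if best is None else best[1]


def assemble(reads, min_overlap=5):
    """Greedy contig construction following the steps in the abstract.
    Returns a list of (contig sequence, {read index: 5' coordinate})."""
    remaining = set(range(len(reads)))
    contigs = []
    while remaining:
        # Start each contig from the longest unused reading (ties: lowest index).
        first = max(remaining, key=lambda i: (len(reads[i]), -i))
        remaining.discard(first)
        placed = {first: 0}  # read index -> coordinate of its 5' end
        bases = [first]
        while True:
            found = False
            for b in bases:
                for r in sorted(remaining):
                    off = find_offset(reads[b], reads[r], min_overlap)
                    if off is not None:
                        placed[r] = placed[b] + off
                        remaining.discard(r)  # placed readings are ignored later
                        found = True
            if not found:
                break  # nothing overlaps the current basic readings
            # Next basic readings: minimal 5' end and maximal 3' end.
            left = min(placed, key=lambda i: placed[i])
            right = max(placed, key=lambda i: placed[i] + len(reads[i]))
            bases = [left] if left == right else [left, right]
        # Shift coordinates so the contig starts at 0, then lay the reads down.
        shift = min(placed.values())
        layout = {i: c - shift for i, c in placed.items()}
        length = max(c + len(reads[i]) for i, c in layout.items())
        seq = ["N"] * length
        for i, c in layout.items():
            seq[c:c + len(reads[i])] = reads[i]
        contigs.append(("".join(seq), layout))
    return contigs
```

Calling `assemble(reads)` on a list of strings returns one entry per contig, so readings that overlap nothing end up as single-reading contigs. A practical assembler would also need to tolerate reading errors and handle reverse-complement readings, which this sketch deliberately leaves out.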