An adaptive, object oriented strategy for base calling in DNA sequence analysis.

Author information

Giddings M C, Brumley R L, Haker M, Smith L M

Affiliation

Chemistry Department, University of Wisconsin, Madison 53706.

Publication information

Nucleic Acids Res. 1993 Sep 25;21(19):4530-40. doi: 10.1093/nar/21.19.4530.

PMID:8233787

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC311186/

Abstract

An algorithm has been developed for the determination of nucleotide sequence from data produced in fluorescence-based automated DNA sequencing instruments employing the four-color strategy. This algorithm takes advantage of object oriented programming techniques for modularity and extensibility. The algorithm is adaptive in that data sets from a wide variety of instruments and sequencing conditions can be used with good results. Confidence values are provided on the base calls as an estimate of accuracy. The algorithm iteratively employs confidence determinations from several different modules, each of which examines a different feature of the data for accurate peak identification. Modules within this system can be added or removed for increased performance or for application to a different task. In comparisons with commercial software, the algorithm performed well.
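The abstract does not say how the modules' confidence values are combined or how the iteration proceeds. The sketch below illustrates the general modular, iterative idea under assumptions of our own, and is not the authors' implementation. The `ConfidenceModule` interface, the two example modules (`HeightModule`, `SpacingModule`), multiplying module scores into one confidence, and the loop that drops the least credible peak and rescores the rest are all hypothetical choices. The spacing module estimates the expected peak spacing from the data (median gap) as a nod to adaptivity across instruments.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from math import prod
from statistics import median


@dataclass
class Peak:
    base: str        # channel the peak was detected in: "A", "C", "G" or "T"
    position: float  # apex location, in scan (time) units
    height: float    # fluorescence intensity at the apex


class ConfidenceModule(ABC):
    """Hypothetical plug-in interface: score one feature of a peak in [0, 1]."""

    @abstractmethod
    def score(self, peak: Peak, peaks: list[Peak]) -> float: ...


class HeightModule(ConfidenceModule):
    """Taller peaks (relative to the tallest current peak) are more credible."""

    def score(self, peak: Peak, peaks: list[Peak]) -> float:
        tallest = max(p.height for p in peaks)
        return peak.height / tallest if tallest > 0 else 0.0


class SpacingModule(ConfidenceModule):
    """Peaks crowding a neighbour are suspect. The expected spacing is
    estimated from the data (median adjacent gap) unless given explicitly."""

    def __init__(self, expected_spacing: float | None = None):
        self.expected_spacing = expected_spacing

    def score(self, peak: Peak, peaks: list[Peak]) -> float:
        others = [p.position for p in peaks if p is not peak]
        if not others:
            return 1.0  # a lone peak has no neighbour to conflict with
        positions = sorted(p.position for p in peaks)
        gaps = [b - a for a, b in zip(positions, positions[1:])]
        expected = self.expected_spacing or median(gaps)
        nearest = min(abs(peak.position - x) for x in others)
        return min(nearest / expected, 1.0) if expected > 0 else 0.0


def call_bases(
    peaks: list[Peak], modules: list[ConfidenceModule], threshold: float = 0.5
) -> list[tuple[str, float]]:
    """Drop the least credible peak, one at a time, until every remaining
    peak's confidence (product of module scores) reaches `threshold`."""
    peaks = sorted(peaks, key=lambda p: p.position)
    while peaks:
        scores = [prod(m.score(p, peaks) for m in modules) for p in peaks]
        worst = min(range(len(peaks)), key=scores.__getitem__)
        if scores[worst] >= threshold:
            return [(p.base, s) for p, s in zip(peaks, scores)]
        del peaks[worst]  # removal changes neighbours' spacing, so rescore all
    return []


if __name__ == "__main__":
    # A small spurious G peak sits just after the real C peak.
    trace = [Peak("A", 0, 100), Peak("C", 10, 90), Peak("G", 12, 20), Peak("T", 20, 95)]
    print(call_bases(trace, [HeightModule(), SpacingModule()]))
    # → [('A', 1.0), ('C', 0.9), ('T', 0.95)]
```

In the first pass both C and G score below the threshold because they crowd each other; removing only the weakest peak (G) and rescoring lets C recover. This is why the loop removes one peak per iteration instead of discarding every low-scoring peak at once. Adding or removing a module only changes the `modules` list passed to `call_bases`.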
