Ewing B, Green P
Department of Molecular Biotechnology, University of Washington, Seattle, Washington 98195-7730, USA.
Genome Res. 1998 Mar;8(3):186-94.
Eliminating the data processing bottleneck in high-throughput sequencing will require both more accurate data processing software and reliable measures of that accuracy. We have developed, and implemented in our base-calling program phred, the ability to estimate a probability of error for each base-call as a function of certain parameters computed from the trace data. We show here that these error probabilities are valid (that is, they correspond to actual error rates) and have high power to discriminate correct base-calls from incorrect ones, for read data collected under several different chemistries and electrophoretic conditions. They play a critical role in our assembly program phrap and our finishing program consed.
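Phred reports each error probability p on a logarithmic scale as a quality value q = -10 log10(p). A quality of 20 therefore means a 1-in-100 chance that the base-call is wrong, and a quality of 30 means 1 in 1,000. The following minimal sketch is an illustration only, not phred's own code. It shows that conversion and one way to test the validity claim above. Base-calls are grouped by predicted quality, and each group's observed error rate, measured against a trusted reference sequence, is compared with the rate its quality value predicts. The function names and the `(quality, is_correct)` input format are hypothetical.

```python
import math
from collections import defaultdict


def quality_from_error_prob(p: float) -> int:
    """Phred-scaled quality: q = -10 * log10(p), rounded to the nearest integer."""
    if not 0.0 < p <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return round(-10.0 * math.log10(p))


def error_prob_from_quality(q: float) -> float:
    """Inverse mapping: p = 10 ** (-q / 10)."""
    return 10.0 ** (-q / 10.0)


def calibration_table(calls):
    """Compare predicted and observed error rates, grouped by quality value.

    `calls` is an iterable of (quality, is_correct) pairs (hypothetical format).
    The is_correct flag comes from checking each call against a trusted reference.
    Returns {quality: (n_calls, predicted_error_rate, observed_error_rate)}.
    """
    counts = defaultdict(lambda: [0, 0])  # quality -> [total calls, wrong calls]
    for q, is_correct in calls:
        counts[q][0] += 1
        if not is_correct:
            counts[q][1] += 1
    return {
        q: (n, error_prob_from_quality(q), wrong / n)
        for q, (n, wrong) in sorted(counts.items())
    }


if __name__ == "__main__":
    # 100 calls at quality 20, one of them wrong: observed rate equals predicted rate.
    table = calibration_table([(20, True)] * 99 + [(20, False)])
    print(table)
```

When the error probabilities are valid in the sense used above, the observed rate in each group should closely track the predicted rate. The check requires reads whose true sequence is already known.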