Preparata Franco P
Computer Science Department, Brown University, Providence, RI 02912, USA.
J Comput Biol. 2013 Jun;20(6):424-32. doi: 10.1089/cmb.2011.0243. Epub 2013 May 15.
This work revisits the classic problem of coverage in genomic shotgun assembly (the "Lander-Waterman statistics"). A novel formulation, based on the analysis of an autonomous Markov automaton, is presented, and two main conclusions are derived. The first is an evaluation of the minimum multiplicity ("coverage") required to achieve uninterrupted covering (one single contig) with a prescribed confidence level. The second is a detailed analysis of the effect of replacing the hypothesis of fixed-length genomic fragments with that of an arbitrary distribution of lengths over a finite interval.
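For orientation, the classical fixed-length Lander-Waterman approximations that this work revisits can be sketched as follows. This is a minimal illustration of the textbook formulas only, not the Markov-automaton formulation of the paper. The function names and the parameter `min_overlap_frac` are hypothetical, chosen for this sketch; it assumes reads of equal length placed uniformly at random on the genome.

```python
import math


def coverage(n_reads, read_len, genome_len):
    """Mean coverage c = N * L / G (average number of reads over a base)."""
    return n_reads * read_len / genome_len


def fraction_uncovered(c):
    """Classical approximation: probability that a given base is uncovered."""
    return math.exp(-c)


def expected_contigs(n_reads, c, min_overlap_frac=0.0):
    """Classical expected number of contigs, N * exp(-c * (1 - theta)).

    theta (min_overlap_frac) is the fraction of a read that must overlap
    its neighbour for the overlap to be detected; an assumption of this sketch.
    """
    return n_reads * math.exp(-c * (1.0 - min_overlap_frac))


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the paper.
    c = coverage(n_reads=1000, read_len=500, genome_len=100_000)
    print(c, fraction_uncovered(c), expected_contigs(1000, c))
```

In this classical picture the expected number of contigs decays exponentially with coverage, which is why a prescribed confidence of obtaining a single contig translates into a minimum coverage requirement.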