Tri-Institutional Computational Biology & Medicine Program, Weill Cornell Medicine of Cornell University, New York, USA.
Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, New York, USA.
Genome Biol. 2023 Aug 28;24(1):197. doi: 10.1186/s13059-023-03033-5.
Synthetic long read sequencing techniques such as UST's TELL-Seq and Loop Genomics' LoopSeq combine 3′ barcoding with standard short-read sequencing to expand the range of linkage resolution from hundreds to tens of thousands of base pairs. However, the lack of a 1:1 correspondence between a long fragment and a 3′ unique molecular identifier confounds the assignment of linkage between short reads. We introduce Ariadne, a novel assembly graph-based synthetic long read deconvolution algorithm that can be used to extract single-species read clouds from synthetic long read datasets to improve the taxonomic classification and de novo assembly of complex populations, such as metagenomes.
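To make the deconvolution idea concrete, here is a minimal sketch of the general principle. It is not Ariadne's published procedure. Because one barcode can tag several long fragments, possibly from different species, its reads can be split into sub-clouds by linking reads whose assembly-graph nodes lie within a bounded path distance of each other. The sketch makes three assumptions: each read has already been placed on a single assembly-graph node, edge weights approximate base-pair distance, and edges are listed in both directions. The names `bounded_dijkstra`, `deconvolve_barcode`, and `max_dist` are hypothetical.

```python
import heapq
from collections import defaultdict


def bounded_dijkstra(graph, source, max_dist):
    """Return {node: distance} for nodes reachable from source within max_dist.

    graph maps node -> list of (neighbor, edge_length) pairs; this adjacency
    format is a hypothetical stand-in for a real assembly graph.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry; a shorter path was already found
        for nbr, length in graph.get(node, []):
            nd = d + length
            if nd <= max_dist and nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def deconvolve_barcode(graph, read_nodes, max_dist):
    """Split the reads of one barcode into sub-clouds (hypothetical sketch).

    read_nodes maps read id -> assembly-graph node the read is assumed to
    align to. Reads whose nodes are within max_dist are merged via union-find.
    Returns the sub-clouds as sorted lists of read ids.
    """
    parent = {r: r for r in read_nodes}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r

    # Group reads by node so each node is searched only once.
    reads_at = defaultdict(list)
    for read, node in read_nodes.items():
        reads_at[node].append(read)

    for node, reads in reads_at.items():
        anchor = reads[0]
        # The source node itself is reachable at distance 0, so reads that
        # share a node are merged as well.
        for other_node in bounded_dijkstra(graph, node, max_dist):
            for other in reads_at.get(other_node, []):
                parent[find(other)] = find(anchor)

    clouds = defaultdict(list)
    for read in read_nodes:
        clouds[find(read)].append(read)
    return sorted(sorted(c) for c in clouds.values())


# Toy data: two disconnected graph components, e.g. two genomes.
example_graph = {
    "A": [("B", 500)],
    "B": [("A", 500), ("C", 800)],
    "C": [("B", 800)],
    "X": [("Y", 300)],
    "Y": [("X", 300)],
}
example_reads = {"r1": "A", "r2": "B", "r3": "C", "r4": "X", "r5": "Y"}

print(deconvolve_barcode(example_graph, example_reads, 1000))
# → [['r1', 'r2', 'r3'], ['r4', 'r5']]
```

The search radius acts as the linkage cutoff. With `max_dist=600` in the toy example, read `r3` is more than 600 graph-distance units away from every other read's node, so it ends up in its own sub-cloud. Real fragment-length distributions and repeat-rich graphs would require more care than this sketch takes.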