Břinda Karel, Boeva Valentina, Kucherov Gregory
LIGM/CNRS, Université Paris-Est, 77454 Marne-la-Vallée, France.
Inserm, U900, Bioinformatics, Biostatistics, Epidemiology and Computational Systems Biology of Cancer, 75248 Paris, France, Institut Curie, Centre de Recherche, 26 rue d'Ulm, 75248 Paris, France and Mines ParisTech, 77300 Fontainebleau, France.
Bioinformatics. 2016 Jan 1;32(1):136-9. doi: 10.1093/bioinformatics/btv524. Epub 2015 Sep 9.
Read simulators combined with alignment evaluation tools provide the most straightforward way to evaluate and compare mappers. Simulated reads are accompanied by information about their positions in the source genome. This information is then used to evaluate the alignments produced by the mapper. Finally, reports containing statistics of successfully aligned reads are created. In the absence of standards for encoding read origins, every evaluation tool has to be made explicitly compatible with the simulator used to generate the reads.
To overcome this obstacle, we have created a generic format, Read Naming Format (Rnf), for assigning read names that encode information about the reads' original positions. Furthermore, we have developed an associated software package, RnfTools, containing two principal components. MIShmash applies one of several popular read simulators (among DwgSim, Art, Mason, CuReSim, etc.) and transforms the generated reads into Rnf format. LAVEnder then evaluates a given read mapper using simulated reads in Rnf format. Special attention is paid to mapping qualities, which serve to parametrize ROC curves, and to evaluating the effect of read sample contamination.
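The idea of parametrizing an evaluation curve by mapping quality can be illustrated with a minimal sketch. It is not LAVEnder's implementation: the function names `is_correct` and `roc_points`, the 5 bp position tolerance, and the example reads are all assumptions made for illustration. Each simulated read carries its true position, an alignment counts as correct when the reported position falls within the tolerance, and one curve point is produced for each mapping-quality threshold.

```python
from typing import List, Tuple


def is_correct(true_pos: int, mapped_pos: int, tolerance: int = 5) -> bool:
    """Hypothetical correctness criterion: reported position within `tolerance` bp."""
    return abs(true_pos - mapped_pos) <= tolerance


def roc_points(alignments: List[Tuple[int, bool]]) -> List[Tuple[int, int, int]]:
    """For each distinct mapping-quality threshold (descending), count the
    correct and incorrect alignments whose quality is at least the threshold.

    alignments: (mapping_quality, is_correct) for every mapped read.
    Returns a list of (threshold, n_correct, n_incorrect).
    """
    thresholds = sorted({quality for quality, _ in alignments}, reverse=True)
    points = []
    for threshold in thresholds:
        kept = [ok for quality, ok in alignments if quality >= threshold]
        n_correct = sum(kept)
        points.append((threshold, n_correct, len(kept) - n_correct))
    return points


# Made-up example reads: (true position, mapped position, mapping quality).
reads = [
    (100, 100, 60),
    (200, 203, 60),
    (300, 950, 10),
    (400, 400, 30),
    (500, 120, 0),
]
alignments = [(mapq, is_correct(true, mapped)) for true, mapped, mapq in reads]
points = roc_points(alignments)
for threshold, n_correct, n_incorrect in points:
    print(threshold, n_correct, n_incorrect)
```

Lowering the threshold admits more alignments, so both counts can only grow. Plotting the correct count against the incorrect count across thresholds gives a ROC-like curve for the mapper.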
RnfTools: http://karel-brinda.github.io/rnftools Rnf specification: http://karel-brinda.github.io/rnf-spec