Key Laboratory of Synthetic Biology, CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China.
University of Chinese Academy of Sciences, Beijing, China.
Bioinformatics. 2018 Dec 1;34(23):3966-3974. doi: 10.1093/bioinformatics/bty456.
The launch of the BioNano next-generation mapping system has greatly improved physical map construction and rapidly expanded the use of optical mapping in genome research. Data biases have profound implications for downstream applications. However, very little is known about the properties and biases of BioNano data, or about the factors that affect whole-genome optical map assembly.
We generated BioNano molecule data from eight organisms with diverse base compositions. First, we characterized the properties and biases of BioNano molecule data, namely molecule length distribution, false labelling signals, variation in optical resolution and coverage distribution bias, together with the factors that induce them, such as chimeric molecules, fragile sites and DNA molecule stretching. Second, we developed the BioNano Molecule SIMulator (BMSIM), a novel computer simulation program for optical data, which should be useful for future genome mapping projects. Third, we evaluated the experimental variables that affect whole-genome optical map assembly. Specifically, we investigated the effects of coverage depth, molecule length, false-positive and false-negative labelling signals, chimeric molecules, the choice of nicking enzyme and nick site density. Our simulation study provides empirical guidance on how to control experimental variables and calibrate analytical parameters to maximize benefit and minimize cost in whole-genome optical map assembly.
BMSIM is freely available at https://github.com/pingchen09990102/BMSIM.
Supplementary data are available at Bioinformatics online.
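To make the simulated variables concrete, the following is a minimal illustrative sketch of how an optical-map molecule might be simulated from a reference sequence. It is not BMSIM's actual implementation. The function names, the exponential molecule length model, the Poisson-process model for false-positive labels, the forward-strand-only motif search and the example motif GCTCTTC are all assumptions made for illustration.

```python
import random


def find_label_sites(sequence, motif):
    """Return 0-based start positions of every occurrence of motif (forward strand only)."""
    sites = []
    pos = sequence.find(motif)
    while pos != -1:
        sites.append(pos)
        pos = sequence.find(motif, pos + 1)
    return sites


def simulate_molecule(sites, genome_len, mean_len, fn_rate, fp_per_kb, rng):
    """Simulate one molecule (hypothetical model, not BMSIM's).

    Returns (start, length, labels), where labels are sorted positions
    relative to the molecule start.
    """
    # Assumed length model: exponential, clipped to [1, genome_len].
    length = min(genome_len, max(1, int(rng.expovariate(1.0 / mean_len))))
    start = rng.randrange(0, genome_len - length + 1)
    end = start + length

    # False negatives: each true site is dropped with probability fn_rate.
    labels = [s - start for s in sites
              if start <= s < end and rng.random() >= fn_rate]

    # False positives: a Poisson process along the molecule,
    # generated from exponentially distributed gaps.
    rate_per_bp = fp_per_kb / 1000.0
    if rate_per_bp > 0:
        pos = rng.expovariate(rate_per_bp)
        while pos < length:
            labels.append(int(pos))
            pos += rng.expovariate(rate_per_bp)

    return start, length, sorted(labels)


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy random genome; a real run would read a FASTA file instead.
    genome = "".join(rng.choice("ACGT") for _ in range(200_000))
    sites = find_label_sites(genome, "GCTCTTC")
    for _ in range(3):
        start, length, labels = simulate_molecule(
            sites, len(genome), mean_len=50_000,
            fn_rate=0.1, fp_per_kb=0.01, rng=rng)
        print(start, length, len(labels))
```

Varying mean_len, fn_rate and fp_per_kb in such a sketch corresponds to the molecule length and false-negative and false-positive labelling variables discussed above. Chimeric molecules, stretching and fragile sites are not modelled here.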