Schmidt Lukas, Werner Stephan, Kemmer Thomas, Niebler Stefan, Kristen Marco, Ayadi Lilia, Johe Patrick, Marchand Virginie, Schirmeister Tanja, Motorin Yuri, Hildebrandt Andreas, Schmidt Bertil, Helm Mark
Institute of Pharmacy and Biochemistry, Johannes Gutenberg-University, Mainz, Germany.
Institute of Computer Science, Scientific Computing and Bioinformatics, Johannes Gutenberg-University, Mainz, Germany.
Front Genet. 2019 Sep 25;10:876. doi: 10.3389/fgene.2019.00876. eCollection 2019.
Modification mapping from cDNA data has become a tremendously important approach in epitranscriptomics. So-called reverse transcription signatures in cDNA contain information on the position and nature of their causative RNA modifications. Data mining of, e.g., Illumina-based high-throughput sequencing data is therefore rapidly growing in importance, yet the field still lacks effective tools. Here we present a versatile, user-friendly graphical workflow system for modification calling based on machine learning. The workflow commences with a principal module for trimming, mapping, and postprocessing. The latter includes a quantification of mismatch and arrest rates with single-nucleotide resolution across the mapped transcriptome. Further downstream modules include tools for visualization, machine learning, and modification calling. The machine-learning module provides quality assessment parameters to gauge the suitability of the initial dataset for effective machine learning and modification calling. This output helps to improve the experimental parameters for library preparation and sequencing. In summary, the automation of the bioinformatics workflow allows a faster turnaround of the optimization cycles in modification calling.
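To illustrate the per-position quantification mentioned in the abstract, the following is a minimal Python sketch and not the workflow's actual implementation. It assumes per-position base counts have already been extracted from the mapped reads. The function names (`mismatch_rate`, `arrest_rate`, `signature_profile`) and the input layout are hypothetical, and the arrest-rate definition used here (the relative drop in coverage compared with the 3' neighbouring position) is an assumption; the published workflow may define it differently.

```python
from typing import Dict, List, Tuple


def mismatch_rate(ref_base: str, base_counts: Dict[str, int]) -> float:
    """Fraction of reads covering a position that show a non-reference base."""
    coverage = sum(base_counts.values())
    if coverage == 0:
        return 0.0
    return (coverage - base_counts.get(ref_base, 0)) / coverage


def arrest_rate(coverage_here: int, coverage_3prime: int) -> float:
    """Assumed definition: relative coverage drop versus the 3' neighbour.

    Reverse transcription runs 3' -> 5' along the RNA, so an arrest at a
    position shows up as lower coverage there than one nucleotide downstream.
    """
    if coverage_3prime == 0:
        return 0.0
    drop = coverage_3prime - coverage_here
    return max(drop, 0) / coverage_3prime


def signature_profile(
    reference: str, counts: List[Dict[str, int]]
) -> List[Tuple[float, float]]:
    """Return (mismatch rate, arrest rate) for every reference position."""
    if len(reference) != len(counts):
        raise ValueError("reference and counts must have the same length")
    coverage = [sum(c.values()) for c in counts]
    profile = []
    for i, ref_base in enumerate(reference):
        # The last position has no 3' neighbour, so its arrest rate is 0.
        downstream = coverage[i + 1] if i + 1 < len(reference) else 0
        profile.append(
            (mismatch_rate(ref_base, counts[i]),
             arrest_rate(coverage[i], downstream))
        )
    return profile


# Toy example with made-up counts, for illustration only.
example_profile = signature_profile(
    "ACGT",
    [{"A": 10}, {"C": 8, "T": 2}, {"G": 20}, {"T": 20}],
)
```

In this toy input, the second position carries both a mismatch signal (2 of 10 reads) and a coverage drop relative to its 3' neighbour, which is the kind of combined signature a downstream classifier could use as features.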