Schmal Matthias, Girod Crystal, Yaver Debbie, Mach Robert L, Mach-Aigner Astrid R
Christian Doppler laboratory for optimized expression of carbohydrate-active enzymes, Institute of Chemical, Environmental and Bioscience Engineering, TU Wien, Gumpendorfer Str. 1A, Vienna A-1060, Austria.
Production Strain Technology, Novozymes Inc., Davis, California, USA.
NAR Genom Bioinform. 2022 Aug 15;4(3):lqac059. doi: 10.1093/nargab/lqac059. eCollection 2022 Sep.
With the advent of affordable next-generation sequencing technologies, the number of known non-protein-coding RNAs has increased drastically in recent years. Different types of non-coding RNAs (ncRNAs) have emerged as key players in the regulation of gene expression at the RNA-RNA, RNA-DNA and RNA-protein levels, with roles ranging from chromatin remodeling and transcription regulation to post-transcriptional modification. Predicting ncRNAs requires combining several bioinformatics tools and can be a daunting task for researchers. This has led to the development of analysis pipelines such as UClncR and lncpipe. However, these pipelines are limited to datasets from human, mouse, zebrafish or fruit fly and cannot analyze RNA sequencing data from other organisms. In this study, we developed the analysis pipeline Pinc (Pipeline for prediction of ncRNA) as an enhanced tool that predicts ncRNAs from sequencing data by removing transcripts that show protein-coding potential. Pinc also implements differential expression analysis of annotated genes as well as identification of novel ncRNAs. Pinc uses Nextflow as its framework and is built from robust, well-established analysis tools. This allows researchers to use sequencing data from any organism to reliably identify ncRNAs.
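The core idea stated above, keeping only transcripts without protein-coding potential, can be illustrated with a minimal sketch. The abstract does not say how coding potential is scored, so this example swaps in a crude longest-ORF heuristic as a stand-in; it is not Pinc's method. The function names, the 100-codon cutoff and the toy sequences are all assumptions made for illustration, and only the forward strand and stop-terminated ORFs are considered.

```python
# Hypothetical illustration: a longest-ORF heuristic used as a crude proxy
# for protein-coding potential. This is NOT the scoring used by Pinc.

STOP_CODONS = {"TAA", "TAG", "TGA"}
MIN_ORF_CODONS = 100  # assumed cutoff; a real analysis would tune or replace this


def longest_orf_codons(seq: str) -> int:
    """Return the length in codons (start included, stop excluded) of the
    longest ATG-initiated, stop-terminated ORF on the forward strand."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    best = 0
    for frame in range(3):
        run = 0         # codons counted in the currently open ORF
        in_orf = False  # True once an ATG has been seen in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if not in_orf:
                if codon == "ATG":
                    in_orf = True
                    run = 1
            elif codon in STOP_CODONS:
                best = max(best, run)
                in_orf = False
            else:
                run += 1
    return best


def keep_noncoding(transcripts: dict[str, str],
                   min_orf_codons: int = MIN_ORF_CODONS) -> dict[str, str]:
    """Drop transcripts whose longest ORF reaches the cutoff; keep the rest
    as ncRNA candidates."""
    return {
        name: seq
        for name, seq in transcripts.items()
        if longest_orf_codons(seq) < min_orf_codons
    }


# Toy input: one transcript with a long ORF, one with only a short ORF.
transcripts = {
    "tx_coding": "ATG" + "GCT" * 120 + "TAA",
    "tx_noncoding": "ATG" + "GCT" * 10 + "TAA" + "CCG" * 50,
}
candidates = keep_noncoding(transcripts)
print(sorted(candidates))
```

With the toy input, only the transcript carrying the short ORF is retained as a candidate. In practice, ORF length alone misclassifies many transcripts, so a dedicated coding-potential classifier would normally take the place of `longest_orf_codons`.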