Identification of potential riboswitch elements in Homo sapiens mRNA 5'UTR sequences using Positive-Unlabeled machine learning.

Author information

Raymond William S, DeRoo Jacob, Munsky Brian

Affiliations

School of Biomedical Engineering, Colorado State University, Fort Collins, CO 80523, USA.

Chemical and Biological Engineering, Colorado State University, Fort Collins, CO 80523, USA.

Publication information

bioRxiv. 2024 Dec 6:2023.11.23.568398. doi: 10.1101/2023.11.23.568398.

Abstract

Riboswitches are a class of noncoding RNA structures that interact with target ligands to cause a conformational change that can then execute some regulatory purpose within the cell. Riboswitches are ubiquitous and well characterized in bacteria and other prokaryotes, with additional examples found in fungi, plants, and yeast. To date, no purely RNA-small molecule riboswitch has been discovered in Homo sapiens. Several analogous riboswitch-like mechanisms have been described within the translatome over the past decade, prompting the question: is there a riboswitch that depends only on small-molecule ligands? In this work, we set out to train positive-unlabeled machine learning classifiers on known riboswitch sequences and apply them to mRNA 5'UTR sequences from the 5'UTR database UTRdb, in the hope of identifying a set of mRNAs to investigate for riboswitch functionality. A total of 67,683 riboswitch sequences were obtained from RNAcentral, sorted by ligand type, and used as positive examples; 48,031 5'UTR sequences were used as unlabeled examples. Twenty positive-unlabeled classifiers were trained on sequence and secondary structure features, each withholding one or two ligand classes. Cross-validation on the withheld ligand sets gave validation accuracies ranging from 75% to 99%. The joint sets of 5'UTRs identified as potential riboswitches by the 20 classifiers were then analyzed: 15,333 sequences were identified as a riboswitch by one or more classifiers, and 436 of the 5'UTRs were labeled as harboring potential riboswitch elements by all 20 classifiers. These 436 sequences were mapped back to the most similar riboswitches within the positive data and examined. An online database of the identified and ranked 5'UTRs, their features, and their most similar matches to known riboswitches is provided to guide future experimental efforts to identify riboswitches.
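The bookkeeping around the classifiers described in the abstract (withholding one or two ligand classes per classifier, then comparing which 5'UTRs are flagged by at least one classifier versus by all of them) can be sketched roughly as follows. This is a minimal illustration only, not the authors' pipeline: the function names, the toy ligand labels, and the toy UTR identifiers are all hypothetical, the classifier itself is not implemented, and the abstract does not say how the 20 withheld-ligand combinations were chosen.

```python
from itertools import combinations


def ligand_holdouts(ligands, max_withheld=2):
    """Yield tuples of ligand classes to withhold, one or two at a time.

    Assumption: every single class and every pair is enumerated; the
    abstract does not state which 20 combinations were actually used.
    """
    for k in range(1, max_withheld + 1):
        yield from combinations(sorted(ligands), k)


def split_positives(positives, withheld):
    """Split (sequence_id, ligand) pairs into training and validation lists."""
    train = [p for p in positives if p[1] not in withheld]
    valid = [p for p in positives if p[1] in withheld]
    return train, valid


def consensus(flags_by_classifier):
    """Return (IDs flagged by at least one classifier, IDs flagged by all)."""
    sets = list(flags_by_classifier.values())
    any_hit = set().union(*sets)
    all_hit = set.intersection(*sets) if sets else set()
    return any_hit, all_hit


# Toy data, purely illustrative (hypothetical IDs and ligand labels).
positives = [("rs1", "TPP"), ("rs2", "SAM"), ("rs3", "FMN"), ("rs4", "TPP")]
holdouts = list(ligand_holdouts({"TPP", "SAM", "FMN"}))
print(len(holdouts))  # 6: three single classes plus three pairs

train, valid = split_positives(positives, ("TPP",))
print(train)  # [('rs2', 'SAM'), ('rs3', 'FMN')]
print(valid)  # [('rs1', 'TPP'), ('rs4', 'TPP')]

# Hypothetical per-classifier output: the set of UTR IDs each one flagged.
flags = {
    "clf_a": {"utr1", "utr2", "utr3"},
    "clf_b": {"utr2", "utr3"},
    "clf_c": {"utr3", "utr4"},
}
any_hit, all_hit = consensus(flags)
print(sorted(any_hit))  # ['utr1', 'utr2', 'utr3', 'utr4']
print(sorted(all_hit))  # ['utr3']
```

In the abstract's terms, `any_hit` plays the role of the 15,333 sequences flagged by one or more classifiers, and `all_hit` the 436 5'UTRs flagged by all 20.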


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/21a6/11642740/2418ecf2df68/nihpp-2023.11.23.568398v2-f0001.jpg
