Composition, physicochemical property and base periodicity for discriminating lncRNA and mRNA.

Author information

Rajesh Prasad, Krishnamachari Annangarachari

Affiliation

School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, 110067, India.

Publication information

Bioinformation. 2023 Dec 31;19(12):1145-1152. doi: 10.6026/973206300191145. eCollection 2023.

Abstract

Annotating genome data with biological features is a challenging problem, and one instance of it is distinguishing lncRNA from mRNA. In this study, three groups of classification features were considered: base periodicity, physicochemical properties and nucleotide composition. We propose a simple neural network model that aims to obtain better results through a judicious combination of these sequence features. Our approach relies on a balanced dataset, a simple prediction model and a limited number of features. Accordingly, we used (a) two properties of base periodicity: the peak of the power spectrum of the signal and the signal-to-noise ratio (SNR) of this peak; (b) three physicochemical properties: solvation, stacking and hydrogen-bonding energy; and (c) all dinucleotide and trinucleotide compositions. Classification was performed first with each feature group independently and then with combinations of groups to improve performance. Classification metrics were used to compare the results for seven eukaryotic organisms across the various feature combinations. Nucleotide composition combined with either the physicochemical-property or the base-periodicity group of features yields a strong classifier with more than 99% accuracy. Base periodicity analysis with SNR can be used as a discriminating feature between lncRNA and mRNA.
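To make the feature groups in the abstract concrete, the following is a minimal Python sketch of how two of them (base periodicity and nucleotide composition) might be computed from a single sequence. It is not the authors' implementation. The abstract does not state how the signal is encoded or how noise is defined, so the binary indicator (Voss-style) encoding, the choice of the period-3 frequency N/3 as the peak, and the definition of SNR as peak power divided by mean non-DC power are all assumptions, as are the function names `period3_features` and `kmer_composition`. The physicochemical energies are omitted because the abstract gives no parameter values for them.

```python
import itertools
import numpy as np

BASES = "ACGT"

def period3_features(seq):
    """Return (peak_power, snr) at the period-3 frequency.

    Assumptions (not taken from the paper): binary indicator encoding per
    base, peak read at DFT index round(N/3), noise taken as the mean power
    over all non-DC frequencies.
    """
    seq = seq.upper().replace("U", "T")
    n = len(seq)
    if n < 3:
        raise ValueError("sequence must contain at least 3 bases")
    codes = np.frombuffer(seq.encode("ascii"), dtype=np.uint8)
    power = np.zeros(n)
    for base in BASES:
        indicator = (codes == ord(base)).astype(float)
        power += np.abs(np.fft.fft(indicator)) ** 2  # summed power spectrum
    peak = power[round(n / 3)]
    noise = power[1:].mean()  # index 0 (DC term) is excluded
    snr = peak / noise if noise > 0 else 0.0
    return float(peak), float(snr)

def kmer_composition(seq, k):
    """Return relative frequencies of all 4**k k-mers in lexicographic order."""
    seq = seq.upper().replace("U", "T")
    kmers = ["".join(p) for p in itertools.product(BASES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    valid = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # windows with ambiguous bases (e.g. N) are skipped
            counts[window] += 1
            valid += 1
    return np.array([counts[m] / valid if valid else 0.0 for m in kmers])

def feature_vector(seq):
    """Concatenate periodicity (2), dinucleotide (16) and trinucleotide (64) features."""
    peak, snr = period3_features(seq)
    return np.concatenate(([peak, snr], kmer_composition(seq, 2), kmer_composition(seq, 3)))
```

A vector such as `feature_vector("ATGGCC" * 50)` has 82 entries and could be passed to any small classifier. Under these assumptions a strongly codon-structured sequence gives a high SNR, while a sequence without period-3 structure gives a value near 1.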

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/779e/10794758/bde561a05cc5/973206300191145F1.jpg
