Prediction of dinucleotide-specific RNA-binding sites in proteins.

Affiliation

Kyushu Institute of Technology, Fukuoka, Japan.

Publication information

BMC Bioinformatics. 2011;12 Suppl 13(Suppl 13):S5. doi: 10.1186/1471-2105-12-S13-S5. Epub 2011 Nov 30.

Abstract

BACKGROUND

Regulation of gene expression, protein synthesis, and the replication and assembly of many viruses all involve RNA-protein interactions. Although several computational tools have been reported to successfully recognize RNA-binding sites in proteins, the problem of specificity remains poorly investigated. After the nucleotide base composition, the dinucleotide is the smallest unit of RNA sequence information, and many RNA-binding proteins simply bind to regions enriched in one dinucleotide. Interaction preferences of protein subsequences and dinucleotides can be inferred from protein-RNA complex structures, enabling a training-based prediction approach.
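To make the notion of "regions enriched in one dinucleotide" concrete, the following minimal Python sketch enumerates the 16 possible RNA dinucleotides and computes their frequencies over the overlapping windows of a sequence. The function name `dinucleotide_frequencies` and the alphabetical ordering of the dinucleotides are assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter
from itertools import product
from typing import Dict

# The 16 possible RNA dinucleotides in a fixed alphabetical order
# (the ordering is an illustrative assumption, not the paper's).
DINUCLEOTIDES = ["".join(pair) for pair in product("ACGU", repeat=2)]


def dinucleotide_frequencies(rna: str) -> Dict[str, float]:
    """Return the fraction of overlapping 2-nt windows for each dinucleotide.

    Windows containing characters other than A, C, G, U are ignored.
    """
    rna = rna.upper()
    windows = [rna[i:i + 2] for i in range(len(rna) - 1)]
    counts = Counter(w for w in windows if w in DINUCLEOTIDES)
    total = sum(counts.values())
    return {d: (counts[d] / total if total else 0.0) for d in DINUCLEOTIDES}


if __name__ == "__main__":
    freqs = dinucleotide_frequencies("AUAUAUGC")
    # Print the dinucleotides that actually occur, most frequent first.
    for dinuc, freq in sorted(freqs.items(), key=lambda kv: -kv[1]):
        if freq > 0:
            print(dinuc, round(freq, 3))
```

A sequence region would count as enriched in one dinucleotide when a single entry of this 16-value profile dominates the others.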

RESULTS

We analyzed basic statistics of amino acid-dinucleotide contacts in protein-RNA complexes and found that their pairing preferences could be identified. Using a standard approach that represents protein subsequences by their evolutionary profiles, we trained neural networks to predict multiclass target vectors corresponding to the 16 possible contacting dinucleotide subsequences. In cross-validation experiments, the accuracies of the optimum network, measured as the area under the curve (AUC) of the receiver operating characteristic (ROC) graph, were in the range of 65-80%.
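The two quantities named above, a 16-dimensional target vector and the ROC AUC, can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: the binary encoding in `target_vector`, the alphabetical dinucleotide order, and both function names are hypothetical, and the neural network itself is not reproduced. The AUC is computed through its rank interpretation, the probability that a randomly chosen positive example scores higher than a randomly chosen negative one.

```python
from itertools import product
from typing import Iterable, List, Sequence

# Fixed ordering of the 16 RNA dinucleotides (illustrative assumption).
DINUCLEOTIDES = ["".join(pair) for pair in product("ACGU", repeat=2)]


def target_vector(contacted: Iterable[str]) -> List[int]:
    """Encode the dinucleotides a residue contacts as a 16-dim 0/1 vector."""
    contacted = set(contacted)
    return [1 if d in contacted else 0 for d in DINUCLEOTIDES]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC for one class: fraction of (positive, negative) pairs in which
    the positive example has the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical residue contacting the dinucleotides AU and GG.
    print(target_vector(["AU", "GG"]))
    # Hypothetical predicted scores for one dinucleotide class.
    print(roc_auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6]))
```

Evaluating `roc_auc` separately for each of the 16 output positions gives one AUC per dinucleotide class, which is one way a range of values such as the one reported could arise.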

CONCLUSIONS

Dinucleotide-specific contact predictions have also been extended to the prediction of interacting protein and RNA fragment pairs, which shows the applicability of this method to predict targets of RNA-binding proteins. A web server predicting the 16-dimensional contact probability matrix directly from a user-defined protein sequence was implemented and made available at: http://tardis.nibio.go.jp/netasa/srcpred.
