Simeon Saw, Shoombuatong Watshara, Anuwongcharoen Nuttapat, Preeyanon Likit, Prachayasittikul Virapong, Wikberg Jarl E S, Nantasenamat Chanin
Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700 Thailand.
Department of Community Medical Technology, Faculty of Medical Technology, Mahidol University, Bangkok, 10700 Thailand.
J Cheminform. 2016 Dec 20;8:72. doi: 10.1186/s13321-016-0185-8. eCollection 2016.
Currently, monomeric fluorescent proteins (FPs) are ideal markers for protein tagging. Predicting oligomeric states is helpful for enhancing live biomedical imaging, and computational prediction of FP oligomeric states can accelerate protein engineering efforts to create monomeric FPs. To the best of our knowledge, this study presents the first computational model for predicting and analyzing FP oligomerization directly from the amino acid sequence.
After data curation, an exhaustive data set consisting of 397 non-redundant FP oligomeric states was compiled from the literature. Benchmarking of the protein descriptors revealed that the model built with amino acid composition descriptors performed best, with accuracy, sensitivity and specificity in excess of 80% and a Matthews correlation coefficient (MCC) greater than 0.6 for all three data subsets (i.e. training, tenfold cross-validation and external sets). The model provided insights into the important residues governing the oligomerization of FPs. To maximize the benefit of the generated predictive model, it was implemented as a web server under the R programming environment.
osFP provides a user-friendly interface for predicting the oligomeric state of an FP from its protein sequence. osFP is platform-independent: it can be accessed via a web browser on any operating system and device. osFP is freely accessible at http://codes.bio/osfp/, while the source code and data set are provided on GitHub at https://github.com/chaninn/osFP/.