BLProt: prediction of bioluminescent proteins based on support vector machine and ReliefF feature selection.

Affiliation

Institute for Neuro- and Bioinformatics, University of Lübeck, 23538 Lübeck, Germany.

Publication information

BMC Bioinformatics. 2011 Aug 17;12:345. doi: 10.1186/1471-2105-12-345.

Abstract

BACKGROUND

Bioluminescence is a process in which light is emitted by a living organism. Most light-emitting organisms live in the sea, but some insects, plants, fungi, and other organisms also emit light. The biotechnological application of bioluminescence has become routine and is considered essential for many medical and general technological advances. Identifying bioluminescent proteins is challenging because they share little sequence similarity. So far, no specific method has been reported for identifying bioluminescent proteins from primary sequence.

RESULTS

In this paper, we propose BLProt, a novel method that uses a Support Vector Machine (SVM) and physicochemical properties to predict bioluminescent proteins. BLProt was trained on a dataset of 300 bioluminescent proteins and 300 non-bioluminescent proteins, and evaluated on an independent set of 141 bioluminescent proteins and 18202 non-bioluminescent proteins. To identify the most prominent features, we carried out feature selection with three filter approaches: ReliefF, information gain (infogain), and mRMR. We selected five feature subsets by progressively decreasing the number of features, and evaluated the performance of each subset.
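To illustrate how a Relief-style filter ranks features, here is a minimal sketch of the basic two-class Relief algorithm with one nearest hit and one nearest miss. It is a simplification of ReliefF, not the BLProt implementation; the function name `relief_scores` and the toy data are hypothetical and were chosen only for illustration.

```python
def relief_scores(X, y):
    """Simplified Relief for two classes (one nearest hit, one nearest miss).

    ReliefF, the method named in the abstract, averages over k neighbours and
    handles noise and missing values; this sketch omits those refinements.
    """
    n, d = len(X), len(X[0])
    # Scale each feature to [0, 1] so per-feature differences are comparable.
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]

    def diff(a, b, j):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            # Reward features that differ across classes and agree within a class.
            weights[j] += (diff(X[i], X[miss], j) - diff(X[i], X[hit], j)) / n
    return weights


# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
X = [[0.10, 0.90], [0.20, 0.10], [0.15, 0.50],
     [0.90, 0.80], [0.80, 0.20], [0.85, 0.55]]
y = [1, 1, 1, 0, 0, 0]

scores = relief_scores(X, y)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(scores, ranking)
```

Keeping only the top-ranked entries of `ranking` yields progressively smaller feature subsets, which is the general idea behind evaluating several subsets of decreasing size.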

CONCLUSION

BLProt achieves 80% accuracy in training (5-fold cross-validation) and 80.06% accuracy on the independent test set. The performance of BLProt was compared with BLAST and HMM. High prediction accuracy and successful prediction of hypothetical proteins suggest that BLProt can be a useful approach to identify bioluminescent proteins from sequence information, irrespective of their sequence similarity. The BLProt software is available at http://www.inb.uni-luebeck.de/tools-demos/bioluminescent%20protein/BLProt.
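As a rough sketch of what 5-fold cross-validation on a balanced 300/300 training set involves, the snippet below partitions sample indices into five folds. Stratifying by class and the helper name `stratified_kfold` are assumptions made for illustration; the abstract does not say how the folds were built.

```python
import random


def stratified_kfold(labels, k=5, seed=0):
    """Split sample indices into k folds, keeping class proportions equal.

    Stratification is an assumption of this sketch, not a detail from the paper.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin into the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


# Class sizes taken from the abstract: 300 positives and 300 negatives.
labels = [1] * 300 + [0] * 300
folds = stratified_kfold(labels, k=5)

for fold_id, test_idx in enumerate(folds):
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    # A classifier would be fitted on train_idx and scored on test_idx here.
    print(fold_id, len(train_idx), len(test_idx))
```

Each sample is held out exactly once, and the cross-validation accuracy is the average of the five per-fold accuracies.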
