
Prediction of bioluminescent proteins by using sequence-derived features and lineage-specific scheme.

Author information

Zhang Jian, Chai Haiting, Yang Guifu, Ma Zhiqiang

Affiliations

School of Computer Science and Information Technology, Northeast Normal University, Changchun, Jilin Province, 130117, People's Republic of China.

School of Computer and Information Technology, Xinyang Normal University, Xinyang, Henan Province, 464000, People's Republic of China.

Publication information

BMC Bioinformatics. 2017 Jun 5;18(1):294. doi: 10.1186/s12859-017-1709-6.

Abstract

BACKGROUND

Bioluminescent proteins (BLPs) are widespread in living organisms. Because BLPs can emit light, they can serve as biomarkers that are easily detected in biomedical research, for example in gene expression analysis and in studies of signal transduction pathways. Accurate identification of BLPs is therefore important for disease diagnosis and biomedical engineering. In this paper, we propose PredBLP (Prediction of BioLuminescent Proteins), a novel and accurate sequence-based method for predicting BLPs.

RESULTS

We collect a series of sequence-derived features that have been shown to be involved in the structure and function of BLPs: amino acid composition, dipeptide composition, sequence motifs and physicochemical properties. We further show that combining all four feature types outperforms any other combination and any individual feature type. To remove potentially irrelevant or redundant features, we introduce the Fisher Markov Selector together with a Sequential Backward Selection strategy to select the optimal feature subsets. Additionally, we design a lineage-specific scheme, which proves more effective than traditional universal approaches.
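The first two feature types and the selection step can be illustrated with a minimal Python sketch. It assumes that only the 20 standard residues are counted, and all helper names (`aac`, `dpc`, `fisher_score`, `loo_centroid_accuracy`, `sequential_backward_selection`, `select_features`) are hypothetical. It substitutes the classic two-class Fisher score for the Fisher Markov Selector, which is a different and more elaborate method, and drives the backward selection with a leave-one-out nearest-centroid accuracy that is an illustrative stand-in, not the classifier used by PredBLP. Sequence motifs and physicochemical properties are not encoded here.

```python
from itertools import product
from statistics import mean, pvariance

# The 20 standard amino acids; other letters (X, B, Z, U, ...) are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aac(seq):
    """Amino acid composition: relative frequency of each standard residue (20 values)."""
    residues = [c for c in seq.upper() if c in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def dpc(seq):
    """Dipeptide composition: relative frequency of each adjacent residue pair (400 values)."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Drop pairs containing a non-standard residue instead of splicing around it.
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for p in pairs:
        counts[p] += 1
    n = len(pairs)
    return [counts[d] / n if n else 0.0 for d in DIPEPTIDES]


def fisher_score(column, labels):
    """Classic two-class Fisher criterion of one feature: (mu1 - mu0)^2 / (var1 + var0)."""
    pos = [x for x, t in zip(column, labels) if t == 1]
    neg = [x for x, t in zip(column, labels) if t == 0]
    spread = pvariance(pos) + pvariance(neg)
    return (mean(pos) - mean(neg)) ** 2 / spread if spread > 0 else 0.0


def loo_centroid_accuracy(X, y, idx):
    """Leave-one-out accuracy of a nearest-centroid classifier restricted to feature indices idx.
    Assumes each class has at least two samples."""
    correct = 0
    for i in range(len(X)):
        dist = {}
        for label in (0, 1):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            centroid = [mean(r[k] for r in rows) for k in idx]
            dist[label] = sum((X[i][k] - c) ** 2 for k, c in zip(idx, centroid))
        correct += min(dist, key=dist.get) == y[i]
    return correct / len(X)


def sequential_backward_selection(X, y, idx):
    """Repeatedly drop the feature whose removal scores best; return the best subset seen."""
    current = list(idx)
    best, best_acc = list(current), loo_centroid_accuracy(X, y, current)
    while len(current) > 1:
        # Score every subset obtained by removing exactly one feature.
        candidates = [[g for g in current if g != f] for f in current]
        scored = [(loo_centroid_accuracy(X, y, c), c) for c in candidates]
        acc, current = max(scored, key=lambda pair: pair[0])
        # ">=" keeps the smaller subset when accuracy ties.
        if acc >= best_acc:
            best, best_acc = list(current), acc
    return best, best_acc


def select_features(seqs, labels, top_k=10):
    """Rank the 420 AAC + DPC features by Fisher score, then run backward selection on the top_k."""
    X = [aac(s) + dpc(s) for s in seqs]
    scores = [fisher_score([row[k] for row in X], labels) for k in range(len(X[0]))]
    ranked = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)[:top_k]
    return sequential_backward_selection(X, labels, ranked)
```

In this sketch the Fisher ranking acts as a cheap filter that shrinks the 420 composition features to `top_k` candidates, so the much more expensive wrapper search only runs on a short list.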

CONCLUSION

Experiments on benchmark datasets demonstrate the robustness of PredBLP. We show that lineage-specific models significantly outperform universal ones. We also test the generalization capability of PredBLP on independent testing datasets as well as on BLPs newly deposited in UniProt. PredBLP outperforms many state-of-the-art methods. A web server named PredBLP, which implements the proposed method, is freely available for academic use.
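The abstract does not spell out how the lineage-specific scheme is organised. A minimal sketch of the general idea, assuming each protein carries a lineage label (the names `Bacteria`, `Eukaryota` and `Archaea` below are illustrative), that a separate model is trained per lineage, and that a universal model trained on all data serves as a fallback, could look like this. `NearestCentroid` and `LineageSpecificPredictor` are hypothetical names, and the nearest-centroid classifier is only a stand-in for whatever model is actually used.

```python
from statistics import mean


class NearestCentroid:
    """Toy classifier (hypothetical stand-in): predicts the label of the closest class centroid."""

    def fit(self, X, y):
        self.centroids = {}
        for label in sorted(set(y)):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [mean(col) for col in zip(*rows)]
        return self

    def predict(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda label: sq_dist(self.centroids[label]))


class LineageSpecificPredictor:
    """One model per lineage, plus a universal model (trained on all data) as an assumed fallback."""

    def __init__(self, make_model=NearestCentroid):
        self.make_model = make_model

    def fit(self, X, y, lineages):
        self.universal = self.make_model().fit(X, y)
        self.models = {}
        for lineage in sorted(set(lineages)):
            idx = [i for i, lin in enumerate(lineages) if lin == lineage]
            labels = [y[i] for i in idx]
            if len(set(labels)) < 2:
                continue  # a lineage with only one class cannot train a binary model
            self.models[lineage] = self.make_model().fit([X[i] for i in idx], labels)
        return self

    def predict(self, x, lineage):
        # Route the query to its lineage's model, or to the universal model if none exists.
        model = self.models.get(lineage, self.universal)
        return model.predict(x)


# Toy usage: the single feature means opposite things in the two lineages.
X = [[1.0], [0.9], [0.0], [0.1], [0.0], [0.1], [1.0], [0.9]]
y = [1, 1, 0, 0, 1, 1, 0, 0]
lineages = ["Bacteria"] * 4 + ["Eukaryota"] * 4
predictor = LineageSpecificPredictor().fit(X, y, lineages)
print(predictor.predict([0.8], "Bacteria"), predictor.predict([0.8], "Eukaryota"))  # prints: 1 0
```

On this deliberately contrived data both class centroids of the universal model coincide at 0.5, so it cannot separate the classes, while each lineage model can. The example only shows why per-lineage models may help; it does not reproduce the paper's results.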

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f99b/5460367/f9183ee8a484/12859_2017_1709_Fig1_HTML.jpg
