Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, 300, Taiwan.
Interdiscip Sci. 2010 Sep;2(3):263-70. doi: 10.1007/s12539-010-0023-z. Epub 2010 Jul 25.
Predicting non-classical secreted proteins is an important problem for drug discovery and the development of disease diagnostics. Non-classical secreted proteins are leaderless: they lack an N-terminal signal peptide. This makes them more difficult to predict than classical secreted proteins. We identify a set of informative physicochemical properties from amino acid indices and combine them with a support vector machine (SVM) to discriminate between secreted and non-secreted proteins and to predict non-classical secreted proteins. When the sequence identity of the dataset was reduced to 25%, the prediction accuracy on the training dataset was 85%, which is much better than that of the traditional sequence similarity-based tools BLAST and PSI-BLAST. The accuracy in the independent test was 82%. The most effective features revealed fundamental differences in physicochemical properties between secreted and non-secreted proteins. This interpretable information could benefit drug discovery or the development of new blood biochemical tests.
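As a rough illustration of how amino acid indices can be turned into SVM input, the sketch below averages one physicochemical index over each sequence to produce a fixed-length feature vector. It is not the authors' pipeline: the abstract does not name the selected indices or the encoding, so the use of the Kyte-Doolittle hydropathy scale and the helper names `mean_index` and `encode` are assumptions made for this example.

```python
# Kyte-Doolittle hydropathy values, used here only as an example property.
# ASSUMPTION: the abstract does not say which amino acid indices were selected.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_index(sequence, index):
    """Average an amino acid index over a sequence.

    Residues not covered by the index (e.g. 'X') are skipped.
    Hypothetical helper, not taken from the paper.
    """
    values = [index[aa] for aa in sequence.upper() if aa in index]
    if not values:
        raise ValueError("sequence contains no residues covered by the index")
    return sum(values) / len(values)


def encode(sequence, indices):
    """Build a feature vector with one mean value per index.

    Every sequence maps to a vector of length len(indices), regardless of
    sequence length, which is the fixed-size input an SVM requires.
    """
    return [mean_index(sequence, idx) for idx in indices]


if __name__ == "__main__":
    # Toy sequences for illustration only; not data from the study.
    for seq in ["MKKLLIVAG", "DDEERKNQS"]:
        print(seq, encode(seq, [KYTE_DOOLITTLE]))
```

Vectors built this way, with one entry per selected index, could then be passed to any SVM implementation (for example `sklearn.svm.SVC` in scikit-learn) along with secreted/non-secreted labels.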