School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel; School of Computer Science, Hadassah Academic College, Jerusalem, Israel.
School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel.
Nucleic Acids Res. 2014 Jul;42(Web Server issue):W182-6. doi: 10.1093/nar/gku363. Epub 2014 May 3.
Neuropeptides (NPs) are short secreted peptides produced in neurons. NPs act by activating signaling cascades governing broad functions such as metabolism, sensation and behavior throughout the animal kingdom. NPs are the products of multistep processing of longer proteins, the NP precursors (NPPs). We present NeuroPID (Neuropeptide Precursor Identifier), an online machine-learning tool that identifies metazoan NPPs. NeuroPID was trained on 1418 NPPs annotated as such by UniProtKB. A large number of sequence-based features were extracted for each sequence with the goal of capturing the biophysical and informational-statistical properties that distinguish NPPs from other proteins. Training several machine-learning models, including support vector machines and ensemble decision trees, led to high accuracy (89-94%) and precision (90-93%) in cross-validation tests. For inputs of thousands of unseen sequences, the tool provides a ranked list of high-quality predictions based on the results of four machine-learning classifiers. The output reveals many uncharacterized NPPs and secreted cell modulators that are rich in potential cleavage sites. NeuroPID is a discovery and prediction tool that can be used to identify NPPs from unannotated transcriptomes and mass spectrometry experiments. NeuroPID-predicted sequences are attractive targets for investigating behavior, physiology and cell modulation. The NeuroPID web tool is available at http://neuropid.cs.huji.ac.il.
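To make the idea of sequence-based features and classifier-based ranking more concrete, the following is a minimal Python sketch, not the NeuroPID implementation. The abstract does not list the actual features or say how the four classifiers are combined, so everything here is an illustrative assumption: the helper names `sequence_features` and `rank_by_votes`, the use of amino acid composition and Shannon entropy as stand-ins for "biophysical and informational-statistical properties", the use of dibasic residue pairs (K/R followed by K/R) as a simple proxy for potential cleavage sites, and ranking by the number of positive classifier votes.

```python
import math
import re
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: dibasic pairs (KK, KR, RK, RR) serve as a crude proxy for
# potential cleavage sites. The lookahead lets overlapping pairs be counted.
DIBASIC_SITE = re.compile(r"(?=([KR][KR]))")


def sequence_features(seq: str) -> dict[str, float]:
    """Compute a few simple per-sequence features (hypothetical feature set)."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    # Fraction of each of the 20 standard amino acids.
    features = {f"frac_{aa}": counts.get(aa, 0) / n for aa in AMINO_ACIDS}
    # Shannon entropy (in bits) of the residue distribution.
    features["entropy_bits"] = -sum(
        (c / n) * math.log2(c / n) for c in counts.values()
    )
    # Density of dibasic pairs along the sequence.
    features["dibasic_sites_per_residue"] = len(DIBASIC_SITE.findall(seq)) / n
    return features


def rank_by_votes(votes: dict[str, list[bool]]) -> list[tuple[str, int]]:
    """Rank sequence IDs by how many classifiers labeled them positive.

    Assumption: simple vote counting; ties are broken by sequence ID.
    """
    tallied = [(seq_id, sum(v)) for seq_id, v in votes.items()]
    return sorted(tallied, key=lambda item: (-item[1], item[0]))
```

In a real pipeline, the feature dictionaries would be turned into vectors and passed to trained models such as support vector machines or ensemble decision trees; the vote lists given to `rank_by_votes` would then hold one boolean per classifier for each input sequence.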