Cole T Jeffrey, Brewer Michael S
Department of Biology, East Carolina University, Greenville, NC, United States of America.
PeerJ. 2019 Jun 28;7:e7200. doi: 10.7717/peerj.7200. eCollection 2019.
In the era of next-generation sequencing and shotgun proteomics, sequences of animal toxigenic proteins are being generated faster than their toxicity can be verified empirically by traditional means. To help automate toxin identification from protein sequences, we trained recurrent neural networks with gated recurrent units (GRUs) on publicly available datasets. The resulting models are available through TOXIFY, a new software package that infers the probability that a given protein sequence is a venom protein. Compared with previously published methods, TOXIFY is more than 20 times faster, uses over an order of magnitude less memory, and classifies venom proteins with higher accuracy, precision, and sensitivity.
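The abstract describes the model only at a high level. As a rough illustration of how a recurrent network with gated recurrent units can turn a protein sequence of any length into a single probability, here is a minimal pure-Python sketch. It is not TOXIFY's code or API. The one-hot encoding over the 20 standard amino acids, the hidden size, the names `one_hot` and `TinyGRUClassifier`, and the random, untrained weights are all hypothetical choices made for illustration. TOXIFY's actual input encoding, architecture, and training may differ.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(seq):
    """Hypothetical encoding: one 20-dim one-hot vector per residue."""
    vectors = []
    for residue in seq.upper():
        vec = [0.0] * len(AMINO_ACIDS)
        idx = AMINO_ACIDS.find(residue)
        if idx >= 0:  # nonstandard residues (e.g. 'X') stay all-zero
            vec[idx] = 1.0
        vectors.append(vec)
    return vectors


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyGRUClassifier:
    """Single-layer GRU followed by a logistic output unit.

    Weights are random and NOT trained, so outputs are meaningless
    until the parameters are fitted to labeled data.
    """

    def __init__(self, input_dim=20, hidden_dim=8, seed=0):
        rng = random.Random(seed)
        self.h_dim = hidden_dim
        # W: input-to-hidden, U: hidden-to-hidden, b: bias, one set per gate:
        # z = update gate, r = reset gate, n = candidate hidden state
        self.W = {g: rand_matrix(hidden_dim, input_dim, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hidden_dim, hidden_dim, rng) for g in "zrn"}
        self.b = {g: [0.0] * hidden_dim for g in "zrn"}
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
        self.b_out = 0.0

    def _gate(self, g, x, h, act):
        # act(W_g x + U_g h + b_g), computed element-wise
        pre = [a + c + d for a, c, d in
               zip(matvec(self.W[g], x), matvec(self.U[g], h), self.b[g])]
        return [act(p) for p in pre]

    def step(self, x, h):
        """One GRU time step: returns the new hidden state."""
        z = self._gate("z", x, h, sigmoid)
        r = self._gate("r", x, h, sigmoid)
        rh = [ri * hi for ri, hi in zip(r, h)]  # reset gate scales old state
        n = self._gate("n", x, rh, math.tanh)
        # Interpolate between the previous state and the candidate
        return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def predict_proba(self, seq):
        """Run the GRU over the sequence; map the final state to P(venom)."""
        h = [0.0] * self.h_dim
        for x in one_hot(seq):
            h = self.step(x, h)
        logit = sum(w * hi for w, hi in zip(self.w_out, h)) + self.b_out
        return sigmoid(logit)


model = TinyGRUClassifier(seed=0)
p = model.predict_proba("GCCSDPRCAWRC")  # an arbitrary short peptide
print(f"P(venom) = {p:.3f}")  # untrained weights, so the value is arbitrary
```

Because the final hidden state summarizes the whole sequence, every input, whatever its length, yields exactly one score. With random weights that score carries no information; a real classifier would first be fitted to labeled venom and non-venom sequences (for example by minimizing binary cross-entropy).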