College of Cybersecurity, Sichuan University, Chengdu 610065, China.
School of Public Health, Southwest Medical University, Luzhou, Sichuan 646000, China.
J Chem Inf Model. 2020 Aug 24;60(8):3755-3764. doi: 10.1021/acs.jcim.0c00409. Epub 2020 Aug 5.
Deep learning has proven to be a powerful method with applications in various fields, including image, language, and biomedical data. Thanks to libraries and toolkits such as TensorFlow, PyTorch, and Keras, researchers can combine different deep learning architectures and data sets for rapid modeling. However, the available neural network implementations built with these toolkits are usually designed for a specific study and are difficult to transfer to other work. Here, we present autoBioSeqpy, a tool that uses deep learning for biological sequence classification. The main advantage of this tool is its simplicity: users only need to prepare the input data set and run the tool from a command line interface. autoBioSeqpy then automatically executes a series of customizable steps, including text reading, parameter initialization, sequence encoding, model loading, training, and evaluation. In addition, the tool provides various ready-to-use, adaptable model templates to improve the usability of these networks. We demonstrate the application of autoBioSeqpy to three biological sequence problems: the prediction of type III secreted proteins, protein subcellular localization, and CRISPR/Cas9 sgRNA activity. autoBioSeqpy is freely available with examples at https://github.com/jingry/autoBioSeqpy.
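The sequence encoding step named in the abstract can be illustrated with a minimal sketch. This is not autoBioSeqpy's own code. One-hot encoding is only assumed here as a common scheme for turning sequences into network input, and the function name `one_hot_encode`, the four-letter DNA alphabet, and the fixed `length` parameter are hypothetical choices made for illustration.

```python
from typing import List

# Assumed alphabet for illustration; a protein task would use 20 amino acid letters.
DNA_ALPHABET = "ACGT"


def one_hot_encode(seq: str, alphabet: str = DNA_ALPHABET, length: int = 8) -> List[List[int]]:
    """Encode a sequence as a fixed-size list of one-hot rows (hypothetical helper).

    Sequences longer than `length` are truncated, shorter ones are padded with
    all-zero rows, and symbols outside the alphabet (e.g. 'N') become all-zero rows.
    """
    index = {ch: i for i, ch in enumerate(alphabet)}
    rows = []
    for ch in seq.upper()[:length]:
        row = [0] * len(alphabet)
        if ch in index:
            row[index[ch]] = 1
        rows.append(row)
    # Pad with zero rows so every sequence yields the same shape.
    while len(rows) < length:
        rows.append([0] * len(alphabet))
    return rows


encoded = one_hot_encode("ACGTN", length=6)
```

Fixing the output length means every sequence maps to a matrix of the same shape, which is what most network input layers expect.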