Eickholt Jesse, Wang Zheng
Department of Computer Science, Central Michigan University, Mount Pleasant, MI 48859, USA.
BMC Res Notes. 2014 Nov 18;7:810. doi: 10.1186/1756-0500-7-810.
Machine learning (ML) has a number of demonstrated applications in protein prediction tasks such as protein structure prediction. To speed further development of ML-based tools and their release to the community, we have developed a package that characterizes several aspects of a protein commonly used as input for ML-based protein prediction tasks.
A number of software libraries and modules exist for handling protein-related data. The package we present in this work, PCP-ML, is unique in its small footprint and emphasis on machine learning. Its primary focus is on characterizing various aspects of a protein through sets of numerical data. The generated data can then be used with machine learning tools and techniques. PCP-ML is very flexible in how the generated data is formatted and, as a result, is compatible with a variety of existing machine learning packages. Given its small size, it can be directly packaged and distributed with community-developed tools for protein prediction tasks.
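To make the idea of characterizing a protein as numerical data in a flexible output format more concrete, the following is a minimal sketch in plain Python. It does not use PCP-ML or reproduce its API. The function names (`one_hot`, `window_features`, `to_libsvm`, `to_csv`), the one-hot residue encoding, the sliding window, and the choice of LIBSVM-style and CSV output are all assumptions made for illustration only.

```python
# Illustrative sketch only: this is NOT PCP-ML's API. All names below are hypothetical.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def one_hot(residue):
    """Return a 20-dimensional one-hot vector; unknown residues give all zeros."""
    return [1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS]


def window_features(sequence, center, half_width=1):
    """Concatenate one-hot vectors for the residues in a window around `center`.

    Window positions that fall outside the sequence are padded with zeros.
    """
    features = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(sequence):
            features.extend(one_hot(sequence[pos]))
        else:
            features.extend([0.0] * len(AMINO_ACIDS))
    return features


def to_libsvm(label, features):
    """Format as a sparse 'label index:value ...' line with 1-based indices."""
    pairs = [f"{i + 1}:{v:g}" for i, v in enumerate(features) if v != 0.0]
    return " ".join([str(label)] + pairs)


def to_csv(label, features):
    """Format as a dense comma-separated line: label first, then every feature."""
    return ",".join([str(label)] + [f"{v:g}" for v in features])


if __name__ == "__main__":
    seq = "MKV"  # toy sequence, not real data
    feats = window_features(seq, center=0, half_width=1)
    print(to_libsvm(1, feats))
    print(to_csv(1, feats))
```

The same feature vector is written in two layouts here because different machine learning packages expect different input formats; separating feature generation from formatting is one simple way such flexibility could be achieved.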
Source code and example programs are available under a BSD license at http://mlid.cps.cmich.edu/eickh1jl/tools/PCPML/. The package is implemented in C++ and accessible as a Python module.