Li Yifeng, Ngom Alioune
School of Computer Science, University of Windsor, Windsor, Ontario, Canada.
Source Code Biol Med. 2013 Apr 16;8(1):10. doi: 10.1186/1751-0473-8-10.
Non-negative matrix factorization (NMF) has been introduced as an important method for mining biological data. Although packages implemented in R and other programming languages currently exist, they either provide only a few optimization algorithms or focus on a specific application field. No complete NMF package exists that allows the bioinformatics community to perform various data mining tasks on biological data.
We provide a convenient MATLAB toolbox containing both implementations of various NMF techniques and a variety of NMF-based data mining approaches for analyzing biological data. Data mining approaches implemented within the toolbox include data clustering and bi-clustering, feature extraction and selection, sample classification, missing value imputation, data visualization, and statistical comparison.
A series of analyses, such as molecular pattern discovery, biological process identification, dimension reduction, disease prediction, visualization, and statistical comparison, can be performed using this toolbox.
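For readers unfamiliar with the core operation, the following is a minimal sketch of standard NMF, which approximates a non-negative matrix X by a product A·Y of two non-negative factors. It uses the widely known multiplicative update rules for the squared-error objective and is written in plain Python for illustration only; it is not the toolbox's MATLAB code, and the function names (`nmf`, `frobenius_error`), the symbols A and Y, and all parameter defaults are assumptions introduced here.

```python
import random


def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    Qt = list(zip(*Q))
    return [[sum(p * q for p, q in zip(row, col)) for col in Qt] for row in P]


def transpose(P):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*P)]


def nmf(X, k, n_iter=500, eps=1e-9, seed=0):
    """Factor a non-negative m-by-n matrix X into A (m-by-k) and Y (k-by-n).

    Hypothetical helper: multiplicative updates for the squared error
    ||X - A*Y||^2. Entries stay non-negative because each update only
    multiplies by a ratio of non-negative quantities.
    """
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    A = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    Y = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(n_iter):
        # Update Y: Y <- Y * (A^T X) / (A^T A Y), element-wise.
        At = transpose(A)
        num, den = matmul(At, X), matmul(matmul(At, A), Y)
        Y = [[Y[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        # Update A: A <- A * (X Y^T) / (A Y Y^T), element-wise.
        Yt = transpose(Y)
        num, den = matmul(X, Yt), matmul(A, matmul(Y, Yt))
        A = [[A[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
    return A, Y


def frobenius_error(X, A, Y):
    """Return the Frobenius norm of the residual X - A*Y."""
    R = matmul(A, Y)
    return sum((x - r) ** 2
               for x_row, r_row in zip(X, R)
               for x, r in zip(x_row, r_row)) ** 0.5
```

In a typical bioinformatics use, X would be a features-by-samples matrix such as gene expression data, so that the columns of A act as basis patterns and the columns of Y give each sample's non-negative coefficients, which can then feed clustering or feature extraction.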