Bioinformatics core at Masonic Cancer Center, University of Minnesota Twin Cities, Minneapolis, MN 55455, USA.
Nucleic Acids Res. 2012 Oct;40(19):e146. doi: 10.1093/nar/gks615. Epub 2012 Jun 26.
Understanding the categorization of human diseases is critical for reliably identifying disease-causal genes. Recently, genome-wide studies of abnormal chromosomal locations related to diseases have mapped >2000 phenotype-gene relations, which provide valuable information for classifying diseases and identifying candidate genes as drug targets. In this article, a regularized non-negative matrix tri-factorization (R-NMTF) algorithm is introduced to co-cluster phenotypes and genes and, at the same time, detect associations between the resulting phenotype clusters and gene clusters. The R-NMTF algorithm factorizes the phenotype-gene association matrix using prior knowledge from a phenotype similarity network and a protein-protein interaction network, supervised by label information from known disease classes and biological pathways. In experiments on disease phenotype-gene associations in OMIM and KEGG disease pathways, R-NMTF significantly improved the classification of disease phenotypes and disease pathway genes compared with support vector machines and label propagation in cross-validation on the annotated phenotypes and genes. The newly predicted phenotypes in each disease class are highly consistent with Human Phenotype Ontology annotations. The roles of the new member genes in the disease pathways are examined and validated in protein-protein interaction subnetworks. An extensive literature review also confirmed many new members of the disease classes and pathways, as well as the predicted associations between disease phenotype classes and pathways.
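The co-clustering idea described above can be illustrated with a minimal sketch of graph-regularized non-negative matrix tri-factorization. It factorizes a phenotype-gene matrix X into F S G^T, where F holds phenotype cluster memberships, G holds gene cluster memberships, and S holds the cluster-to-cluster associations. This is not the authors' exact R-NMTF formulation: the multiplicative update rules are the standard ones for a Frobenius-norm objective with graph Laplacian penalties, the label supervision from disease classes and pathways is omitted, and the function name `nmtf`, the parameter `lam`, and the toy matrices are all assumptions made for illustration.

```python
import numpy as np


def nmtf(X, k_row, k_col, A_row=None, A_col=None, lam=0.1,
         n_iter=200, seed=0, eps=1e-9):
    """Sketch of graph-regularized NMTF: X ~ F @ S @ G.T with F, S, G >= 0.

    A_row / A_col are optional non-negative symmetric adjacency matrices
    (standing in for a phenotype similarity network and a PPI network).
    Returns the factors and the objective value after each iteration.
    """
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    rng = np.random.default_rng(seed)
    F = rng.random((n, k_row)) + eps
    S = rng.random((k_row, k_col)) + eps
    G = rng.random((m, k_col)) + eps

    A_row = np.zeros((n, n)) if A_row is None else np.asarray(A_row, dtype=float)
    A_col = np.zeros((m, m)) if A_col is None else np.asarray(A_col, dtype=float)
    D_row = np.diag(A_row.sum(axis=1))  # degree matrices; Laplacian L = D - A
    D_col = np.diag(A_col.sum(axis=1))

    def objective():
        fit = np.linalg.norm(X - F @ S @ G.T) ** 2
        smooth_rows = np.trace(F.T @ (D_row - A_row) @ F)
        smooth_cols = np.trace(G.T @ (D_col - A_col) @ G)
        return fit + lam * (smooth_rows + smooth_cols)

    history = []
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        F *= (X @ G @ S.T + lam * A_row @ F) / (
            F @ S @ G.T @ G @ S.T + lam * D_row @ F + eps)
        G *= (X.T @ F @ S + lam * A_col @ G) / (
            G @ S.T @ F.T @ F @ S + lam * D_col @ G + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        history.append(objective())
    return F, S, G, history


# Synthetic toy data (not from OMIM or KEGG): 6 phenotypes x 8 genes,
# with two blocks of associations and matching within-block networks.
X_toy = np.zeros((6, 8))
X_toy[:3, :4] = 1.0
X_toy[3:, 4:] = 1.0
A_pheno = np.zeros((6, 6))
A_pheno[:3, :3] = 1.0
A_pheno[3:, 3:] = 1.0
np.fill_diagonal(A_pheno, 0.0)
A_gene = np.zeros((8, 8))
A_gene[:4, :4] = 1.0
A_gene[4:, 4:] = 1.0
np.fill_diagonal(A_gene, 0.0)

F, S, G, history = nmtf(X_toy, k_row=2, k_col=2, A_row=A_pheno, A_col=A_gene)
phenotype_clusters = F.argmax(axis=1)  # hard cluster label per phenotype
gene_clusters = G.argmax(axis=1)       # hard cluster label per gene
```

In this sketch, large entries of S indicate which phenotype cluster is associated with which gene cluster, which is the role the abstract assigns to the detected cluster associations. The Laplacian terms only encourage connected phenotypes (or genes) to share cluster memberships; how strongly they do so depends on the assumed weight `lam`.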