Yun Zheng, Keong Kwoh Chee
BIRC, School of Comp. Eng., Nanyang Technological University, Singapore.
Proc IEEE Comput Syst Bioinform Conf. 2004:353-62. doi: 10.1109/csb.2004.1332448.
It is still an open problem to identify functional relations in $o(N \cdot n^k)$ time for any domain [2], where $N$ is the number of learning instances, $n$ is the number of genes (or variables) in the Gene Regulatory Network (GRN) model, and $k$ is the indegree of the genes. To address this problem, we introduce a novel algorithm, DFL (Discrete Function Learning), for reconstructing qualitative models of GRNs from gene expression data. We analyze its average-case complexity, $O(k \cdot N \cdot n^2)$, and its data requirements. We also perform experiments on both synthetic data and the yeast cell cycle gene expression data of Cho et al. [7] to validate the efficiency and prediction performance of the DFL algorithm. The experiments on synthetic Boolean networks show that the DFL algorithm is more efficient than current algorithms without loss of prediction performance. The results on the yeast cell cycle gene expression data show that the DFL algorithm can identify biologically significant models with reasonable accuracy and sensitivity and with high precision with respect to evidence in the literature. We further introduce a method, called the epsilon function, to deal with noise in data sets. The experimental results show that the epsilon function method is a good supplement to the DFL algorithm.
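To make the role of the $n^k$ factor concrete, here is a minimal Python sketch of a brute-force baseline: it tries every subset of at most k candidate input variables and checks, in one pass over the N instances, whether the target gene's value is a function of that subset. This is not the DFL algorithm, whose details are not given in this abstract; the function name `find_consistent_inputs`, the parameter `max_indegree`, and the toy data are hypothetical and chosen only for illustration.

```python
from itertools import combinations, product


def find_consistent_inputs(samples, target, max_indegree):
    """Brute-force search (illustrative, NOT the DFL algorithm).

    samples: list of N tuples, each holding the discrete values of n variables
    target: list of N discrete values of the gene to be explained
    max_indegree: largest input-set size k to try

    Returns (subset, table) for the smallest subset of variable indices
    whose values determine the target in every instance, or None.
    """
    n = len(samples[0])
    # There are on the order of n^k subsets of size k, and each one
    # costs a pass over the N instances.
    for k in range(max_indegree + 1):
        for subset in combinations(range(n), k):
            table = {}  # maps observed input values to the target value
            consistent = True
            for row, y in zip(samples, target):
                key = tuple(row[i] for i in subset)
                # The same inputs with a different output means
                # the target is not a function of this subset.
                if table.setdefault(key, y) != y:
                    consistent = False
                    break
            if consistent:
                return subset, table
    return None


# Toy data (hypothetical): three Boolean variables, target = x0 AND x2.
samples = list(product([0, 1], repeat=3))
target = [a & c for a, b, c in samples]
result = find_consistent_inputs(samples, target, max_indegree=2)
print(result)
```

On this toy data the search returns the index pair of the two true inputs together with the truth table it recovered. Because the number of candidate subsets grows as $n^k$, this baseline becomes impractical for large networks, which is the cost the abstract refers to.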