Chuang Cheng-Long, Jen Chih-Hung, Chen Chung-Ming, Shieh Grace S
Institute of Biomedical Engineering, National Taiwan University, Taipei 106, Taiwan.
Bioinformatics. 2008 May 1;24(9):1183-90. doi: 10.1093/bioinformatics/btn098. Epub 2008 Mar 12.
For time-course microarray data in which the gene interactions and the associated paired patterns are dependent, the proposed pattern recognition (PARE) approach can infer time-lagged genetic interactions, a challenging task because such data have few time points but many genes. PARE uses a non-linear score to identify subclasses of gene pairs with different time lags. Within each subclass, PARE extracts non-linear characteristics of the paired gene-expression curves and learns the weights of the decision score by applying an optimization algorithm to the microarray gene-expression data (MGED) of some known interactions, drawn from biological experiments or published literature. In this way, PARE integrates MGED with existing knowledge via machine learning and then predicts the remaining genetic interactions in the subclass.
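The idea of sorting gene pairs into subclasses by time lag can be illustrated with a minimal sketch. It uses a plain lagged Pearson correlation as a stand-in score, which is an assumption made for illustration and not PARE's non-linear score or its learned decision weights. The function names, the `max_lag` parameter, and the toy gene names are all hypothetical.

```python
import numpy as np

def lagged_correlation(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag], for lag >= 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if lag > 0:
        # Align x at time t with y at time t + lag.
        x, y = x[:-lag], y[lag:]
    return float(np.corrcoef(x, y)[0, 1])

def best_lag(x, y, max_lag=2):
    """Return (lag, score) with the largest absolute lagged correlation."""
    scores = {lag: lagged_correlation(x, y, lag) for lag in range(max_lag + 1)}
    lag = max(scores, key=lambda k: abs(scores[k]))
    return lag, scores[lag]

def group_by_lag(expr, pairs, max_lag=2):
    """Group gene pairs into subclasses keyed by their best time lag."""
    groups = {}
    for a, b in pairs:
        lag, score = best_lag(expr[a], expr[b], max_lag)
        groups.setdefault(lag, []).append((a, b, score))
    return groups

# Toy expression curves (synthetic, not real microarray data).
t = np.arange(10)
expr = {
    "geneA": np.sin(t),
    "geneB": np.sin(t - 1),  # geneA delayed by one time step
    "geneC": -np.sin(t),     # geneA inverted, no delay
}
groups = group_by_lag(expr, [("geneA", "geneB"), ("geneA", "geneC")])
for lag, members in sorted(groups.items()):
    print(lag, members)
```

In this toy setup the delayed pair falls into the lag-1 subclass with a positive score, and the inverted pair falls into the lag-0 subclass with a negative score. With only a handful of time points, as in real time-course arrays, such correlation estimates are noisy, which is part of why the task is hard.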
PARE, a time-lagged correlation approach, and the latest advance in graphical Gaussian models were applied to predict 112 (132) pairs of TC/TD (transcriptional regulatory) interactions. Checked against qRT-PCR results (published literature), their true positive rates are 73% (77%), 46% (51%), and 52% (59%), respectively. The false positive rates of predicting TC and TD (AT and RT) interactions in the yeast genome are bounded by 13% and 10% (10% and 14%), respectively. Several predicted TC/TD interactions coincide with existing pathways involving Sgs1, Srs2, and Mus81, which reinforces the possibility of applying genetic interactions to predict pathways of protein complexes. Moreover, some experimentally testable gene interactions involving DNA repair are predicted.
Supplementary data and PARE software are available at http://www.stat.sinica.edu.tw/~gshieh/pare.htm.