IEEE/ACM Trans Comput Biol Bioinform. 2019 Jul-Aug;16(4):1250-1261. doi: 10.1109/TCBB.2018.2830357. Epub 2018 Apr 26.
Control of gene regulatory networks (GRNs) to shift gene expression from undesirable states to desirable ones has received much attention in recent years. Most existing methods assume that the cost of intervention at each state and time point, referred to as the immediate cost function, is fully known. In this paper, we employ the Partially-Observed Boolean Dynamical System (POBDS) signal model for a time sequence of noisy expression measurements from a Boolean GRN and develop a Bayesian Inverse Reinforcement Learning (BIRL) approach to address the realistic case in which the only available knowledge about the immediate cost function comes from the sequence of measurements and interventions recorded by an expert in an experimental setting. The Boolean Kalman Smoother (BKS) algorithm is used to optimally map the available gene-expression data into a sequence of Boolean states, and the BIRL method is then combined efficiently with the Q-learning algorithm to quantify the immediate cost function. The performance of the proposed methodology is investigated by applying a state-feedback controller to two GRN models, a melanoma WNT5A Boolean network and a p53-MDM2 negative-feedback-loop Boolean network, where the cost of the undesirable states, and thus the identity of the undesirable genes, is learned using the proposed methodology.
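To make the pipeline concrete, the following is a minimal sketch of the forward-control portion of the setup described above: a toy POBDS (a Boolean network whose states are perturbed by Bernoulli process noise and observed through Bernoulli measurement noise) together with tabular Q-learning of a state-feedback intervention policy. The 3-gene network, the noise rates, and the immediate cost function are all hypothetical placeholders; in the paper the immediate cost is precisely what the BIRL step infers from the expert's recorded trajectories, and state estimation uses the BKS rather than the true states assumed here.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 3                      # number of genes (toy network, not the paper's models)
p, q = 0.05, 0.05          # process / measurement Bernoulli noise rates (assumed)

def f(x):
    # Hypothetical Boolean update rules for a 3-gene toy network.
    return np.array([x[1] & x[2], x[0] | x[2], 1 - x[0]], dtype=np.int8)

def step(x, action):
    """One POBDS transition: optional flip of a control gene, Boolean
    update, per-gene Bernoulli(p) perturbation, then a noisy measurement."""
    x = x.copy()
    if action > 0:                     # action k flips gene k-1; action 0 = no intervention
        x[action - 1] ^= 1
    x_next = f(x) ^ (rng.random(n) < p).astype(np.int8)
    y = x_next ^ (rng.random(n) < q).astype(np.int8)
    return x_next, y

def cost(x, action):
    # Placeholder immediate cost: this is the quantity BIRL learns from
    # expert data in the paper; here gene 0 being ON is assumed undesirable,
    # and any intervention incurs a small fixed cost.
    return 5.0 * x[0] + 1.0 * (action > 0)

# Tabular Q-learning over the 2^n Boolean states and n+1 actions.
Q = np.zeros((2 ** n, n + 1))
alpha, gamma, eps = 0.1, 0.9, 0.1
enc = lambda x: int("".join(map(str, x)), 2)   # Boolean vector -> state index

x = rng.integers(0, 2, n).astype(np.int8)
for t in range(20000):
    s = enc(x)
    a = rng.integers(0, n + 1) if rng.random() < eps else int(Q[s].argmin())
    x_next, _ = step(x, a)
    # Cost-minimizing Q-update (costs rather than rewards, as is common
    # in the GRN intervention literature).
    Q[s, a] += alpha * (cost(x, a) + gamma * Q[enc(x_next)].min() - Q[s, a])
    x = x_next

policy = Q.argmin(axis=1)   # state-feedback controller: action per Boolean state
print(policy)
```

Running the sketch yields a policy that intervenes in states where the assumed undesirable gene is ON; substituting a BIRL-estimated cost for the placeholder `cost` would reproduce the structure of the method described in the abstract.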