Karbalayghareh Alireza, Braga-Neto Ulisses, Dougherty Edward R
Department of Electrical and Computer Engineering, Texas A&M University, College Station, 77843, TX, USA.
Center for Bioinformatics and Genomic Systems Engineering, College Station, 77843, TX, USA.
BMC Syst Biol. 2018 Mar 21;12(Suppl 3):23. doi: 10.1186/s12918-018-0549-y.
Expression-based phenotype classification using either microarray or RNA-Seq measurements suffers from a lack of specificity because pathway timing is not revealed and expressions are averaged across groups of cells. This paper studies expression-based classification under the assumption that single-cell measurements are sampled at a sufficient rate to detect regulatory timing. Thus, observations are expression trajectories. In effect, classification is performed on data generated by an underlying gene regulatory network.
Network regulation is modeled by a Boolean network with perturbation, in which regulation is not fully determined owing to inherent biological randomness. The binary assumption is not critical because the resulting Markov chain characterizes expression trajectories. We assume a partially known Gaussian observation model belonging to an uncertainty class of models. We derive the intrinsically Bayesian robust classifier to discriminate between wild-type and mutated networks based on expression trajectories. The classifier minimizes the expected error across the uncertainty class relative to the prior distribution. We test it using a mammalian cell-cycle model, discriminating between the normal network and one in which gene p27 is mutated, thereby producing a cancerous phenotype. Tests examine all model aspects, including trajectory length, perturbation probability, and the hyperparameters governing the prior distribution over the uncertainty class.
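To make the data model concrete, the following is a minimal sketch of how an expression trajectory might be generated from a Boolean network with perturbation and observed through Gaussian noise. It is an illustration under stated assumptions, not the implementation used in the paper. The three-gene rule `toy_update`, the function names, and the parameters `mu0`, `mu1`, and `sigma` are hypothetical. The perturbation rule follows the common convention that each gene flips independently with probability `p`, and the regulatory functions apply only when no gene flips.

```python
import numpy as np

def step_bnp(x, update, p, rng):
    """One transition of a Boolean network with perturbation (assumed convention)."""
    flips = rng.random(x.size) < p          # each gene flips independently w.p. p
    if flips.any():
        return np.where(flips, 1 - x, x)    # random perturbation overrides regulation
    return update(x)                        # otherwise apply the regulatory functions

def toy_update(x):
    """Hypothetical 3-gene regulatory rule, for illustration only."""
    a, b, c = x
    return np.array([b & c, a | c, 1 - a], dtype=int)

def simulate(x0, update, p, length, mu0, mu1, sigma, rng):
    """Return hidden Boolean states and their noisy Gaussian observations."""
    states = np.empty((length, len(x0)), dtype=int)
    x = np.asarray(x0, dtype=int)
    for t in range(length):
        states[t] = x
        x = step_bnp(x, update, p, rng)
    # Assumed observation model: mean mu0 for state 0, mu1 for state 1, common sigma.
    means = np.where(states == 1, mu1, mu0)
    obs = means + sigma * rng.standard_normal(states.shape)
    return states, obs

rng = np.random.default_rng(0)
states, obs = simulate(np.array([1, 0, 1]), toy_update, p=0.05, length=20,
                       mu0=0.0, mu1=1.0, sigma=0.3, rng=rng)
```

In this sketch, `obs` plays the role of an observed expression trajectory. A classifier of the kind described above would compare how well such a trajectory is explained by the wild-type network and by the mutated network, averaging over the uncertain observation parameters.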
Simulations show the rates at which the expected error is diminished by smaller perturbation probability, longer trajectories, and hyperparameters that tighten the prior distribution relative to the unknown true network. For average-expression measurement, methods have been proposed to obtain prior distributions. These should be extended to the more mathematically difficult, but more informative, expression trajectories.