Center for Human and Clinical Genetics, Leiden University Medical Center, The Netherlands.
PLoS Comput Biol. 2011 Nov;7(11):e1002258. doi: 10.1371/journal.pcbi.1002258. Epub 2011 Nov 3.
Gene regulatory networks give important insights into the mechanisms underlying physiology and pathophysiology. The derivation of gene regulatory networks from high-throughput expression data via machine learning strategies is problematic, as the reliability of these models is often compromised by limited and highly variable samples, heterogeneity in transcript isoforms, noise, and other artifacts. Here, we develop a novel algorithm, dubbed Dandelion, in which we construct and train intraspecies Bayesian networks that are translated and assessed on independent test sets from other species in an iterative procedure. The interspecies disease networks are subjected to multiple layers of analysis and evaluation, leading to the identification of the most consistent relationships within the network structure. In this study, we demonstrate the performance of our algorithms on datasets from animal models of oculopharyngeal muscular dystrophy (OPMD) and patient materials. We show that the interspecies network of genes coding for the proteasome provides highly accurate predictions of gene expression levels and disease phenotype. Moreover, the cross-species translation increases the stability and robustness of these networks. Unlike existing modeling approaches, our algorithms do not require assumptions about the notoriously difficult one-to-one mapping of protein orthologues or alternative transcripts and can deal with missing data. We show that the identified key components of the OPMD disease network can be confirmed in an unseen and independent disease model. This study presents a state-of-the-art strategy for constructing interspecies disease networks that provide crucial information on regulatory relationships among genes, leading to a better understanding of the molecular mechanisms of the disease.