Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America.
PLoS Comput Biol. 2010 Dec 2;6(12):e1001014. doi: 10.1371/journal.pcbi.1001014.
Cellular gene expression measurements contain regulatory information that can be used to discover novel network relationships. Here, we present a new algorithm for network reconstruction powered by the adaptive lasso, a theoretically and empirically well-behaved method for selecting the regulatory features of a network. Any network discovery algorithm that makes use of directed probabilistic graphs requires perturbations, produced either by experiments or by naturally occurring genetic variation, to successfully infer unique regulatory relationships from gene expression data. Our approach makes use of appropriately selected cis-expression Quantitative Trait Loci (cis-eQTL), which provide a sufficient set of independent perturbations for maximum network resolution. We compare the performance of our network reconstruction algorithm to four other approaches: the PC-algorithm, QTLnet, the QDG algorithm, and the NEO algorithm, all of which have been used to reconstruct directed networks among phenotypes by leveraging QTL. We show that the adaptive lasso can outperform these algorithms for networks of ten genes and ten cis-eQTL, and is competitive with the QDG algorithm for networks with thirty genes and thirty cis-eQTL, with rich topologies and hundreds of samples. Using this novel approach, we identify unique sets of directed relationships in Saccharomyces cerevisiae by analyzing genome-wide gene expression data from an intercross between a wild strain and a lab strain. We recover novel putative network relationships between a tyrosine biosynthesis gene (TYR1) and genes involved in endocytosis (RCY1), the spindle checkpoint (BUB2), sulfonate catabolism (JLP1), and cell-cell communication (PRM7). Our algorithm provides a synthesis of feature selection methods and graphical model theory that has the potential to reveal new directed regulatory relationships from the analysis of population-level genetic and gene expression data.
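To make the feature-selection step concrete, the following is a minimal sketch of a generic adaptive lasso regression, in which each coefficient is penalized in inverse proportion to the magnitude of an initial unpenalized estimate. It is not the authors' implementation and covers neither the cis-eQTL selection nor the directed-graph reconstruction described in the abstract. The function names (`weighted_lasso`, `adaptive_lasso`, `simulate_toy_data`), the coordinate-descent solver, the toy data, and the values of `lam` and `gamma` are all assumptions chosen for illustration.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used by lasso coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, lam, weights, n_iter=200):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam * sum_j weights[j]*|b_j|."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b, with b starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (column j's own fit added back)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def adaptive_lasso(X, y, lam, gamma=1.0, eps=1e-8):
    """Two-stage adaptive lasso: initial fit, then a lasso with data-driven weights."""
    p = len(X[0])
    # Stage 1 (assumption): unpenalized least squares, obtained by setting lam = 0
    init = weighted_lasso(X, y, 0.0, [0.0] * p)
    # Stage 2: coefficients with small initial estimates receive large penalties
    weights = [1.0 / (abs(b) + eps) ** gamma for b in init]
    return weighted_lasso(X, y, lam, weights)


def simulate_toy_data(n=300, seed=0):
    """Hypothetical data: one target driven by 2 of 5 candidate regulators."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(n)]
    y = [2.0 * row[0] - 1.5 * row[2] + rng.gauss(0.0, 0.5) for row in X]
    return X, y


if __name__ == "__main__":
    X, y = simulate_toy_data()
    print(adaptive_lasso(X, y, lam=0.1))
```

In this toy setting the candidate regulators with no true effect are expected to receive coefficients of exactly zero, while the two true regulators are only mildly shrunk. In practice the penalty `lam` would be tuned, for example by cross-validation, rather than fixed in advance.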