Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA; Biology Department, New York University, New York, NY 10003, USA.
Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA; Biology Department, New York University, New York, NY 10003, USA; Courant Institute of Mathematical Sciences, Computer Science Department, New York University, New York, NY 10003, USA; Center for Data Science, New York University, New York, NY 10003, USA; Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY 10010, USA.
Cell Rep. 2018 Apr 10;23(2):376-388. doi: 10.1016/j.celrep.2018.03.048.
Large-scale inference of eukaryotic transcription-regulatory networks remains challenging. One underlying reason is that existing algorithms typically ignore crucial regulatory mechanisms, such as RNA degradation and post-transcriptional processing. Here, we describe InfereCLaDR, which incorporates such elements and improves prediction in Saccharomyces cerevisiae. First, InfereCLaDR employs a high-quality Gold Standard dataset that we use separately as prior information and for model validation. Second, InfereCLaDR explicitly models transcription factor activity and RNA half-lives. Third, it introduces expression subspaces to derive condition-responsive regulatory networks for every gene. InfereCLaDR's final network is validated against known data and trends and yields multiple insights. For example, it predicts long half-lives for transcripts of nucleic acid metabolism genes, and it predicts members of the cytosolic chaperonin complex as targets of the proteasome regulator Rpn4p. InfereCLaDR demonstrates that more biophysically realistic modeling of regulatory networks improves prediction accuracy in both eukaryotes and prokaryotes.
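As a rough illustration of two ideas named in the abstract, transcription factor activity (TFA) and RNA half-lives, here is a minimal pure-Python sketch. It is not InfereCLaDR's code, and it rests on several assumptions. It assumes an Inferelator-style linear model, X ≈ P·A, where P is a gene-by-TF prior matrix, X is a gene-by-condition expression matrix, and A is the TF-by-condition activity matrix. A is estimated by ordinary least squares. It also assumes first-order mRNA decay, so that a half-life t½ maps to a time constant τ = t½ / ln 2 in a kinetic model τ dx/dt = −x + f(TFA). Finally, it finite-differences a time series into the "response" that such a model regresses on TFA. The function names estimate_tfa, time_constant and response are hypothetical.

```python
import math

def transpose(m):
    # Rows become columns.
    return [list(r) for r in zip(*m)]

def matmul(a, b):
    # Plain nested-list matrix product a @ b.
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def solve(a, b):
    # Gauss-Jordan elimination with partial pivoting: solves a @ X = b
    # for a square, invertible matrix a and a right-hand-side matrix b.
    n = len(a)
    aug = [a[i][:] + b[i][:] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def estimate_tfa(prior, expr):
    # Hypothetical helper: least-squares TFA, A = (P^T P)^-1 P^T X.
    # prior: genes x TFs (nonzero = prior regulatory edge)
    # expr:  genes x conditions; returns TFs x conditions.
    pt = transpose(prior)
    return solve(matmul(pt, prior), matmul(pt, expr))

def time_constant(half_life):
    # First-order decay: mean lifetime tau = t_half / ln 2.
    return half_life / math.log(2)

def response(x_prev, x_next, dt, half_life):
    # Finite-difference form of tau * dx/dt + x, the quantity an
    # assumed kinetic model tau dx/dt = -x + f(TFA) fits to TFA.
    tau = time_constant(half_life)
    return tau * (x_next - x_prev) / dt + x_prev

if __name__ == "__main__":
    prior = [[1, 0], [0, 1], [1, 1]]   # 3 genes, 2 TFs
    expr = [[2, 1], [3, -1], [5, 0]]   # 3 genes, 2 conditions
    print(estimate_tfa(prior, expr))   # recovers [[2, 1], [3, -1]] up to rounding
    print(response(1.0, 2.0, 5.0, 10.0))
```

The least-squares step requires PᵀP to be invertible. In practice, this means every TF in the prior needs at least one target, and no two TFs can share an identical target set. The half-life enters only through τ, so longer-lived transcripts produce a larger derivative weight in the response term.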