University Hospital Ulm, Ulm.
IEEE/ACM Trans Comput Biol Bioinform. 2012;9(2):487-98. doi: 10.1109/TCBB.2011.62. Epub 2011 Mar 22.
Network inference algorithms can assist life scientists in unraveling gene-regulatory systems on a molecular level. In recent years, great attention has been drawn to the reconstruction of Boolean networks from time series. For this purpose, the time series must be binarized, since such networks model genes as binary variables (either “expressed” or “not expressed”). Common binarization methods often cluster measurements or separate them according to statistical or information-theoretic characteristics, and may require many data points to determine a robust threshold. Yet time series measurements frequently comprise only a small number of samples. To overcome this limitation, we propose a binarization that incorporates measurements at multiple resolutions. We introduce two such binarization approaches, which determine thresholds based on limited numbers of samples and additionally provide a measure of threshold validity. Network reconstruction and further analysis can thus be restricted to genes with meaningful thresholds, which reduces the complexity of network inference. The performance of our binarization algorithms was evaluated in network reconstruction experiments using artificial data as well as real-world yeast expression time series. Compared to other binarization techniques, the new approaches considerably improve the rate of correct network identification by effectively reducing the number of candidate networks.
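The abstract notes that common binarization methods cluster the measurements to find a threshold. A minimal sketch of such a clustering baseline follows, assuming Python and a hypothetical helper named `two_cluster_threshold`: it sorts the values and picks the split that minimizes the total within-group sum of squared errors (an exact one-dimensional two-means clustering), then places the threshold halfway between the two groups. This is only an illustration of the single-threshold clustering approach the abstract contrasts against, not the multiscale binarization proposed in the paper.

```python
from typing import List, Tuple


def two_cluster_threshold(values: List[float]) -> Tuple[float, List[int]]:
    """Hypothetical baseline: binarize a short series with one threshold.

    Tries every split of the sorted values into a low and a high group and
    keeps the split with the smallest total within-group sum of squared
    errors (exact 1-D two-means). Not the paper's multiscale method.
    """
    if len(set(values)) < 2:
        raise ValueError("need at least two distinct values")
    s = sorted(values)
    best_cost, best_i = float("inf"), None
    for i in range(1, len(s)):  # low group = s[:i], high group = s[i:]
        if s[i - 1] == s[i]:
            continue  # never place the threshold between equal values
        cost = 0.0
        for group in (s[:i], s[i:]):
            mean = sum(group) / len(group)
            cost += sum((x - mean) ** 2 for x in group)
        if cost < best_cost:
            best_cost, best_i = cost, i
    # Threshold halfway between the largest "low" and smallest "high" value.
    threshold = (s[best_i - 1] + s[best_i]) / 2
    # Map each measurement (in original time order) to 0 or 1.
    return threshold, [1 if x > threshold else 0 for x in values]
```

For example, the six-point series `[0.1, 0.2, 0.15, 0.9, 1.0, 0.95]` yields a threshold of 0.55 and the binary series `[0, 0, 0, 1, 1, 1]`. With so few samples, a single threshold like this can shift noticeably when one measurement changes, which is the kind of fragility on short time series that the abstract says motivates the multiresolution approaches.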