Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, United States of America.
PLoS One. 2011;6(7):e22071. doi: 10.1371/journal.pone.0022071. Epub 2011 Jul 22.
High-throughput experimental assays that measure the entire complement of a cell's genes or gene products are now in widespread use and have produced vast stores of data. These data are rich in the number of items measured per sample, yet often sparse in the number of samples per experiment because each assay is costly. The resulting datasets frequently cover only a limited number of treatment levels or time points, or include very few technical and/or biological replicates. Here we introduce a novel algorithm to quantify the uncertainty in the unmeasured intervals between biological measurements taken across a set of quantitative treatments. The algorithm provides a probability distribution of possible gene expression values within unmeasured intervals, based on a plausible biological constraint. We show how quantifying this uncertainty can guide researchers in further data collection by identifying which samples would likely add the most information to the system under study. Although the algorithm was developed in the context of gene expression measurements taken over a time series, the approach can be readily applied to any set of quantitative systems biology measurements taken following quantitative (i.e., non-categorical) treatments. In principle, the method could also be applied to combinations of treatments, in which case it could greatly simplify the task of exploring the large combinatorial space of possible future measurements.
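As a rough illustration of using uncertainty between measurements to choose the next sample, the following Python sketch may help. It is not the algorithm from the paper: the abstract does not state the biological constraint, so the sketch assumes a hypothetical one, namely that expression cannot change faster than `max_rate` per unit time. It also reports only the width of the feasible range at each candidate time rather than a full probability distribution. The function names and parameters are illustrative assumptions.

```python
from typing import List, Tuple


def feasible_range(t: float, t0: float, y0: float,
                   t1: float, y1: float, max_rate: float) -> Tuple[float, float]:
    """Range of values reachable at time t between two measurements.

    Assumes (hypothetically) that the value changes no faster than
    max_rate per unit time, so it must be reachable from (t0, y0)
    and must still be able to reach (t1, y1).
    """
    lo = max(y0 - max_rate * (t - t0), y1 - max_rate * (t1 - t))
    hi = min(y0 + max_rate * (t - t0), y1 + max_rate * (t1 - t))
    return lo, hi


def most_uncertain_time(times: List[float], values: List[float],
                        max_rate: float,
                        steps_per_interval: int = 10) -> Tuple[float, float]:
    """Return (time, width) of the widest feasible range on a grid.

    Scans evenly spaced candidate times strictly inside each measured
    interval and keeps the one whose feasible range is widest.
    """
    best_t, best_width = times[0], 0.0
    for i in range(len(times) - 1):
        t0, y0 = times[i], values[i]
        t1, y1 = times[i + 1], values[i + 1]
        for k in range(1, steps_per_interval):
            t = t0 + (t1 - t0) * k / steps_per_interval
            lo, hi = feasible_range(t, t0, y0, t1, y1, max_rate)
            # Clamp at zero in case the data violate the assumed rate limit.
            width = max(0.0, hi - lo)
            if width > best_width:
                best_t, best_width = t, width
    return best_t, best_width


if __name__ == "__main__":
    # Illustrative, made-up time series with a long gap between t=2 and t=6.
    sample_times = [0.0, 1.0, 2.0, 6.0]
    sample_values = [1.0, 1.5, 1.2, 1.2]
    t_next, width = most_uncertain_time(sample_times, sample_values, max_rate=1.0)
    print(f"suggested next sample time: {t_next:.2f} (range width {width:.2f})")
```

Under this assumed constraint, the widest feasible range falls in the longest unmeasured gap, which matches the intuition that sparsely sampled intervals are the most informative places to measure next.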