Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
Bioinformatics. 2011 Jul 1;27(13):1822-31. doi: 10.1093/bioinformatics/btr272. Epub 2011 May 5.
With the development of high-throughput genomic and proteomic technologies, coupled with the inherent difficulty of obtaining large samples, biomedicine faces hard small-sample classification problems, error estimation in particular. Most popular error estimation methods are motivated by intuition rather than mathematical inference. A recently proposed error estimator based on Bayesian minimum mean square error (MMSE) estimation places error estimation in an optimal filtering framework. In this work, we examine the application of this error estimator to gene expression microarray data, including the suitability of the Gaussian model with normal-inverse-Wishart priors and how to find prior probabilities.
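To make the phrase "optimal filtering framework" concrete, the estimator can be written as a conditional expectation. The notation below is an assumption introduced for illustration and does not appear in the abstract: S_n is the observed sample, \theta the unknown parameters of the class-conditional distributions, \varepsilon_n(\theta) the true error of the designed classifier, \pi^* the posterior of \theta given S_n, and c the prior probability of class 0. The second line further assumes that c, \theta_0 and \theta_1 are independent under the posterior.

```latex
\hat{\varepsilon}(S_n)
  = \arg\min_{g}\; \mathrm{E}\!\left[\bigl(\varepsilon_n(\theta) - g(S_n)\bigr)^2\right]
  = \mathrm{E}_{\pi^*}\!\left[\varepsilon_n(\theta)\right]

\hat{\varepsilon}(S_n)
  = \mathrm{E}_{\pi^*}[c]\;\mathrm{E}_{\pi^*}\!\left[\varepsilon_n^{0}(\theta_0)\right]
  + \bigl(1 - \mathrm{E}_{\pi^*}[c]\bigr)\;\mathrm{E}_{\pi^*}\!\left[\varepsilon_n^{1}(\theta_1)\right]
```

Here \varepsilon_n^{y}(\theta_y) denotes the probability that the classifier mislabels a point drawn from class y.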
We provide an implementation for non-linear classification, where closed-form solutions are not available. We propose a methodology for calibrating normal-inverse-Wishart priors based on discarded microarray data and examine the performance on synthetic high-dimensional data and a real dataset from a breast cancer study. The calibrated Bayesian error estimator has superior root-mean-square (RMS) performance, especially with moderate to high expected true errors and small feature sizes.
We have implemented in C code the Bayesian error estimator for Gaussian distributions and normal-inverse-Wishart priors for both linear classifiers, with exact closed-form representations, and arbitrary classifiers, where we use a Monte Carlo approximation. Our code for the Bayesian error estimator and a toolbox of related utilities are available at http://gsp.tamu.edu/Publications/supplementary/dalton11a. Several supporting simulations are also included.
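The Monte Carlo idea mentioned above can be illustrated with a minimal sketch. It is not the authors' C implementation and makes several simplifying assumptions: a single feature (so the normal-inverse-Wishart prior reduces to its one-dimensional special case, a normal-inverse-chi-square prior), a beta prior on the class-0 probability with random sampling, and hypothetical names throughout (`posterior_params`, `sample_theta`, `class_error_mc`, `bayesian_error_estimate`). For each class, it draws parameters from the posterior, draws one point from the resulting Gaussian, and counts how often an arbitrary fixed classifier mislabels it. The fraction of mislabeled points approximates the posterior expected error for that class.

```python
import math
import random
from typing import Callable, Sequence, Tuple

# Prior/posterior hyperparameters (m, k, v, s_sq), all hypothetical names:
#   mu | sigma_sq ~ N(m, sigma_sq / k),  sigma_sq ~ scaled-inverse-chi-square(v, s_sq)
Params = Tuple[float, float, float, float]


def posterior_params(x: Sequence[float], prior: Params) -> Params:
    """Standard conjugate update for a 1-D Gaussian with unknown mean and variance."""
    m0, k0, v0, s0_sq = prior
    n = len(x)
    xbar = sum(x) / n
    ss = sum((xi - xbar) ** 2 for xi in x)
    kn = k0 + n
    mn = (k0 * m0 + n * xbar) / kn
    vn = v0 + n
    sn_sq = (v0 * s0_sq + ss + (k0 * n / kn) * (xbar - m0) ** 2) / vn
    return mn, kn, vn, sn_sq


def sample_theta(rng: random.Random, post: Params) -> Tuple[float, float]:
    """Draw (mu, sigma_sq) from the posterior."""
    mn, kn, vn, sn_sq = post
    chi2 = rng.gammavariate(vn / 2.0, 2.0)  # chi-square with vn degrees of freedom
    sigma_sq = vn * sn_sq / chi2
    mu = rng.gauss(mn, math.sqrt(sigma_sq / kn))
    return mu, sigma_sq


def class_error_mc(rng: random.Random, classify: Callable[[float], int],
                   label: int, post: Params, draws: int) -> float:
    """Monte Carlo estimate of the posterior expected error for one class."""
    wrong = 0
    for _ in range(draws):
        mu, sigma_sq = sample_theta(rng, post)
        x = rng.gauss(mu, math.sqrt(sigma_sq))  # one point from the sampled model
        if classify(x) != label:
            wrong += 1
    return wrong / draws


def bayesian_error_estimate(x0: Sequence[float], x1: Sequence[float],
                            classify: Callable[[float], int],
                            prior0: Params, prior1: Params,
                            alpha: Tuple[float, float] = (1.0, 1.0),
                            draws: int = 20000, seed: int = 0) -> float:
    """Combine the two class-conditional estimates with the posterior mean of c."""
    rng = random.Random(seed)
    n0, n1 = len(x0), len(x1)
    # Assumption: beta(alpha[0], alpha[1]) prior on c, labels sampled at random.
    c_hat = (n0 + alpha[0]) / (n0 + n1 + alpha[0] + alpha[1])
    e0 = class_error_mc(rng, classify, 0, posterior_params(x0, prior0), draws)
    e1 = class_error_mc(rng, classify, 1, posterior_params(x1, prior1), draws)
    return c_hat * e0 + (1.0 - c_hat) * e1
```

A call such as `bayesian_error_estimate(x0, x1, lambda x: 0 if x <= 5.0 else 1, prior0, prior1)` accepts any function mapping a feature value to a label, which is the point of the Monte Carlo route: no closed-form expression for the classifier's error is needed. Extending the sketch to several features would require sampling a mean vector and covariance matrix from a full normal-inverse-Wishart posterior.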