Gao Shouguo, Jia Shuang, Hessner Martin, Wang Xujing
Department of Physics & the Comprehensive Diabetes Center, University of Alabama at Birmingham, 1300 University Blvd, Birmingham, AL 35294, USA.
J Comput Sci Syst Biol. 2008 Dec 26;1:41. doi: 10.4172/jcsb.1000003.
We previously reported Matarray, a microarray image processing and data analysis package in which a quality score is defined for every spot to reflect the reliability and variability of the data acquired from that spot. In this article we present a new development in Matarray in which the quality scores are incorporated as weights in the statistical evaluation and data mining of microarray data. With this approach, poor-quality data are filtered automatically through the reduction of their weights, which eliminates the need to manually flag or remove bad data points and avoids the problem of missing values. More significantly, using a set of control clones spiked in at known input ratios ranging from 1:30 to 30:1, we find that quality-weighted statistics lead to more accurate gene expression measurements and more sensitive detection of expression changes, with significantly lower type II error rates. Further, we applied quality-weighted clustering to a time-course microarray data set and found that the new algorithm improves grouping accuracy. In summary, incorporating quantitative quality measures of microarray data as weights in complex data analysis improves reliability and convenience. It also provides a practical way to deal with missing values when establishing automated statistical tests.
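The idea of using quality scores as weights can be illustrated with a minimal sketch. This is not Matarray's actual formula, which the abstract does not give. It assumes each spot has a log2 expression ratio and a hypothetical quality score between 0 and 1, and it computes a quality-weighted mean and variance across replicate spots. A missing measurement is represented as `None` with weight 0, so it drops out of the statistics without any manual flagging.

```python
def _usable_pairs(values, weights):
    """Keep only (value, weight) pairs that can contribute to a statistic.

    Assumption: a missing spot is encoded as None, and a weight of 0
    means the spot carries no usable information.
    """
    pairs = [(v, w) for v, w in zip(values, weights) if v is not None and w > 0]
    if not pairs:
        raise ValueError("no spot has a positive quality weight")
    return pairs


def weighted_mean(values, weights):
    """Quality-weighted mean of replicate measurements."""
    pairs = _usable_pairs(values, weights)
    total_weight = sum(w for _, w in pairs)
    return sum(w * v for v, w in pairs) / total_weight


def weighted_variance(values, weights):
    """Quality-weighted variance around the weighted mean.

    This is the simple normalized form (no small-sample correction),
    chosen for illustration only.
    """
    pairs = _usable_pairs(values, weights)
    total_weight = sum(w for _, w in pairs)
    mean = sum(w * v for v, w in pairs) / total_weight
    return sum(w * (v - mean) ** 2 for v, w in pairs) / total_weight


# Hypothetical replicate spots for one gene: the fourth spot is an
# outlier with quality 0, and the fifth spot is missing entirely.
log_ratios = [1.0, 1.2, 0.8, 5.0, None]
quality = [1.0, 1.0, 1.0, 0.0, 0.0]

gene_mean = weighted_mean(log_ratios, quality)
gene_var = weighted_variance(log_ratios, quality)
```

In this sketch the outlier and the missing spot have no influence on the result, while a spot with an intermediate quality score would contribute in proportion to that score rather than being kept or discarded outright.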