Torres David J, Vasilic Ana, Pacheco Jose
Department of Mathematics and Physical Science, Northern New Mexico College, Española, New Mexico, USA.
J Bioinform Syst Biol. 2024;7(1):63-80. doi: 10.26502/jbsb.5107079. Epub 2024 Mar 4.
We show that the simple and multiple linear regression coefficients and the coefficient of determination R² computed from sampling distributions of the mean (with or without replacement) are equal to the regression coefficients and coefficient of determination computed from the individual data. Moreover, the standard error of estimate for sampling distributions of the mean is reduced by a factor equal to the square root of the group size. This result has applications in formulating a distance measure between two genes in a hierarchical clustering algorithm. We show that the Pearson coefficient can measure how differential expression in one gene correlates with differential expression in a second gene.
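The equality claimed in the abstract can be checked numerically on a small example. The sketch below is illustrative only and is not the authors' code: the data are synthetic, the helper `fit_line` and the variable names are hypothetical, and the "standard error of estimate" is approximated by the root-mean-square residual (dividing by the number of points), which may differ from the paper's definition by a degrees-of-freedom correction. It enumerates every group of size 3 drawn from 6 points, both with and without replacement, and fits a line to the group means.

```python
import itertools
import numpy as np

def fit_line(x, y):
    """Least-squares fit y ~ intercept + slope*x.

    Returns (slope, intercept, R^2, RMS residual). The RMS residual divides
    by the number of points (an assumption, no degrees-of-freedom correction).
    """
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (intercept + slope * x)
    r_squared = np.corrcoef(x, y)[0, 1] ** 2
    rms_residual = np.sqrt(np.mean(residuals ** 2))
    return slope, intercept, r_squared, rms_residual

# Synthetic individual data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
x = rng.normal(size=6)
y = 2.0 * x + rng.normal(size=6)
group_size = 3

def group_means(index_groups):
    """Mean of x and of y over each group of indices."""
    idx = np.array(list(index_groups))
    return x[idx].mean(axis=1), y[idx].mean(axis=1)

# Full sampling distribution of the mean, with replacement (all ordered tuples).
with_repl = fit_line(*group_means(
    itertools.product(range(len(x)), repeat=group_size)))

# Full sampling distribution of the mean, without replacement (all subsets).
without_repl = fit_line(*group_means(
    itertools.combinations(range(len(x)), group_size)))

individual = fit_line(x, y)

print("individual data     :", individual[:3])
print("means, with repl.   :", with_repl[:3])
print("means, without repl.:", without_repl[:3])
print("RMS residual ratio (individual / with repl.):",
      individual[3] / with_repl[3], "vs sqrt(group size):", np.sqrt(group_size))
```

The slope, intercept, and R² agree across the three fits because the full enumeration scales the covariance and the variance of the means by the same factor. With replacement, the RMS residual of the individual fit divided by that of the means fit equals the square root of the group size.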