"Simulated molecular evolution" or computer-generated artifacts?

Author information

Darius F, Rojas R

Publication information

Biophys J. 1994 Nov;67(5):2120-2. doi: 10.1016/S0006-3495(94)80695-2.

Abstract
1. The authors define a function with value 1 for the positive examples and 0 for the negative ones. They fit a continuous function but do not deal at all with the error margin of the fit, which is almost as large as the function values they compute.
2. The term "quality" for the value of the fitted function gives the impression that some biological significance is associated with values of the fitted function strictly between 0 and 1, but there is no justification for this kind of interpretation, and finding the point where the fit achieves its maximum does not make sense.
3. By neglecting the error margin, the authors try to optimize the fitted function using differences in the second, third, fourth, and even fifth decimal place, which have no statistical significance.
4. Even if such a fit could profit from more data points, the authors should first prove that the region of interest has some kind of smoothness, that is, that a continuous fit makes any sense at all.
5. "Simulated molecular evolution" is a misnomer. We are dealing here with random search. Since the margin of error is so large, the fitted function does not provide statistically significant information about the points in search space where strings with cleavage sites could be found. This implies that the method is a highly unreliable stochastic search in the space of strings, even if the neural network is capable of learning some simple correlations.
6. For this kind of problem, with so few data points, classical statistical methods are clearly superior to the neural networks used as a "black box" by the authors, which, in the way they are structured, provide a model with an error margin as large as the numbers being computed.
7. And finally, even if someone provided us with a function that perfectly separates strings with cleavage sites from strings without them, so-called simulated molecular evolution would not be better than random selection. Since a perfect fit would only produce exactly ones or zeros, starting a search in a region of space where all strings in the neighborhood get the value zero would not provide any kind of directional information for new iterations. We would just skip from one point to the other in a typical random-walk manner.
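
The argument in point 7 can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the setup of the criticized study: the four-letter alphabet, the motif `ACE` standing in for a cleavage site, and the helper names are all hypothetical. It shows that with a perfect 0/1 scoring function, a start string whose single-mutation neighbors all score 0 has no improving neighbor, so a mutate-and-select loop accepts every mutant and behaves like a random walk.

```python
import random

ALPHABET = "ACDE"   # hypothetical 4-letter alphabet (assumption, not from the paper)
MOTIF = "ACE"       # hypothetical motif standing in for a cleavage site

def perfect_score(s):
    """Perfect binary separator: 1 if the motif is present, else 0."""
    return 1 if MOTIF in s else 0

def neighbors(s):
    """Yield every string that differs from s in exactly one position."""
    for i, old in enumerate(s):
        for c in ALPHABET:
            if c != old:
                yield s[:i] + c + s[i + 1:]

def improving_neighbors(s):
    """Neighbors that score strictly higher than s."""
    base = perfect_score(s)
    return [t for t in neighbors(s) if perfect_score(t) > base]

def mutate_and_select(start, max_steps, rng):
    """Pick a random neighbor; accept it if its score is not worse.

    Stops when a string scoring 1 is reached or max_steps is exhausted.
    Returns (final string, steps taken, mutants accepted).
    """
    current = start
    steps = accepted = 0
    while steps < max_steps and perfect_score(current) == 0:
        steps += 1
        mutant = rng.choice(list(neighbors(current)))
        if perfect_score(mutant) >= perfect_score(current):
            current = mutant
            accepted += 1
    return current, steps, accepted

start = "DDDDDD"
print(improving_neighbors(start))   # prints []: no neighbor scores higher

final, steps, accepted = mutate_and_select(start, 200, random.Random(0))
print(steps == accepted)            # prints True: every mutant was accepted
```

Because the score of the current string is 0 at every step of the loop, the selection test never rejects a mutant. Selection therefore carries no information on the plateau, and the trajectory is determined by the random mutations alone.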

