Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, 17121 Solna, Sweden.
Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 75185 Uppsala, Sweden.
Bioinformatics. 2022 Apr 12;38(8):2263-2268. doi: 10.1093/bioinformatics/btac103.
Inferring an accurate gene regulatory network (GRN) has long been a key goal in systems biology. Doing so requires balancing two aims: maximizing the number of true-positive interactions while minimizing the number of false-positive ones. The inference method must also handle the large size of modern experimental datasets, so it needs to be both fast and accurate. The Least Squares Cut-Off (LSCO) method fulfills both criteria, but because it is based on least squares it is vulnerable to a known issue: extreme values, whether small or large, are amplified. In GRN inference, this manifests as genes that are erroneously hyper-connected to a large fraction of all genes because their fold changes are extremely small.
We developed a GRN inference method called Least Squares Cut-Off with Normalization (LSCON) that tackles this problem. LSCON extends the LSCO algorithm with a regularization step that avoids hyper-connected genes and thereby reduces false positives. The regularization is based on normalization, which removes the effect of extreme values on the fit. We benchmarked LSCON against Genie3, LASSO, LSCO and Ridge regression in terms of accuracy, speed and tendency to predict hyper-connected genes. The results show that LSCON achieves better or equal accuracy compared to LASSO, the best existing method, especially for data with extreme values. Thanks to the speed of least squares regression, LSCON achieves this an order of magnitude faster than LASSO.
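The hyper-connectivity problem and the effect of a normalization step can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: a linear steady-state model A Y = -P (Y holds expression fold changes, genes by experiments; P is the perturbation design), the convention that column j of A holds the outgoing links of gene j, and a normalization that rescales each column of A by the mean absolute fold change of the corresponding gene. The function names `lsco`, `lscon` and `cut_off` are hypothetical and are not the GeneSPIDER API.

```python
import numpy as np


def lsco(Y, P):
    """Least squares estimate of A in the assumed model A @ Y = -P."""
    return -P @ np.linalg.pinv(Y)


def lscon(Y, P):
    """Least squares estimate followed by an assumed column normalization.

    Column j of A multiplies row j of Y, so a gene with tiny fold changes
    gets a hugely inflated column. Rescaling column j by the mean absolute
    fold change of gene j (an illustrative choice) counteracts this.
    """
    A = lsco(Y, P)
    scale = np.mean(np.abs(Y), axis=1)  # one scale factor per gene
    return A * scale[np.newaxis, :]     # scale column j by scale[j]


def cut_off(A, threshold):
    """Keep links whose absolute weight reaches the threshold."""
    return np.abs(A) >= threshold


if __name__ == "__main__":
    # Toy data: gene 3 (last row) has extremely small fold changes.
    Y = np.array([[-1.0, 0.1, 0.2],
                  [0.1, -1.0, 0.1],
                  [0.001, -0.002, 0.001]])
    P = np.eye(3)  # one single-gene perturbation per experiment

    print("LSCO weights:\n", lsco(Y, P))
    print("LSCON weights:\n", lscon(Y, P))
    # Number of outgoing links per gene at a fixed cut-off
    print("LSCO out-degree:", cut_off(lsco(Y, P), 10).sum(axis=0))
```

In this toy case the plain least squares weights for the low-fold-change gene are orders of magnitude larger than all others, so any cut-off that keeps them links that gene to every other gene. After the assumed rescaling, all columns are of comparable magnitude and the cut-off no longer singles out one gene.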
Data: https://bitbucket.org/sonnhammergrni/lscon; Code: https://bitbucket.org/sonnhammergrni/genespider.
Supplementary data are available at Bioinformatics online.