Department of Biochemistry and Convergence Medical Sciences and Institute of Medical Sciences, College of Medicine, Gyeongsang National University, Jinju, South Korea.
PeerJ. 2023 Oct 20;11:e16318. doi: 10.7717/peerj.16318. eCollection 2023.
Transcription factor binding to a gene regulatory region induces or represses the expression of that gene. Binding and expression target analysis (BETA) integrates binding and gene expression data to predict this function. First, the regulatory potential of the factor is modeled with a decay function of the distance between its binding sites and the transcription start sites. Then, the differential expression statistics from an experiment in which the factor was perturbed represent the effect of binding. The rank product of the two values is used to order the targets by importance. This algorithm was originally implemented in Python. We reimplemented the algorithm in R to take advantage of existing data structures and other tools for downstream analyses. Here, we attempted to replicate the findings in the original BETA paper. We applied the new implementation to the same datasets using default and varying inputs and cutoffs. We successfully replicated the original results. Moreover, we showed that the method was appropriately influenced by varying the input and was robust to choices of cutoffs in statistical testing.
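The two steps described above (a distance-based regulatory potential, then a rank product with the differential expression statistics) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' R implementation. The abstract does not state the decay function, so the exponential form and the 100 kb scale below are assumptions based on the form commonly reported for the original BETA. The gene names, peak distances, and statistics are hypothetical, and ties in ranking are broken by input order for simplicity.

```python
import math

def regulatory_potential(distances_bp, scale_bp=100_000):
    """Sum a distance-decaying weight over all peaks assigned to one gene.

    Assumed decay form: exp(-(0.5 + 4 * d / scale_bp)), with d the distance
    from a peak to the transcription start site in base pairs.
    """
    return sum(math.exp(-(0.5 + 4 * abs(d) / scale_bp)) for d in distances_bp)

def descending_ranks(values):
    """Rank 1 = largest value; ties are broken by input order (a simplification)."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks

def rank_product(potentials, abs_stats):
    """Product of the two normalized ranks; smaller means a stronger candidate."""
    n = len(potentials)
    rank_potential = descending_ranks(potentials)
    rank_stat = descending_ranks(abs_stats)
    return [(a / n) * (b / n) for a, b in zip(rank_potential, rank_stat)]

# Hypothetical inputs: peak-to-TSS distances (bp) and differential
# expression statistics from a perturbation of the factor.
peaks = {"geneA": [500, 20_000], "geneB": [80_000], "geneC": [1_000]}
stats = {"geneA": 6.2, "geneB": -1.1, "geneC": 3.4}

genes = list(peaks)
potentials = [regulatory_potential(peaks[g]) for g in genes]
scores = rank_product(potentials, [abs(stats[g]) for g in genes])

# Order genes from most to least likely direct target.
ranking = sorted(genes, key=lambda g: scores[genes.index(g)])
print(ranking)
```

In this toy input, a gene with nearby peaks and a large expression change ranks first, while a gene with a single distant peak and a weak change ranks last. The sign of the statistic, dropped here by taking the absolute value, is what would separate induced from repressed targets.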