Le Thanh Tuoi, Dang Xuan Tho
Faculty of Information Technology, Hanoi National University of Education, Hanoi, Vietnam.
Faculty of Information Technology, Vinh University of Technology Education, Vinh, Vietnam.
Bioinform Biol Insights. 2025 Feb 25;19:11779322251316130. doi: 10.1177/11779322251316130. eCollection 2025.
Identifying interactions between transcription factors (TFs) and their target genes is crucial for understanding the molecular mechanisms underlying biological processes and diseases. Traditional biological experiments for determining these interactions are often time-consuming, costly, and limited in scale. Current computational methods mainly predict binding sites rather than direct interactions. Although recent studies have achieved high performance in predicting TF-target gene associations, they still face a significant challenge in constructing a robust dataset of positive and negative samples. Existing methods pay too little attention to how negative samples are selected, which leaves potential TF-target gene relationships incompletely covered. This article proposes a method for selecting enhanced negative samples to improve the prediction of TF-target gene interactions. In 5-fold cross-validation, the proposed method achieves an average area under the curve (AUC) of 0.9024 ± 0.0008. These results demonstrate the model's high efficiency and accuracy, and indicate its potential for predicting TF-target gene interactions across various datasets and for supporting large-scale biomedical research.
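The reported figure, an average AUC with a spread over 5-fold cross-validation, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' pipeline: the helper names (`auc`, `k_fold_indices`, `cross_validated_auc`), the toy labels and scores, and the reading of "±" as the standard deviation across folds, which the abstract does not specify. The `score_fn` used here is a placeholder that ignores the training indices; a real model would be fitted on them.

```python
import random
import statistics

def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validated_auc(labels, score_fn, k=5, seed=0):
    """Return (mean AUC, standard deviation across folds, per-fold AUCs).

    score_fn(train_idx, test_idx) is a hypothetical callback that trains on
    train_idx and returns one score per element of test_idx.
    """
    folds = k_fold_indices(len(labels), k, seed)
    aucs = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        scores = score_fn(train_idx, test_idx)
        aucs.append(auc([labels[j] for j in test_idx], scores))
    return statistics.mean(aucs), statistics.stdev(aucs), aucs

# Toy data (assumed): label 1 = interacting TF-gene pair, 0 = negative sample.
rng = random.Random(1)
labels = [i % 2 for i in range(100)]
features = [y + rng.gauss(0.0, 1.0) for y in labels]

# Placeholder scorer: uses the noisy feature directly instead of a trained model.
mean_auc, std_auc, fold_aucs = cross_validated_auc(
    labels, lambda train_idx, test_idx: [features[j] for j in test_idx]
)
print(f"AUC = {mean_auc:.4f} ± {std_auc:.4f} over {len(fold_aucs)} folds")
```

The quality of the negative class matters directly here: AUC is computed over positive-negative pairs, so a different set of negatives changes every fold's score.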