Department of Statistics and Data Science, Moody School of Graduate and Advanced Studies, Southern Methodist University, 3225 Daniel Ave., P.O. Box 750332, Dallas, TX, United States.
Quantitative Biomedical Research Center, Peter O'Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX, United States.
Brief Bioinform. 2024 Jul 25;25(5). doi: 10.1093/bib/bbae366.
This article provides an in-depth review of computational methods for predicting transcriptional regulators (TRs) from query gene sets. Identifying TRs is critical in many biological applications, including but not limited to elucidating mechanisms of biological development, identifying key disease genes, and predicting therapeutic targets. Various computational methods based on next-generation sequencing (NGS) data have been developed over the past decade, yet no systematic evaluation of these NGS-based methods has been reported. We classified these methods into two categories based on shared characteristics: library-based and region-based methods. We further conducted benchmark studies to evaluate the accuracy, sensitivity, coverage, and usability of NGS-based methods using molecular experimental datasets. The results show that BART, ChIP-Atlas, and Lisa perform relatively better. In addition, we point out the limitations of NGS-based methods and explore potential directions for further improvement.