Lu Zeyu, Xiao Xue, Zheng Qiang, Wang Xinlei, Xu Lin
bioRxiv. 2024 Mar 22:2024.02.01.578316. doi: 10.1101/2024.02.01.578316.
This article provides an in-depth review of computational methods for predicting transcriptional regulators (TRs) from query gene sets. Identifying transcriptional regulators is essential in many biological applications, including elucidating mechanisms of biological development, identifying key disease genes, and predicting therapeutic targets. Various computational methods based on next-generation sequencing (NGS) data have been developed in the past decade, yet no systematic evaluation of these NGS-based methods has been offered. We classified the methods into two categories based on shared characteristics: library-based and region-based methods. We then conducted benchmark studies to evaluate the accuracy, sensitivity, coverage, and usability of NGS-based methods on molecular experimental datasets. The results show that BART, ChIP-Atlas, and Lisa perform better overall than the other methods evaluated. We also point out the limitations of NGS-based methods and explore potential directions for further improvement.
An introduction to available computational methods for predicting functional TRs from a query gene set.
A detailed walk-through along with practical concerns and limitations.
A systematic benchmark of NGS-based methods in terms of accuracy, sensitivity, coverage, and usability, using 570 TR perturbation-derived gene sets.
NGS-based methods outperform motif-based methods. Among NGS-based methods, those utilizing larger databases and adopting region-centric approaches demonstrate favorable performance.
BART, ChIP-Atlas, and Lisa are recommended, as these methods have better overall performance in the evaluated scenarios.