The evaluation of transcription factor binding site prediction tools in human and Arabidopsis genomes.

Author information

Wanniarachchi Dinithi V, Viswakula Sameera, Wickramasuriya Anushka M

Affiliations

Department of Plant Sciences, Faculty of Science, University of Colombo, Colombo 03, Sri Lanka.

Department of Statistics, Faculty of Science, University of Colombo, Colombo 03, Sri Lanka.

Publication information

BMC Bioinformatics. 2024 Dec 2;25(1):371. doi: 10.1186/s12859-024-05995-0.

Abstract

BACKGROUND

The precise prediction of transcription factor binding sites (TFBSs) is pivotal for unraveling the gene regulatory networks underlying biological processes. Numerous tools for in silico TFBS prediction have emerged in recent years, and the evolving landscape of computational biology calls for thorough assessments of their performance to ensure accuracy and reliability. However, only a few studies have comprehensively evaluated TFBS prediction tools. The present study therefore assessed twelve widely used TFBS prediction tools and four de novo motif discovery tools on a benchmark dataset comprising real, generic, Markov, and negative sequences. TFBSs of the Arabidopsis thaliana and Homo sapiens genomes, downloaded from the JASPAR database, were implanted into these sequences, and tool performance was evaluated using several statistical parameters at different percentages of overlap between the lengths of known and predicted binding sites.
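
The benchmark construction described above (background sequences with implanted known sites) can be illustrated with a minimal Python sketch. It assumes a first-order Markov background with made-up transition probabilities and a hypothetical 8-bp site string. The abstract does not state the Markov order, sequence lengths, or implantation procedure, so this is an illustration of the general idea, not the authors' pipeline.

```python
import random

# Hypothetical first-order transition probabilities P(next base | current base).
# In a real benchmark these would be estimated from genomic sequence.
TRANSITIONS = {
    "A": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
    "C": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},
    "G": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},
    "T": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
}

def markov_background(length, rng):
    """Sample a DNA sequence of `length` bases from the Markov chain above."""
    seq = [rng.choice("ACGT")]  # uniform choice for the first base
    while len(seq) < length:
        probs = TRANSITIONS[seq[-1]]
        seq.append(rng.choices(list(probs), weights=list(probs.values()))[0])
    return "".join(seq)

def implant_site(background, site, rng):
    """Overwrite a random window of `background` with `site`.

    Returns the new sequence and the 0-based, half-open (start, end)
    coordinates of the implanted site, which serve as the ground truth.
    """
    start = rng.randrange(len(background) - len(site) + 1)
    end = start + len(site)
    return background[:start] + site + background[end:], (start, end)

rng = random.Random(42)
bg = markov_background(200, rng)
seq, truth = implant_site(bg, "TGACGTCA", rng)  # hypothetical site
print(len(seq), truth, seq[truth[0]:truth[1]])
```

Under this sketch, a "negative" sequence would simply be a background returned without calling `implant_site`, so any prediction on it counts as a false positive.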

RESULTS

Overall, the Motif Cluster Alignment and Search Tool (MCAST) emerged as the best TFBS prediction tool, followed by Find Individual Motif Occurrences (FIMO) and MOtif Occurrence Detection Suite (MOODS). In addition, MotEvo and the Dinucleotide Weight Tensor Toolbox (DWT-toolbox) showed the highest sensitivity in identifying TFBSs at 90% and 80% overlap. MCAST and DWT-toolbox also showed the highest sensitivity across all three data types: real, generic, and Markov. Among the de novo motif discovery tools, Multiple EM for Motif Elicitation (MEME) performed best. An analysis of the promoter regions of genes involved in the anthocyanin biosynthesis pathway in plants and the pentose phosphate pathway in humans, using the three best-performing tools, revealed considerable variation among the top 20 motifs identified by these tools.
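
Sensitivity "at 90% and 80% overlap" depends on how a prediction is matched to a known site. The following minimal Python sketch assumes that overlap is the fraction of the known site's length covered by a prediction. A known site counts as detected if some prediction reaches the threshold. The abstract does not give the exact overlap definition or the full set of statistical parameters, so `overlap_fraction` and `evaluate` are hypothetical helpers, and the coordinates are made up.

```python
def overlap_fraction(known, predicted):
    """Fraction of the known site (half-open interval) covered by the prediction."""
    inter = min(known[1], predicted[1]) - max(known[0], predicted[0])
    return max(inter, 0) / (known[1] - known[0])

def evaluate(known_sites, predicted_sites, threshold):
    """Return (sensitivity, precision) at a given overlap threshold.

    A known site is a true positive if any prediction covers at least
    `threshold` of it; a prediction is counted as correct if it reaches
    that threshold against at least one known site.
    """
    tp = sum(any(overlap_fraction(k, p) >= threshold for p in predicted_sites)
             for k in known_sites)
    correct = sum(any(overlap_fraction(k, p) >= threshold for k in known_sites)
                  for p in predicted_sites)
    sensitivity = tp / len(known_sites) if known_sites else 0.0
    precision = correct / len(predicted_sites) if predicted_sites else 0.0
    return sensitivity, precision

known = [(10, 20), (50, 60)]
predicted = [(11, 21), (52, 60), (100, 110)]
for t in (0.9, 0.8):
    print(t, evaluate(known, predicted, t))
```

Here the second prediction covers only 80% of its known site. It therefore counts as a hit at the 80% threshold but not at 90%, which is why sensitivity can differ between the two overlap levels.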

CONCLUSION

The findings of this study lay a robust groundwork for selecting optimal TFBS prediction tools for future research. Given the variability observed in tool performance, using multiple tools to identify TFBSs in a set of sequences is highly recommended. Further studies are also recommended to develop an integrated toolbox that incorporates TFBS prediction and motif discovery tools, with the aim of improving the precision and accuracy of the results.
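
One simple way to use multiple tools on the same sequence is to keep only the bases that several tools agree on. The minimal Python sketch below implements a per-base majority vote over predicted intervals. This consensus scheme is an assumption chosen for illustration and is not a method described in the abstract. The tool names are used only as dictionary labels, and the coordinates are invented.

```python
from collections import Counter

def consensus_sites(tool_predictions, min_tools):
    """Merge predicted sites from several tools by per-base voting.

    tool_predictions: dict mapping a tool label to a list of half-open
    (start, end) intervals on the same sequence.
    Returns the intervals where at least `min_tools` tools agree.
    """
    votes = Counter()
    for sites in tool_predictions.values():
        covered = set()
        for start, end in sites:
            covered.update(range(start, end))
        votes.update(covered)  # each tool votes at most once per base
    kept = sorted(pos for pos, n in votes.items() if n >= min_tools)
    merged = []
    for pos in kept:
        if merged and merged[-1][1] == pos:
            merged[-1][1] = pos + 1  # extend the current contiguous run
        else:
            merged.append([pos, pos + 1])  # start a new run
    return [tuple(iv) for iv in merged]

preds = {
    "MCAST": [(10, 20)],
    "FIMO": [(12, 22)],
    "MOODS": [(11, 19), (40, 48)],
}
print(consensus_sites(preds, min_tools=2))
```

With `min_tools=2` the isolated MOODS-only interval is discarded. Raising the threshold trades sensitivity for precision in the same way that stricter overlap cutoffs do.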

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0850/11613939/4ee20919c0da/12859_2024_5995_Fig1_HTML.jpg