DMLS: an automated pipeline to extract the Drosophila modular transcription regulators and targets from massive literature articles.

Affiliations

Department of Biomedical Engineering, National Cheng Kung University, No.1, University Road, Tainan 701, Taiwan.

Medical Device Innovation Center, National Cheng Kung University, No.1, University Road, Tainan 701, Taiwan.

Publication information

Database (Oxford). 2024 Jun 20;2024:0. doi: 10.1093/database/baae049.

Abstract

Transcription regulation in multicellular species is mediated by modular transcription factor (TF) binding site combinations termed cis-regulatory modules (CRMs). Such CRM-mediated transcription regulation determines gene expression patterns during development, and biologists frequently investigate how CRMs regulate gene expression. However, knowledge of the target genes and regulatory TFs participating in the CRMs under study is mostly fragmented across the literature. Researchers must spend tremendous human resources reading through the articles deposited in biomedical literature databases to obtain this information. Although several novel text-mining systems are now available for literature triaging, these tools do not specifically focus on prescreening CRM-related literature and fail to correctly extract the CRM target genes and regulatory TFs from the literature. For this reason, we constructed a supportive automated literature prescreener called Drosophila Modular transcription-regulation Literature Screener (DMLS) that (i) prescreens articles describing experiments on modular transcription regulation, (ii) identifies, for each such article, the described target genes and TFs of the CRMs under study and (iii) features an automated and extendable pipeline to perform the task. In extracting the described target gene and regulatory TF lists of the CRMs under study for given articles, DMLS achieved a test macro area under the ROC curve (auROC) of 89.7% and an area under the precision-recall curve (auPRC) of 77.6%, outperforming the intuitive gene name-occurrence-counting method by at least 19.9% in auROC and 30.5% in auPRC. The web service and command-line versions of DMLS are available at https://cobis.bme.ncku.edu.tw/DMLS/ and https://github.com/cobisLab/DMLS/, respectively.
Database Tool URL: https://cobis.bme.ncku.edu.tw/DMLS/.
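
To make the comparison baseline concrete, the following is a minimal sketch of a gene name-occurrence-counting scorer together with a macro-averaged auROC. It is not the DMLS implementation: the function names, the whole-word matching rule, the toy articles and the reading of "macro" as an unweighted mean of per-article scores are all assumptions made for illustration.

```python
import re
from collections import Counter


def count_gene_mentions(text, gene_names):
    """Count whole-word, case-insensitive mentions of each candidate gene name."""
    tokens = Counter(re.findall(r"[a-z0-9_-]+", text.lower()))
    return {name: tokens[name.lower()] for name in gene_names}


def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auroc(articles):
    """Unweighted mean of per-article auROC (assumed meaning of 'macro' here).

    Each article is a pair (text, {gene_name: 0 or 1}), where 1 marks a gene
    that is a true CRM target or regulatory TF in that article.
    """
    per_article = []
    for text, truth in articles:
        genes = sorted(truth)
        counts = count_gene_mentions(text, genes)
        per_article.append(auroc([counts[g] for g in genes], [truth[g] for g in genes]))
    return sum(per_article) / len(per_article)


# Hypothetical toy articles, invented purely for illustration.
toy_articles = [
    ("The eve stripe 2 enhancer is activated by bicoid and hunchback; "
     "eve expression requires bicoid. Kruppel was used as a marker.",
     {"eve": 1, "bicoid": 1, "hunchback": 1, "kruppel": 0, "giant": 0}),
    ("Twist and snail pattern the mesoderm; the rho enhancer is repressed by snail.",
     {"rho": 1, "snail": 1, "twist": 0}),
]

if __name__ == "__main__":
    print(f"macro auROC of the counting baseline: {macro_auroc(toy_articles):.3f}")
```

Counting mentions ranks a gene highly whenever it is named often, whether or not it takes part in the CRM under study, which is one plausible reason a purpose-built extractor could outperform this baseline.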

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3a36/11188685/304078752f08/baae049f1.jpg
