Knowledge extraction for assisted curation of summaries of bacterial transcription factor properties.

Affiliations

Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Av. Universidad s/n, Colonia Chamilpa, Cuernavaca 62100, Morelos, Mexico.

División de Posgrado, Universidad Tecnológica de la Mixteca, Carretera a Acatlima Km. 2.5, Huajuapan de León, 69000, Oaxaca, Mexico.

Publication information

Database (Oxford). 2020 Dec 11;2020. doi: 10.1093/database/baaa109.

DOI: 10.1093/database/baaa109
PMID: 33306798
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7731926/
Abstract

Transcription factors (TFs) play a major role in bacterial transcriptional regulation, as they control the transcription of the genetic information encoded in DNA. Curating the properties of these regulatory proteins is therefore essential for a better understanding of transcriptional regulation. However, manually curating article collections to compile descriptions of TF properties takes significant time and effort, given the overwhelming and ever-growing volume of biomedical literature. Developing automatic knowledge-extraction approaches to assist curation is therefore critical. Here, we present an effective knowledge-extraction approach, based on an automatic text summarization strategy, to assist the curation of summaries describing bacterial TF properties. By processing 5961 scientific articles, we automatically recovered a median of 77% of the knowledge contained in manual summaries describing the properties of 177 TFs of Escherichia coli K-12. For 71% of the TFs, our approach extracted new knowledge that can be used to expand the manual descriptions. Furthermore, using the predictive model trained on E. coli manual summaries, we also generated summaries for 185 TFs of Salmonella enterica serovar Typhimurium from 3498 articles. Manual curation of 10 of these Salmonella Typhimurium summaries showed that 96% of their sentences contained relevant knowledge. Our results demonstrate the feasibility of assisting manual curation, both by expanding existing manual summaries with automatically extracted knowledge and by creating new summaries for bacteria that lack such curation efforts. Database URL: The automatic summaries of the TFs of E. coli and Salmonella, together with the automatic summarizer, are available on GitHub (https://github.com/laigen-unam/tf-properties-summarizer.git).
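The abstract describes training a predictive model on manual E. coli summaries and then using it to pull relevant sentences out of articles. One common way to frame that general idea is supervised extractive summarization: a classifier labels each article sentence as relevant (1) or not (0), and the relevant sentences form the summary. The sketch below is a minimal, hypothetical illustration of that framing only; it is not the authors' summarizer (their code is in the GitHub repository listed above). The multinomial Naive Bayes model, the regex tokenizer, the toy training sentences and all names (`tokenize`, `NaiveBayesSentenceClassifier`, `summarize`) are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(sentence: str) -> list[str]:
    # Crude lowercase alphanumeric tokens; a stand-in for real biomedical NLP preprocessing.
    return re.findall(r"[a-z0-9]+", sentence.lower())


class NaiveBayesSentenceClassifier:
    """Hypothetical multinomial Naive Bayes (add-one smoothing) over bag-of-words features."""

    def fit(self, sentences: list[str], labels: list[int]) -> "NaiveBayesSentenceClassifier":
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.class_counts}
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(tokenize(sentence))
        # Shared vocabulary and per-class token totals used for smoothing.
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_scores(self, sentence: str) -> dict[int, float]:
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for c, wc in self.word_counts.items():
            score = math.log(self.class_counts[c] / n_docs)  # log prior
            for tok in tokenize(sentence):
                if tok in self.vocab:  # words never seen in training are ignored
                    score += math.log((wc[tok] + 1) / (self.totals[c] + v))
            scores[c] = score
        return scores

    def predict(self, sentence: str) -> int:
        scores = self.log_scores(sentence)
        return max(scores, key=scores.get)


def summarize(classifier, article_sentences: list[str], max_sentences: int = 5) -> list[str]:
    # Extractive summary: keep sentences predicted relevant (label 1), in original order.
    kept = [s for s in article_sentences if classifier.predict(s) == 1]
    return kept[:max_sentences]


# Toy training data (invented for illustration): 1 = describes a TF property, 0 = not.
train = [
    ("AraC is a dual regulator that binds L-arabinose.", 1),
    ("The DNA-binding domain of LexA is in the N-terminal region.", 1),
    ("Cells were grown overnight at 37 degrees.", 0),
    ("Samples were centrifuged and washed twice.", 0),
]
clf = NaiveBayesSentenceClassifier().fit([s for s, _ in train], [y for _, y in train])

article = [
    "The regulator binds DNA through its N-terminal domain.",
    "Cultures were grown at 30 degrees.",
]
summary = summarize(clf, article)
```

Because `summarize` only relies on a `predict` method, any other sentence classifier could be swapped in without changing the selection step; a real system would also need far more training data and richer features than this toy example.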

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e6d3/7731926/36ae4d5a95de/baaa109f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e6d3/7731926/7d2ddc716279/baaa109f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e6d3/7731926/1488a52f2077/baaa109f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e6d3/7731926/60c1fff6529b/baaa109f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e6d3/7731926/cef5c6a55998/baaa109f5.jpg

Cited by

1
Unsupervised learning and natural language processing highlight research trends in a superbug.
Front Artif Intell. 2024 Mar 21;7:1336071. doi: 10.3389/frai.2024.1336071. eCollection 2024.
2
transcription factors of unknown function: sequence features and possible evolutionary relationships.
PeerJ. 2022 Jul 20;10:e13772. doi: 10.7717/peerj.13772. eCollection 2022.
