Knowledge extraction for assisted curation of summaries of bacterial transcription factor properties.

Affiliations

Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Av. Universidad s/n, Colonia Chamilpa, Cuernavaca 62100, Morelos, Mexico.

División de Posgrado, Universidad Tecnológica de la Mixteca, Carretera a Acatlima Km. 2.5, Huajuapan de León, 69000, Oaxaca, Mexico.

Publication information

Database (Oxford). 2020 Dec 11;2020. doi: 10.1093/database/baaa109.

Abstract

Transcription factors (TFs) play a central role in bacterial transcriptional regulation, as they regulate transcription of the genetic information encoded in DNA. Thus, curating the properties of these regulatory proteins is essential for a better understanding of transcriptional regulation. However, traditional manual curation of article collections to compile descriptions of TF properties takes significant time and effort because of the overwhelming volume of biomedical literature, which grows every day. The development of automatic knowledge-extraction approaches to assist curation is therefore critical. Here, we present an effective approach, based on an automatic text summarization strategy, for knowledge extraction to assist curation of summaries describing bacterial TF properties. By processing 5961 scientific articles, we automatically recovered a median of 77% of the knowledge contained in manual summaries describing properties of 177 TFs of Escherichia coli K-12. For 71% of the TFs, our approach extracted new knowledge that can be used to expand manual descriptions. Furthermore, having trained our predictive model on manual summaries of E. coli, we also generated summaries for 185 TFs of Salmonella enterica serovar Typhimurium from 3498 articles. According to manual curation of 10 of these Salmonella Typhimurium summaries, 96% of their sentences contained relevant knowledge. Our results demonstrate the feasibility of assisting manual curation, both by expanding manual summaries with automatically extracted new knowledge and by creating new summaries for bacteria for which such curation efforts do not exist. Database URL: The automatic summaries of the TFs of E. coli and Salmonella and the automatic summarizer are available on GitHub (https://github.com/laigen-unam/tf-properties-summarizer.git).
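
The abstract reports a median percentage of manual-summary knowledge recovered across TFs but does not say how that percentage is computed. As a rough illustration only, the sketch below assumes a simplified, set-based word-overlap recall (in the spirit of recall-oriented measures such as ROUGE-1 recall) and takes the median over TFs. The function names `tokens`, `recall`, and `median_recall` and the example sentences are hypothetical and are not taken from the authors' repository.

```python
import re
from statistics import median


def tokens(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def recall(manual, automatic):
    """Fraction of distinct words in the manual summary that also appear
    in the automatic summary (an assumed, simplified overlap measure)."""
    manual_words = set(tokens(manual))
    if not manual_words:
        return 0.0
    automatic_words = set(tokens(automatic))
    return len(manual_words & automatic_words) / len(manual_words)


def median_recall(pairs):
    """Median recall over (manual, automatic) summary pairs, one pair per TF."""
    return median(recall(manual, automatic) for manual, automatic in pairs)


# Hypothetical summary pairs, invented purely for illustration.
pairs = [
    ("AraC activates araBAD", "AraC activates the araBAD operon"),
    ("LexA represses SOS genes", "LexA binds DNA"),
    ("CRP binds cAMP", "CRP binds many promoters"),
]
print(median_recall(pairs))
```

A median rather than a mean keeps a few TFs with very short or very poorly covered summaries from dominating the aggregate figure.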
