QTLTableMiner: semantic mining of QTL tables in scientific articles.

Authors

Singh Gurnoor, Kuzniar Arnold, van Mulligen Erik M, Gavai Anand, Bachem Christian W, Visser Richard G F, Finkers Richard

Affiliations

Plant Breeding, Wageningen University and Research, Wageningen, The Netherlands.

Netherlands eScience Center (NLeSC), Amsterdam, The Netherlands.

Publication information

BMC Bioinformatics. 2018 May 25;19(1):183. doi: 10.1186/s12859-018-2165-7.

DOI: 10.1186/s12859-018-2165-7
PMID: 29801439
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5970438/
Abstract

BACKGROUND

A quantitative trait locus (QTL) is a genomic region that correlates with a phenotype. Most of the experimental information about QTL mapping studies is described in tables of scientific publications. Traditional text mining techniques aim to extract information from unstructured text rather than from tables. We present QTLTableMiner (QTM), a table mining tool that extracts and semantically annotates QTL information buried in (heterogeneous) tables of plant science literature. QTM is a command line tool written in the Java programming language. The tool takes scientific articles from the Europe PMC repository as input and extracts QTL tables using keyword matching and ontology-based concept identification. The tables are further normalized using rules derived from table properties such as captions, column headers and table footers. Furthermore, table columns are classified into three categories, namely column descriptors, properties and values, based on column headers and the data types of cell entries. Abbreviations found in the tables are expanded using the Schwartz and Hearst algorithm. Finally, the content of QTL tables is semantically enriched with domain-specific ontologies (e.g. Crop Ontology, Plant Ontology and Trait Ontology) using the Apache Solr search platform, and the results are stored in a relational database and a text file.
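The abbreviation-expansion step can be illustrated with a minimal sketch of the Schwartz and Hearst matching idea. This is not QTM's code (QTM is written in Java, and the abstract gives no implementation details); Python is used only for brevity. The function names `find_best_long_form` and `expand_abbreviations`, the regular expression for parenthesised short forms, and the example sentences are assumptions made for illustration, and the validity filters of the published algorithm are omitted.

```python
import re


def find_best_long_form(short_form, candidate):
    """Return the shortest tail of `candidate` that matches `short_form`, or None."""
    l_idx = len(candidate) - 1
    # Walk the short form right to left, consuming matching characters
    # of the candidate text, also right to left.
    for s_idx in range(len(short_form) - 1, -1, -1):
        ch = short_form[s_idx].lower()
        if not ch.isalnum():
            continue
        # Skip non-matching characters. The first character of the short
        # form must additionally sit at the start of a word.
        while (l_idx >= 0 and candidate[l_idx].lower() != ch) or (
            s_idx == 0 and l_idx > 0 and candidate[l_idx - 1].isalnum()
        ):
            l_idx -= 1
        if l_idx < 0:
            return None
        l_idx -= 1
    # Extend the match back to the beginning of the word it starts in.
    start = candidate.rfind(" ", 0, l_idx + 1) + 1
    return candidate[start:]


def expand_abbreviations(sentence):
    """Map each parenthesised short form in `sentence` to its long form."""
    pairs = {}
    # Assumption: short forms are 2 to 10 characters inside parentheses.
    for match in re.finditer(r"\(([^()]{2,10})\)", sentence):
        short = match.group(1).strip()
        if not short or not short[0].isalnum():
            continue
        if not any(c.isalpha() for c in short):
            continue
        words = sentence[: match.start()].split()
        # Search window: at most min(|A| + 5, 2 * |A|) words before the bracket.
        max_words = min(len(short) + 5, len(short) * 2)
        candidate = " ".join(words[-max_words:])
        long_form = find_best_long_form(short, candidate)
        if long_form:
            pairs[short] = long_form
    return pairs


if __name__ == "__main__":
    text = "A quantitative trait locus (QTL) is a genomic region."
    print(expand_abbreviations(text))  # {'QTL': 'quantitative trait locus'}
```

In a table-mining setting, a function like this would presumably be run over captions and footers, where abbreviations used in column headers are usually defined; that placement is an assumption, not something the abstract states.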

RESULTS

The performance of the QTM tool was assessed by precision and recall based on the information retrieved from two manually annotated corpora of open access articles, i.e. QTL mapping studies in tomato (Solanum lycopersicum) and in potato (S. tuberosum). In summary, QTM detected QTL statements in tomato with 74.53% precision and 92.56% recall and in potato with 82.82% precision and 98.94% recall.
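For readers unfamiliar with the two metrics, the sketch below shows how precision and recall are computed from counts of correct, spurious and missed extractions. The function name `precision_recall` and the counts in the usage example are hypothetical; the abstract reports only the final percentages, not the underlying counts.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Compute precision and recall from raw extraction counts."""
    predicted = true_positives + false_positives  # everything the tool extracted
    actual = true_positives + false_negatives     # everything in the gold annotation
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    p, r = precision_recall(true_positives=90, false_positives=10, false_negatives=30)
    print(f"precision={p:.2%} recall={r:.2%}")  # precision=90.00% recall=75.00%
```

A pattern of high recall with lower precision, as reported for both corpora, means the tool misses few annotated QTL statements but also extracts some statements that the annotators did not mark.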

CONCLUSION

QTM is a unique tool that aids in providing QTL information in machine-readable and semantically interoperable formats.

Similar articles

1. QTLTableMiner: semantic mining of QTL tables in scientific articles.
   BMC Bioinformatics. 2018 May 25;19(1):183. doi: 10.1186/s12859-018-2165-7.
2. QTLMiner: QTL database curation by mining tables in literature.
   Bioinformatics. 2015 May 15;31(10):1689-91. doi: 10.1093/bioinformatics/btv016. Epub 2015 Jan 12.
3. Linkage relationships among multiple QTL for horticultural traits and late blight (P. infestans) resistance on chromosome 5 introgressed from wild tomato Solanum habrochaites.
   G3 (Bethesda). 2013 Dec 9;3(12):2131-46. doi: 10.1534/g3.113.007195.
4. TaeC: A manually annotated text dataset for trait and phenotype extraction and entity linking in wheat breeding literature.
   PLoS One. 2024 Jun 13;19(6):e0305475. doi: 10.1371/journal.pone.0305475. eCollection 2024.
5. Wide-genome QTL mapping of fruit quality traits in a tomato RIL population derived from the wild-relative species Solanum pimpinellifolium L.
   Theor Appl Genet. 2015 Oct;128(10):2019-35. doi: 10.1007/s00122-015-2563-4. Epub 2015 Jul 12.
6. Detection of Quantitative Trait Loci (QTL) Associated with the Fruit Morphology of Tomato.
   Genes (Basel). 2020 Sep 24;11(10):1117. doi: 10.3390/genes11101117.
7. OrganismTagger: detection, normalization and grounding of organism entities in biomedical documents.
   Bioinformatics. 2011 Oct 1;27(19):2721-9. doi: 10.1093/bioinformatics/btr452. Epub 2011 Aug 9.
8. Mapping and QTL Analysis of Early-Maturity Traits in Tetraploid Potato ( L.).
   Int J Mol Sci. 2018 Oct 8;19(10):3065. doi: 10.3390/ijms19103065.
9. Automated ontology generation framework powered by linked biomedical ontologies for disease-drug domain.
   Comput Methods Programs Biomed. 2018 Oct;165:117-128. doi: 10.1016/j.cmpb.2018.08.010. Epub 2018 Aug 16.
10. Textpresso: an ontology-based information retrieval and extraction system for biological literature.
    PLoS Biol. 2004 Nov;2(11):e309. doi: 10.1371/journal.pbio.0020309. Epub 2004 Sep 21.

Cited by

1. Automatic classification of literature in systematic reviews on food safety using machine learning.
   Curr Res Food Sci. 2021 Dec 26;5:84-95. doi: 10.1016/j.crfs.2021.12.010. eCollection 2022.
2. Mining news media for understanding public health concerns.
   J Clin Transl Sci. 2019 Oct 23;5(1):e1. doi: 10.1017/cts.2019.434.
3. Extracting knowledge networks from plant scientific literature: potato tuber flesh color as an exemplary trait.