A machine reading system for assembling synthetic paleontological databases.

Authors

Peters Shanan E, Zhang Ce, Livny Miron, Ré Christopher

Affiliations

Department of Geoscience, University of Wisconsin-Madison, Madison, Wisconsin, United States of America.

Computer Sciences Department, University of Wisconsin-Madison, Madison, Wisconsin, United States of America.

Publication

PLoS One. 2014 Dec 1;9(12):e113523. doi: 10.1371/journal.pone.0113523. eCollection 2014.

DOI: 10.1371/journal.pone.0113523
PMID: 25436610
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4250071/

Abstract

Many aspects of macroevolutionary theory and our understanding of biotic responses to global environmental change derive from literature-based compilations of paleontological data. Existing manually assembled databases are, however, incomplete and difficult to assess and enhance with new data types. Here, we develop and validate the quality of a machine reading system, PaleoDeepDive, that automatically locates and extracts data from heterogeneous text, tables, and figures in publications. PaleoDeepDive performs comparably to humans in several complex data extraction and inference tasks and generates congruent synthetic results that describe the geological history of taxonomic diversity and genus-level rates of origination and extinction. Unlike traditional databases, PaleoDeepDive produces a probabilistic database that systematically improves as information is added. We show that the system can readily accommodate sophisticated data types, such as morphological data in biological illustrations and associated textual descriptions. Our machine reading approach to scientific data integration and synthesis brings within reach many questions that are currently underdetermined and does so in ways that may stimulate entirely new modes of inquiry.
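The abstract names two outputs: a probabilistic database, in which each extracted fact carries a probability, and genus-level diversity and origination/extinction curves synthesized from it. The Python below is a minimal sketch of how such curves could be computed from probabilistic occurrence facts. It is not PaleoDeepDive's implementation. The input tuple format `(genus, interval_index, probability)`, the helper names `genus_ranges` and `diversity_and_rates`, the 0.9 probability cutoff, and the use of range-through diversity with Foote-style boundary-crosser rate metrics are all hypothetical choices made for illustration, not details stated in the abstract.

```python
import math

# Hypothetical input format (not PaleoDeepDive's schema): each extracted fact is
# (genus, interval_index, probability), with interval 0 the oldest time bin.


def genus_ranges(occurrences, threshold=0.9):
    """Map each genus to (first_interval, last_interval), keeping only facts
    whose probability meets a hypothetical cutoff."""
    ranges = {}
    for genus, interval, prob in occurrences:
        if prob < threshold:
            continue  # drop low-confidence extractions
        if genus in ranges:
            first, last = ranges[genus]
            ranges[genus] = (min(first, interval), max(last, interval))
        else:
            ranges[genus] = (interval, interval)
    return ranges


def diversity_and_rates(ranges, n_intervals):
    """Range-through diversity per interval, plus Foote-style boundary-crosser
    origination (p) and extinction (q) rates (an assumed metric choice).
    Rates are (None, None) when no genus crosses both interval boundaries."""
    diversity, rates = [], []
    for i in range(n_intervals):
        n_bt = n_bl = n_ft = n_total = 0
        for first, last in ranges.values():
            if not first <= i <= last:
                continue  # genus absent, even as a range-through
            n_total += 1
            if first < i < last:
                n_bt += 1  # crosses both bottom and top boundaries
            elif first < i:
                n_bl += 1  # crosses bottom boundary, last appears here
            elif last > i:
                n_ft += 1  # first appears here, crosses top boundary
            # otherwise: confined to this interval; counted in diversity only
        diversity.append(n_total)
        if n_bt == 0:
            rates.append((None, None))
        else:
            p = math.log((n_bt + n_ft) / n_bt)  # -ln(N_bt / (N_bt + N_Ft))
            q = math.log((n_bt + n_bl) / n_bt)  # -ln(N_bt / (N_bt + N_bL))
            rates.append((p, q))
    return diversity, rates


# Toy usage with made-up facts; genus "E" falls below the cutoff.
facts = [
    ("A", 0, 0.99), ("A", 2, 0.97),
    ("B", 1, 0.95), ("B", 3, 0.92),
    ("C", 1, 0.98),
    ("D", 0, 0.96), ("D", 3, 0.99),
    ("E", 2, 0.40),
]
diversity, rates = diversity_and_rates(genus_ranges(facts), n_intervals=4)
print(diversity)  # [2, 4, 3, 2]
```

The rates in this sketch are per interval; dividing p and q by each interval's duration would give per-million-year rates. Changing the cutoff is one simple way to see how the synthetic curves respond as the underlying probabilities change.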

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/394b/4250071/07d2f2ac9aba/pone.0113523.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/394b/4250071/b0a9268f0e33/pone.0113523.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/394b/4250071/cd148eb1becd/pone.0113523.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/394b/4250071/93b615958fb9/pone.0113523.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/394b/4250071/4f7476c6640b/pone.0113523.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/394b/4250071/110b7968f81f/pone.0113523.g006.jpg
