Biodiversity informatics: the challenge of linking data and the role of shared identifiers.

Author information

Page Roderic D M

Affiliation

Division of Environmental and Evolutionary Biology, Institute of Biomedical and Life Sciences, University of Glasgow, Glasgow G12 8QQ, UK.

Publication information

Brief Bioinform. 2008 Sep;9(5):345-54. doi: 10.1093/bib/bbn022. Epub 2008 Apr 29.

DOI:10.1093/bib/bbn022

PMID:18445641

Abstract

A major challenge facing biodiversity informatics is integrating data stored in widely distributed databases. Initial efforts have relied on taxonomic names as the shared identifier linking records in different databases. However, taxonomic names have limitations as identifiers, being neither stable nor globally unique, and the pace of molecular taxonomic and phylogenetic research means that a lot of information in public sequence databases is not linked to formal taxonomic names. This review explores the use of other identifiers, such as specimen codes and GenBank accession numbers, to link otherwise disconnected facts in different databases. The structure of these links can also be exploited using the PageRank algorithm to rank the results of searches on biodiversity databases. The key to rich integration is a commitment to deploy and reuse globally unique, shared identifiers [such as Digital Object Identifiers (DOIs) and Life Science Identifiers (LSIDs)], and the implementation of services that link those identifiers.
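The linking idea in the abstract can be illustrated with a small sketch. This is not the author's implementation: the records, field names, and identifier values below are all invented for illustration, and the ranking is a plain power-iteration PageRank written from the standard definition. It shows how records from different databases become connected once they reuse the same specimen code or DOI, and how the resulting link structure can be used to rank objects.

```python
# Hypothetical records from several databases. Every identifier here is invented.
records = {
    "sequence:SEQ0001": {"specimen": "MUS 12345", "publication": "doi:10.9999/example.1"},
    "sequence:SEQ0002": {"specimen": "MUS 12345", "publication": "doi:10.9999/example.2"},
    "image:IMG0001": {"specimen": "MUS 12345"},
    "occurrence:OCC0001": {"specimen": "MUS 67890", "publication": "doi:10.9999/example.1"},
}


def index_by_identifier(records):
    """Map each shared identifier to the records that mention it."""
    index = {}
    for record_id, fields in records.items():
        for value in fields.values():
            index.setdefault(value, []).append(record_id)
    return index


def build_links(records):
    """Directed links from each record to the identifiers it cites."""
    return {record_id: set(fields.values()) for record_id, fields in records.items()}


def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank; rank held by nodes without out-links is spread evenly."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


shared = index_by_identifier(records)
ranks = pagerank(build_links(records))
top = max(ranks, key=ranks.get)

print(sorted(shared["MUS 12345"]))  # records joined by one specimen code
print(top)                          # the most linked-to object
```

In this toy data the specimen code cited by three records receives the highest rank, which is the behaviour one would want when ordering search results. The sketch only works because the identifier strings match exactly across records, which is the reason the abstract stresses globally unique, reused identifiers over taxonomic names.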

Similar articles

Biodiversity informatics: the challenge of linking data and the role of shared identifiers.

Brief Bioinform. 2008 Sep;9(5):345-54. doi: 10.1093/bib/bbn022. Epub 2008 Apr 29.

Schema driven assignment and implementation of life science identifiers (LSIDs).

J Biomed Inform. 2008 Oct;41(5):730-8. doi: 10.1016/j.jbi.2008.05.014. Epub 2008 Jun 13.

A Taxonomic Search Engine: federating taxonomic databases using web services.

BMC Bioinformatics. 2005 Mar 9;6:48. doi: 10.1186/1471-2105-6-48.

Semantic Mining in Biomedicine (Introduction to the papers selected from the SMBM 2005 Symposium, Hinxton, U.K., April 2005).

Bioinformatics. 2006 Mar 15;22(6):643-4. doi: 10.1093/bioinformatics/btl084.

Formal ontology for natural language processing and the integration of biomedical databases.

Int J Med Inform. 2006 Mar-Apr;75(3-4):224-31. doi: 10.1016/j.ijmedinf.2005.07.015. Epub 2005 Sep 8.

GeneInfoMiner--a web server for exploring biomedical literature using batch sequence ID.

Bioinformatics. 2005 Aug 15;21(16):3452-3. doi: 10.1093/bioinformatics/bti559. Epub 2005 Jun 30.

A protocol for the update of references to scientific literature in biological databases.

Appl Bioinformatics. 2003;2(3):189-91.

Web servicing the biological office.

Bioinformatics. 2005 Sep 1;21 Suppl 2:ii268-9. doi: 10.1093/bioinformatics/bti1144.

A database of unique protein sequence identifiers for proteome studies.

Proteomics. 2006 Aug;6(16):4514-22. doi: 10.1002/pmic.200600032.

The need for a biological registration system.

IDrugs. 2010 Jun;13(6):388-93.

Cited by

An open and continuously updated fern tree of life.

Front Plant Sci. 2022 Aug 24;13:909768. doi: 10.3389/fpls.2022.909768. eCollection 2022.

Match Algorithms for Scientific Names in FlorItaly, the Portal to the Flora of Italy.

Plants (Basel). 2021 May 13;10(5):974. doi: 10.3390/plants10050974.

20 GB in 10 minutes: a case for linking major biodiversity databases using an open socio-technical infrastructure and a pragmatic, cross-institutional collaboration.

PeerJ Comput Sci. 2018 Sep 17;4:e164. doi: 10.7717/peerj-cs.164. eCollection 2018.

Mining data from legacy taxonomic literature and application for sampling spiders of the Teutamus group (Araneae; Liocranidae) in Southeast Asia.

Sci Rep. 2020 Sep 25;10(1):15787. doi: 10.1038/s41598-020-72549-8.

The Natural History Museum Data Portal.

Database (Oxford). 2019 Jan 1;2019. doi: 10.1093/database/baz038.

Biodiversity data integration-the significance of data resolution and domain.

PLoS Biol. 2019 Mar 18;17(3):e3000183. doi: 10.1371/journal.pbio.3000183. eCollection 2019 Mar.

Use of globally unique identifiers (GUIDs) to link herbarium specimen records to physical specimens.

Appl Plant Sci. 2018 Mar 7;6(2):e1027. doi: 10.1002/aps3.1027. eCollection 2018 Feb.

To increase trust, change the social design behind aggregated biodiversity data.

Database (Oxford). 2018 Jan 1;2018. doi: 10.1093/database/bax100.

Automated assembly of a reference taxonomy for phylogenetic data synthesis.

Biodivers Data J. 2017 May 22;(5):e12581. doi: 10.3897/BDJ.5.e12581. eCollection 2017.

The importance of digitized biocollections as a source of trait data and a new VertNet resource.

Database (Oxford). 2016 Dec 26;2016. doi: 10.1093/database/baw158. Print 2016.
