Stoltzfus Arlin, O'Meara Brian, Whitacre Jamie, Mounce Ross, Gillespie Emily L, Kumar Sudhir, Rosauer Dan F, Vos Rutger A
Biochemical Science Division, NIST, Gaithersburg, MD, USA.
BMC Res Notes. 2012 Oct 22;5:574. doi: 10.1186/1756-0500-5-574.
Recently, various evolution-related journals have adopted policies to encourage or require archiving of phylogenetic trees and associated data. Such attention to data-sharing practices reflects rapidly improving information technology and a rapidly expanding potential to use this technology to aggregate and link data from previously published research. Nevertheless, little is known about current practices, or best practices, for publishing trees and associated data so as to promote re-use.
Here we summarize results of an ongoing analysis of current practices for archiving phylogenetic trees and associated data, current practices of re-use, and current barriers to re-use. We find that the technical infrastructure is available to support rudimentary archiving, but the frequency of archiving is low. Currently, most phylogenetic knowledge is not easily re-used due to a lack of archiving, lack of awareness of best practices, and lack of community-wide standards for formatting data, naming entities, and annotating data. Most attempts at data re-use seem to end in disappointment. Nevertheless, we find many positive examples of data re-use, particularly those that involve customized species trees generated by grafting to, and pruning from, a much larger tree.
The technologies and practices that facilitate data re-use can catalyze synthetic and integrative research. However, success will require engagement from various stakeholders, including individual scientists who produce or consume shareable data, publishers, policy-makers, technology developers, and resource-providers. The critical challenges for facilitating re-use of phylogenetic trees and associated data, we suggest, include: a broader commitment to public archiving; more extensive use of globally meaningful identifiers; development of user-friendly technology for annotating, submitting, searching, and retrieving data and their metadata; and development of a minimum reporting standard (MIAPA) indicating which kinds of data and metadata are most important for a re-usable phylogenetic record.