Heart + Lung Institute at St. Paul's Hospital, University of British Columbia, Vancouver, Canada.
BMC Bioinformatics. 2009 Sep 25;10:313. doi: 10.1186/1471-2105-10-313.
Academic social tagging systems, such as Connotea and CiteULike, provide researchers with a means to organize personal collections of online references with keywords (tags) and to share these collections with others. One of the side-effects of the operation of these systems is the generation of large, publicly accessible metadata repositories describing the resources in the collections. In light of the well-known expansion of information in the life sciences and the need for metadata to enhance its value, these repositories present a potentially valuable new resource for application developers. Here we characterize the current contents of two scientifically relevant metadata repositories created through social tagging. This investigation helps to establish how such socially constructed metadata might be used as it stands currently and to suggest ways that new social tagging systems might be designed that would yield better aggregate products.
We assessed the metadata that users of CiteULike and Connotea associated with citations in PubMed using the following metrics: coverage of the document space, density of metadata (tags) per document, rates of inter-annotator agreement, and rates of agreement with MeSH indexing. CiteULike and Connotea were very similar on all of the measurements. In comparison to PubMed, document coverage and per-document metadata density were much lower for the social tagging systems. Inter-annotator agreement within the social tagging systems and agreement between the aggregated social tagging metadata and MeSH indexing were both low, though the latter could be increased through voting.
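The metrics named above can be illustrated with a minimal Python sketch. The abstract does not state the exact formulas, so this sketch assumes simple set-overlap measures: an F-measure style score for agreement between two annotators, precision and recall against MeSH terms, and a vote threshold that keeps only tags assigned by at least a given number of users. All function names, the sample tags, and the sample MeSH terms are hypothetical and invented for illustration; they are not data or code from the study.

```python
from collections import Counter
from itertools import combinations


def coverage(num_tagged_docs, num_total_docs):
    """Fraction of the document space that has at least one tag."""
    return num_tagged_docs / num_total_docs


def tag_density(user_tags):
    """Number of distinct tags attached to one document across all users."""
    return len({tag for tags in user_tags.values() for tag in tags})


def pair_agreement(tags_a, tags_b):
    """Assumed agreement score for two tag sets: 2|A & B| / (|A| + |B|)."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def mean_pairwise_agreement(user_tags):
    """Average pair_agreement over every pair of users who tagged a document."""
    pairs = list(combinations(user_tags.values(), 2))
    if not pairs:
        return 0.0
    return sum(pair_agreement(a, b) for a, b in pairs) / len(pairs)


def voted_tags(user_tags, min_votes):
    """Keep only tags assigned by at least min_votes distinct users."""
    counts = Counter(tag for tags in user_tags.values() for tag in set(tags))
    return {tag for tag, n in counts.items() if n >= min_votes}


def precision_recall(predicted, reference):
    """Precision and recall of a predicted tag set against a reference set."""
    predicted, reference = set(predicted), set(reference)
    hits = len(predicted & reference)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical taggings of a single document by three users.
user_tags = {
    "alice": ["apoptosis", "cancer", "to-read"],
    "bob": ["apoptosis", "p53"],
    "carol": ["apoptosis", "cancer"],
}
# Hypothetical reference terms, normalised to lowercase for comparison.
mesh_terms = {"apoptosis", "cancer", "humans"}

all_tags = voted_tags(user_tags, min_votes=1)
majority_tags = voted_tags(user_tags, min_votes=2)

print("density:", tag_density(user_tags))
print("mean pairwise agreement:", round(mean_pairwise_agreement(user_tags), 3))
print("all tags vs MeSH:", precision_recall(all_tags, mesh_terms))
print("voted tags vs MeSH:", precision_recall(majority_tags, mesh_terms))
```

In this toy case, requiring two votes drops the idiosyncratic tags ("to-read", "p53") and raises precision against the reference terms without lowering recall, which is one way voting could increase agreement with MeSH indexing. Real tags would also need normalisation and mapping to MeSH vocabulary, which this sketch leaves out.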
The most promising uses of metadata from current academic social tagging repositories will be those that find ways to utilize the novel relationships between users, tags, and documents exposed through these systems. For more traditional kinds of indexing-based applications (such as keyword-based search) to benefit substantially from socially generated metadata in the life sciences, more documents need to be tagged and more tags are needed for each document. These issues may be addressed both by finding ways to attract more users to current systems and by creating new user interfaces that encourage more collectively useful individual tagging behaviour.