A structural SVM approach for reference parsing.

Affiliation

Lister Hill National Center for Biomedical Communications, National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA.

Publication details

BMC Bioinformatics. 2011 Jun 9;12 Suppl 3(Suppl 3):S7. doi: 10.1186/1471-2105-12-S3-S7.

DOI: 10.1186/1471-2105-12-S3-S7
PMID: 21658294
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3111593/
Abstract

BACKGROUND

Automated extraction of bibliographic data, such as article titles, author names, abstracts, and references, is essential to the affordable creation of large citation databases. References, which typically appear at the end of journal articles, can also provide valuable information for extracting other bibliographic data. Therefore, parsing individual references to extract the author, title, journal, year, etc. is sometimes a necessary preprocessing step in building citation-indexing systems. The regular structure of references allows us to treat reference parsing as a sequence learning problem and to study the structural Support Vector Machine (structural SVM), a recently developed structured learning algorithm, for parsing references.
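To make the sequence-learning framing concrete, here is a minimal Python sketch of how a reference string might be turned into a token sequence with binary observation features. The tokenization regex, the feature names (`word=`, `is_digit`, `four_digits`, `init_cap`, `is_punct`), the helper names `tokenize`, `basic_features`, and `contextual_features`, and the toy reference are all hypothetical; the abstract does not specify the paper's actual feature set. The `order` parameter loosely mirrors the first- and second-order contextual observation features mentioned in the conclusions by copying each neighbour's features under an offset prefix.

```python
import re


def tokenize(reference: str) -> list[str]:
    # Assumed scheme: each run of word characters and each punctuation mark is a token.
    return re.findall(r"\w+|[^\w\s]", reference)


def basic_features(token: str) -> set[str]:
    """Hypothetical binary observation features for a single token."""
    feats = {f"word={token.lower()}"}
    if token.isdigit():
        feats.add("is_digit")
        if len(token) == 4:
            feats.add("four_digits")  # a plausible publication year
    if token[:1].isupper():
        feats.add("init_cap")
    if re.fullmatch(r"[^\w\s]", token):
        feats.add("is_punct")
    return feats


def contextual_features(tokens: list[str], i: int, order: int = 1) -> set[str]:
    """Features of token i plus neighbour features up to `order` positions away.

    Neighbour features are prefixed with their relative offset (e.g. "-1:"),
    and positions outside the reference get a "<pad>" marker.
    """
    feats = set(basic_features(tokens[i]))
    for offset in range(-order, order + 1):
        if offset == 0:
            continue
        j = i + offset
        if 0 <= j < len(tokens):
            feats |= {f"{offset:+d}:{f}" for f in basic_features(tokens[j])}
        else:
            feats.add(f"{offset:+d}:<pad>")
    return feats


if __name__ == "__main__":
    # Toy reference invented for illustration only.
    tokens = tokenize("Smith J. Parsing references. BMC Bioinformatics 2011")
    print(tokens)
    print(sorted(contextual_features(tokens, len(tokens) - 1, order=2)))
```

Each token's feature set would then be fed to a per-token classifier (SVM) or a sequence model (structural SVM, CRF); raising `order` from 1 to 2 widens the observation window without adding any label context.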

RESULTS

In this study, we implemented structural SVM and used two types of contextual features to compare structural SVM with conventional SVM. Both methods achieve over 98% token classification accuracy and over 95% overall chunk-level accuracy for reference parsing. We also compared SVM and structural SVM with Conditional Random Fields (CRF). The experimental results show that structural SVM and CRF achieve similar accuracies at both the token and chunk levels.
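Token-level and chunk-level accuracy can diverge, because one mislabeled token can break more than one field. The sketch below assumes definitions the abstract does not state: a chunk is a maximal run of tokens sharing one label, and chunk-level accuracy is the fraction of gold chunks whose label and boundaries are reproduced exactly. Under this assumption, two adjacent fields with the same label (for example, two author names both tagged `AUTHOR`) merge into one chunk; the paper may instead use boundary tags or a different scoring rule. The function names and label strings are hypothetical.

```python
def token_accuracy(gold: list[str], pred: list[str]) -> float:
    """Fraction of tokens whose predicted label equals the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def chunk_spans(labels: list[str]) -> set[tuple[str, int, int]]:
    """Maximal runs of a single label as (label, start, end), end exclusive."""
    spans = set()
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current run at the end of the sequence or on a label change.
        if i == len(labels) or labels[i] != labels[start]:
            spans.add((labels[start], start, i))
            start = i
    return spans


def chunk_accuracy(gold: list[str], pred: list[str]) -> float:
    """Fraction of gold chunks reproduced exactly (same label and boundaries)."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")
    gold_spans = chunk_spans(gold)
    return len(gold_spans & chunk_spans(pred)) / len(gold_spans)


if __name__ == "__main__":
    gold = ["AUTHOR", "AUTHOR", "TITLE", "TITLE", "TITLE", "JOURNAL", "YEAR"]
    pred = ["AUTHOR", "AUTHOR", "TITLE", "TITLE", "JOURNAL", "JOURNAL", "YEAR"]
    print(token_accuracy(gold, pred))  # 6/7, about 0.857
    print(chunk_accuracy(gold, pred))  # 0.5: both TITLE and JOURNAL chunks break
```

In this toy case a single token error costs one seventh of token accuracy but half of chunk accuracy, which is why chunk-level figures are typically the stricter of the two.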

CONCLUSIONS

When only basic observation features are used for each token, structural SVM outperforms SVM because it exploits contextual label features. However, when contextual observation features from neighboring tokens are added, SVM performance improves greatly, and after second-order contextual observation features are added it comes close to that of structural SVM. A comparison of these two methods with CRF using the same set of binary features shows that both structural SVM and CRF perform better than SVM, indicating their stronger sequence learning ability in reference parsing.
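The advantage of contextual label features can be illustrated with first-order Viterbi decoding, the joint inference used by linear-chain models such as a CRF or an HMM-style structural SVM. This is a minimal sketch assuming a first-order label chain; the abstract does not say how the paper's implementation decodes. The emission and transition scores are hand-picked for illustration, not learned, and training (for example, a cutting-plane solver for structural SVM or likelihood maximization for CRF) is not shown. `LABELS`, `EMISSIONS`, `TRANSITIONS`, `independent_decode`, and `viterbi` are hypothetical names.

```python
LABELS = ("TITLE", "JOURNAL")

# Hypothetical per-token scores, as if produced from observation features.
EMISSIONS = [
    {"TITLE": 2.0, "JOURNAL": 0.0},
    {"TITLE": 0.8, "JOURNAL": 1.0},  # ambiguous token, slightly favours JOURNAL
    {"TITLE": 1.5, "JOURNAL": 0.0},
    {"TITLE": 0.0, "JOURNAL": 2.0},
]

# Hypothetical label-transition scores (the contextual label features).
TRANSITIONS = {
    ("TITLE", "TITLE"): 0.5,
    ("JOURNAL", "JOURNAL"): 0.5,
    ("TITLE", "JOURNAL"): 0.0,
    ("JOURNAL", "TITLE"): -1.0,  # a title rarely follows the journal name
}


def independent_decode(emissions, labels):
    """Per-token argmax, as a classifier without label context would do."""
    return [max(labels, key=lambda y: scores[y]) for scores in emissions]


def viterbi(emissions, transitions, labels):
    """Highest-scoring label sequence under emission plus first-order transition scores."""
    if not emissions:
        return []
    best = [{y: emissions[0][y] for y in labels}]  # best[t][y]: top score of a path ending in y at t
    back = [{}]  # back[t][y]: predecessor label on that best path
    for t in range(1, len(emissions)):
        best.append({})
        back.append({})
        for y in labels:
            prev = max(labels, key=lambda yp: best[t - 1][yp] + transitions.get((yp, y), 0.0))
            best[t][y] = best[t - 1][prev] + transitions.get((prev, y), 0.0) + emissions[t][y]
            back[t][y] = prev
    # Trace back from the best final label.
    y = max(labels, key=lambda lab: best[-1][lab])
    path = [y]
    for t in range(len(emissions) - 1, 0, -1):
        y = back[t][y]
        path.append(y)
    return path[::-1]


if __name__ == "__main__":
    print(independent_decode(EMISSIONS, LABELS))  # ['TITLE', 'JOURNAL', 'TITLE', 'JOURNAL']
    print(viterbi(EMISSIONS, TRANSITIONS, LABELS))  # ['TITLE', 'TITLE', 'TITLE', 'JOURNAL']
```

Independent decoding flips the ambiguous second token to `JOURNAL`, producing an implausible TITLE/JOURNAL/TITLE alternation; the transition scores let joint decoding keep the title contiguous. Adding neighbour observation features to the emissions narrows this gap for a plain SVM, consistent with the conclusions above, but only a model with label transitions scores label sequences jointly.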

Similar articles

1. A structural SVM approach for reference parsing.
   BMC Bioinformatics. 2011 Jun 9;12 Suppl 3(Suppl 3):S7. doi: 10.1186/1471-2105-12-S3-S7.
2. Locating and parsing bibliographic references in HTML medical articles.
   Int J Doc Anal Recognit. 2010 Jun 1;13(2):107-119. doi: 10.1007/s10032-009-0105-9.
3. A sequence labeling approach to link medications and their attributes in clinical notes and clinical trial announcements for information extraction.
   J Am Med Inform Assoc. 2013 Sep-Oct;20(5):915-21. doi: 10.1136/amiajnl-2012-001487. Epub 2012 Dec 25.
4. SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition.
   BMC Bioinformatics. 2007 May 22;8 Suppl 4(Suppl 4):S2. doi: 10.1186/1471-2105-8-S4-S2.
5. Data classification with radial basis function networks based on a novel kernel density estimation algorithm.
   IEEE Trans Neural Netw. 2005 Jan;16(1):225-36. doi: 10.1109/TNN.2004.836229.
6. Semisupervised least squares support vector machine.
   IEEE Trans Neural Netw. 2009 Dec;20(12):1858-70. doi: 10.1109/TNN.2009.2031143.
7. Cancer survival classification using integrated data sets and intermediate information.
   Artif Intell Med. 2014 Sep;62(1):23-31. doi: 10.1016/j.artmed.2014.06.003. Epub 2014 Jun 21.
8. Comparison of character-level and part of speech features for name recognition in biomedical texts.
   J Biomed Inform. 2004 Dec;37(6):423-35. doi: 10.1016/j.jbi.2004.08.008.
9. Binary tree of SVM: a new fast multiclass training and classification algorithm.
   IEEE Trans Neural Netw. 2006 May;17(3):696-704. doi: 10.1109/TNN.2006.872343.
10. Computer-assisted lip diagnosis on Traditional Chinese Medicine using multi-class support vector machines.
   BMC Complement Altern Med. 2012 Aug 16;12:127. doi: 10.1186/1472-6882-12-127.

Cited by

1. Building an annotated corpus for automatic metadata extraction from multilingual journal article references.
   PLoS One. 2023 Jan 20;18(1):e0280637. doi: 10.1371/journal.pone.0280637. eCollection 2023.
2. PageRank as a method to rank biomedical literature by importance.
   Source Code Biol Med. 2015 Dec 9;10:16. doi: 10.1186/s13029-015-0046-2. eCollection 2015.
3. Topics in machine learning for biomedical literature analysis and text retrieval.
   BMC Bioinformatics. 2011 Jun 9;12 Suppl 3(Suppl 3):I1. doi: 10.1186/1471-2105-12-S3-I1.
