Facilitating information extraction without annotated data using unsupervised and positive-unlabeled learning.

Affiliations

Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, MA.

Harvard Medical School, Boston, MA.

Publication information

AMIA Annu Symp Proc. 2021 Jan 25;2020:658-667. eCollection 2020.

Abstract

Information extraction (IE), the distillation of specific information from unstructured data, is a core task in natural language processing. For rare entities (<1% prevalence), collecting the positive examples required to train a model may require reviewing an infeasibly large sample of mostly negative ones. We combined unsupervised learning with biased positive-unlabeled (PU) learning methods to: 1) facilitate positive example collection while maintaining the assumptions needed to 2) learn a binary classifier from the biased positive-unlabeled data alone. We tested the methods on a real-life use case of rare (<0.42%) entity extraction from medical malpractice documents. When tested on a manually reviewed random sample of documents, the PU model achieved an area under the precision-recall curve of 0.283 and an F1 of 0.410, outperforming fully supervised learning (0.022 and 0.096, respectively). The results demonstrate our method's potential to reduce the manual effort required for extracting rare entities from narrative texts.
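The abstract names positive-unlabeled (PU) learning but does not specify an algorithm. For orientation only, here is a minimal Python sketch of one well-known PU technique, the Elkan and Noto (2008) correction. It trains a classifier to separate labeled positives from unlabeled examples, estimates c (the probability that a positive example is labeled), and divides the classifier's output by c. The sketch assumes labeled positives are selected completely at random (SCAR). That is a simpler setting than the biased PU learning the authors describe, and it is not their method. The one-feature synthetic data and the helpers `fit_logistic_1d` and `elkan_noto` are hypothetical.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_1d(xs, ys, epochs=500, lr=0.5):
    """Fit p(y=1|x) = sigmoid(w*x + b) by full-batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def elkan_noto(xs, s):
    """Hypothetical helper: Elkan-Noto PU correction (assumes SCAR labeling).

    s[i] == 1 means xs[i] is a labeled positive; s[i] == 0 means unlabeled
    (a mix of hidden positives and negatives).
    """
    w, b = fit_logistic_1d(xs, s)  # g(x) estimates p(s=1 | x)

    def g(x):
        return sigmoid(w * x + b)

    # c = p(s=1 | y=1), estimated as the mean of g over labeled positives.
    # Reusing the training positives keeps the sketch short; a held-out
    # set of labeled positives is the less biased choice.
    labeled = [x for x, si in zip(xs, s) if si == 1]
    c_hat = sum(g(x) for x in labeled) / len(labeled)

    def predict_proba(x):
        # p(y=1 | x) = p(s=1 | x) / c, clipped to a valid probability.
        return min(1.0, g(x) / c_hat)

    return predict_proba, c_hat


# Hypothetical synthetic data: a rare positive class (10%) with one feature.
rng = random.Random(0)
positives = [rng.gauss(2.0, 1.0) for _ in range(100)]
negatives = [rng.gauss(-2.0, 1.0) for _ in range(900)]
xs = positives + negatives
# Label 30 positives chosen at random (SCAR); everything else is unlabeled.
labeled_idx = set(rng.sample(range(len(positives)), 30))
s = [1 if i in labeled_idx else 0 for i in range(len(xs))]

predict_proba, c_hat = elkan_noto(xs, s)
mean_pos = sum(predict_proba(x) for x in positives) / len(positives)
mean_neg = sum(predict_proba(x) for x in negatives) / len(negatives)
print(f"c_hat={c_hat:.3f}  mean p(pos)={mean_pos:.3f}  mean p(neg)={mean_neg:.3f}")
```

Under SCAR, the labeled positives are a random subset of all positives. Hidden positives in the unlabeled pool therefore tend to receive rescaled scores similar to the labeled ones, so thresholding `predict_proba` can recover many of them without annotated negatives. If labeling is biased, meaning the chance of being labeled depends on the features, c is no longer a single constant and this correction does not hold as written.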

Similar articles

Entity linking for biomedical literature.
BMC Med Inform Decis Mak. 2015;15 Suppl 1(Suppl 1):S4. doi: 10.1186/1472-6947-15-S1-S4. Epub 2015 May 20.
Extracting biomedical events from pairs of text entities.
BMC Bioinformatics. 2015;16 Suppl 10(Suppl 10):S8. doi: 10.1186/1471-2105-16-S10-S8. Epub 2015 Jul 13.