Integrating machine learning with linguistic features: A universal method for extraction and normalization of temporal expressions in Chinese texts.

Authors

Wang Shunli, Li Rui, Wu Huayi

Affiliations

State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China; Hubei Luojia Laboratory, Wuhan, China; Collaborative Innovation Center of Geospatial Technology, Wuhan, China.

Publication information

Comput Methods Programs Biomed. 2023 May;233:107474. doi: 10.1016/j.cmpb.2023.107474. Epub 2023 Mar 11.

DOI: 10.1016/j.cmpb.2023.107474
PMID: 36931017

Abstract

BACKGROUND AND OBJECTIVE

With the rapid development of information dissemination technology, the amount of event information contained in massive text collections now far exceeds what humans can grasp intuitively, making it hard to follow the progress of events in chronological order. Temporal information runs through the beginning, progression, and end of an event, and plays an important role in many natural language processing applications, such as information extraction, question answering, and text summarization. Accurately extracting temporal information from Chinese texts and automatically mapping natural-language temporal expressions onto a time axis are crucial to understanding how events develop and change.

METHODS

This study proposes a method that integrates machine learning with linguistic features (IMLLF) to extract and normalize temporal expressions in Chinese texts. Linguistic features are constructed by analyzing the rules by which temporal information is expressed, and are combined with machine learning to map the natural-language form of time onto a one-dimensional timeline. The web text dataset we built is divided into five parts for five-fold cross-validation, to compare the influence of different combinations of linguistic features and of different methods. On the open medical dialog dataset, using the model trained on the web text dataset, 200 disease descriptions are randomly selected each time for three rounds of experiments.
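To make the two tasks concrete, the following is a minimal rule-only sketch of extraction and normalization, not the IMLLF method itself. The abstract does not give the linguistic features or the learning model, so the two patterns, the relative-day table, and the function name `extract_and_normalize` are all hypothetical and chosen only for illustration. Relative expressions need a reference date, which is assumed here to be the date the text was written.

```python
import re
from datetime import date, timedelta

# Hypothetical patterns for illustration only; these are NOT the
# linguistic features used by the paper.
ABSOLUTE = re.compile(r"(\d{4})年(\d{1,2})月(\d{1,2})日")
RELATIVE = {"前天": -2, "昨天": -1, "今天": 0, "明天": 1, "后天": 2}
RELATIVE_RE = re.compile("|".join(RELATIVE))


def extract_and_normalize(text, reference):
    """Return (expression, ISO date) pairs in order of appearance.

    `reference` is the date that relative expressions are anchored to
    (assumed here to be the date the text was written).
    """
    found = []
    for m in ABSOLUTE.finditer(text):
        year, month, day = map(int, m.groups())
        try:
            found.append((m.start(), m.group(), date(year, month, day)))
        except ValueError:
            continue  # skip impossible dates such as month 13
    for m in RELATIVE_RE.finditer(text):
        offset = timedelta(days=RELATIVE[m.group()])
        found.append((m.start(), m.group(), reference + offset))
    found.sort(key=lambda item: item[0])  # restore text order
    return [(expr, day.isoformat()) for _, expr, day in found]


# "The patient was admitted on 11 March 2023 and developed a fever yesterday."
pairs = extract_and_normalize("患者2023年3月11日入院,昨天开始发热。", date(2023, 3, 15))
print(pairs)  # [('2023年3月11日', '2023-03-11'), ('昨天', '2023-03-14')]
```

A rule table like this covers only the expressions it lists, which is one reason to combine such features with a learned model as the study proposes.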

RESULTS

Multi-feature fusion achieves an F1 of 95.2%, which is better than the single-feature and double-feature combinations. The experiments show that the proposed IMLLF method improves the accuracy of recognizing temporal information in Chinese to a greater extent than classical methods, with an F1-score above 95% on both the web text dataset and the medical dialog dataset. For the normalization of temporal expressions, the accuracy of the IMLLF method is higher than 93%.
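For readers unfamiliar with the metric, here is a short sketch of how an F1-score for extracted expressions can be computed. The abstract does not state the matching criterion, so exact span matching is an assumption, and the function name `span_prf` and the example spans are hypothetical.

```python
def span_prf(gold, predicted):
    """Precision, recall, and F1 over exactly matching (start, end) spans.

    Exact matching is an assumption; the abstract does not say whether
    partial overlaps count as correct.
    """
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up spans: two of three predictions match two of three gold spans.
gold = [(2, 12), (14, 16), (20, 22)]
predicted = [(2, 12), (14, 16), (30, 32)]
precision, recall, f1 = span_prf(gold, predicted)
```

In this made-up example, precision and recall are both 2/3, so F1 is also 2/3. Normalization accuracy would be computed separately, as the share of expressions mapped to the correct point on the timeline.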

CONCLUSIONS

IMLLF achieves good results in extracting and normalizing temporal expressions on both the web text dataset and the medical dialog dataset, which supports the universality of IMLLF for identifying and quantifying temporal information. The IMLLF method can accurately map temporal information onto the time axis, which lets doctors see at a glance what happened to a patient and when, and helps them make better medical decisions.
