RegEMR: a natural language processing system to automatically identify premature ovarian decline from Chinese electronic medical records.

Affiliation

Center for Reproductive Medicine, Department of Gynecology and Obstetrics, Nanfang Hospital, Southern Medical University, Guangzhou, 510515, China.

Publication information

BMC Med Inform Decis Mak. 2023 Jul 18;23(1):126. doi: 10.1186/s12911-023-02239-8.

DOI:10.1186/s12911-023-02239-8
PMID:37464410
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10353087/
Abstract

BACKGROUND

The ovarian reserve is a reservoir for reproductive potential. In clinical practice, early detection and treatment of premature ovarian decline characterized by abnormal ovarian reserve tests is regarded as a critical measure to prevent infertility. However, the relevant data are typically stored in an unstructured format in a hospital's electronic medical record (EMR) system, and their retrieval requires tedious manual abstraction by domain experts. Computational tools are therefore needed to reduce the workload.

METHODS

We presented RegEMR, an artificial intelligence tool composed of a rule-based natural language processing (NLP) extractor and a knowledge-based disease scoring model, to automate the screening procedure for premature ovarian decline using Chinese reproductive EMRs. We used regular expressions (REs) as a text mining method and explored whether REs automatically synthesized by the genetic programming-based online platform RegexGenerator++ could be as effective as manually formulated REs. We also investigated how the representativeness of the learning corpus affected the performance of machine-generated REs. Additionally, we translated the clinical diagnostic criteria into a programmable disease diagnostic model for disease scoring and risk stratification. Four hundred outpatient medical records were collected from a Chinese fertility center. Manual review served as the gold standard, and fivefold cross-validation was used for evaluation.
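To make the RE-based extraction step concrete, the following is a minimal Python sketch of pulling hormone test values out of a free-text note. It is an illustration only and not the RegEMR implementation: the pattern, the test names it looks for (FSH, LH, AMH, E2), the function name extract_labs, and the sample note are all assumptions introduced here, not taken from the paper.

```python
import re

# Hypothetical pattern (an assumption, not one of the paper's REs):
# a test name, an optional ASCII or full-width colon, a number, an optional unit.
LAB_PATTERN = re.compile(
    r"(?P<name>FSH|LH|AMH|E2)\s*[:：]?\s*"
    r"(?P<value>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>[A-Za-zμ]+/[A-Za-z]+)?",
    re.IGNORECASE,
)


def extract_labs(note: str) -> list:
    """Return one dict per lab mention found in the note, in text order."""
    results = []
    for match in LAB_PATTERN.finditer(note):
        results.append(
            {
                "name": match.group("name").upper(),
                "value": float(match.group("value")),
                "unit": match.group("unit"),  # None when no unit follows the number
            }
        )
    return results


# Made-up sample note, written in the style of a Chinese outpatient record.
sample_note = "2021-03-02 月经第3天查 FSH:12.5 IU/L,LH 4.1IU/L,AMH:0.8ng/ml。"

for lab in extract_labs(sample_note):
    print(lab)
```

Named groups keep each extracted field addressable by name, which would make it straightforward to feed the values into a downstream rule-based scoring step of the kind the abstract describes.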

RESULTS

The overall F-score of manually built REs was 0.9444 (95% CI 0.9373 to 0.9515), with no significant difference (paired t test, p > 0.05) from that of machine-generated REs, whose performance could be affected by training set sizes and annotation portions. The extractor performed effectively in automatically tracing the dynamic changes in hormone levels (F-score 0.9518-0.9884) and ultrasonographic measures (F-score 0.9472-0.9822). When the extracted information was applied to the proposed diagnostic model, the program obtained an accuracy of 0.98 and a sensitivity of 0.93 in risk screening. For each specific disease, the automatic diagnosis in 76% of patients was consistent with the clinical diagnosis, and the kappa coefficient was 0.63.
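For readers unfamiliar with the metrics reported above, the sketch below shows how an F-score and a Cohen's kappa coefficient are conventionally computed. It uses the standard textbook definitions, not code or data from the study; the function names and the toy inputs are assumptions made for illustration.

```python
def precision_recall_f1(gold: set, predicted: set) -> tuple:
    """Precision, recall and F1 for a set of extracted items against a gold set."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def cohen_kappa(labels_a: list, labels_b: list) -> float:
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Toy inputs (made up): item identifiers for extraction, category labels for diagnosis.
gold_items = {1, 2, 3, 4}
extracted_items = {2, 3, 4, 5}
print(precision_recall_f1(gold_items, extracted_items))

automatic = ["A", "B", "C", "C"]
clinical = ["A", "C", "C", "C"]
print(cohen_kappa(automatic, clinical))
```

Kappa is lower than raw agreement whenever some agreement would be expected by chance, which is why an agreement rate and a kappa value can both be reported for the same comparison.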

CONCLUSION

A Chinese NLP system named RegEMR was developed to automatically identify high risk of early ovarian aging and diagnose related diseases from Chinese reproductive EMRs. We hope that this system can aid EMR-based data collection and clinical decision support in fertility centers.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f1d2/10353087/19a636827fe3/12911_2023_2239_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f1d2/10353087/80a22e1ca57a/12911_2023_2239_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f1d2/10353087/4451033fc7ed/12911_2023_2239_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f1d2/10353087/fd85634a40f1/12911_2023_2239_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f1d2/10353087/6160aef3f4b8/12911_2023_2239_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f1d2/10353087/f302d2c7278f/12911_2023_2239_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f1d2/10353087/5a2083cc48a6/12911_2023_2239_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f1d2/10353087/9531f6525c45/12911_2023_2239_Fig8_HTML.jpg
