
Generation and evaluation of artificial mental health records for Natural Language Processing.

Authors

Ive Julia, Viani Natalia, Kam Joyce, Yin Lucia, Verma Somain, Puntis Stephen, Cardinal Rudolf N, Roberts Angus, Stewart Robert, Velupillai Sumithra

Affiliations

1. Department of Computing, Imperial College London, London, SW7 2AZ, UK.

2. IoPPN, King's College London, SE5 8AF London, UK.

Publication information

NPJ Digit Med. 2020 May 14;3:69. doi: 10.1038/s41746-020-0267-x. eCollection 2020.

DOI: 10.1038/s41746-020-0267-x
PMID: 32435697
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7224173/
Abstract

A serious obstacle to the development of Natural Language Processing (NLP) methods in the clinical domain is the accessibility of textual data. The mental health domain is particularly challenging, partly because clinical documentation relies heavily on free text that is difficult to de-identify completely. This problem could be tackled by using artificial medical data. In this work, we present an approach to generate artificial clinical documents. We apply this approach to discharge summaries from a large mental healthcare provider and discharge summaries from an intensive care unit. We perform an extensive intrinsic evaluation where we (1) apply several measures of text preservation; (2) measure how much the model memorises training data; and (3) estimate clinical validity of the generated text based on a human evaluation task. Furthermore, we perform an extrinsic evaluation by studying the impact of using artificial text in a downstream NLP text classification task. We found that using this artificial data as training data can lead to classification results that are comparable to the original results. Additionally, using only a small amount of information from the original data to condition the generation of the artificial data is successful, which holds promise for reducing the risk of these artificial data retaining rare information from the original data. This is an important finding for our long-term goal of being able to generate artificial clinical data that can be released to the wider research community and accelerate advances in developing computational methods that use healthcare data.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb28/7224173/3dd03bcf320a/41746_2020_267_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb28/7224173/d9e61083ddbe/41746_2020_267_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb28/7224173/1eecd42f3826/41746_2020_267_Fig3_HTML.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb28/7224173/4dc42937b914/41746_2020_267_Fig4_HTML.jpg
