Considerations for the Use of Machine Learning Extracted Real-World Data to Support Evidence Generation: A Research-Centric Evaluation Framework.

Authors

Estevez Melissa, Benedum Corey M, Jiang Chengsheng, Cohen Aaron B, Phadke Sharang, Sarkar Somnath, Bozkurt Selen

Affiliations

Flatiron Health, Inc., 233 Spring Street, New York, NY 10013, USA.

Department of Medicine, NYU Grossman School of Medicine, New York, NY 10016, USA.

Publication information

Cancers (Basel). 2022 Jun 22;14(13):3063. doi: 10.3390/cancers14133063.

Abstract

A vast amount of real-world data, such as pathology reports and clinical notes, are captured as unstructured text in electronic health records (EHRs). However, this information is both difficult and costly to extract through human abstraction, especially when scaling to large datasets is needed. Fortunately, Natural Language Processing (NLP) and Machine Learning (ML) techniques provide promising solutions for a variety of information extraction tasks such as identifying a group of patients who have a specific diagnosis, share common characteristics, or show progression of a disease. However, using these ML-extracted data for research still introduces unique challenges in assessing validity and generalizability to different cohorts of interest. In order to enable effective and accurate use of ML-extracted real-world data (RWD) to support research and real-world evidence generation, we propose a research-centric evaluation framework for model developers, ML-extracted data users and other RWD stakeholders. This framework covers the fundamentals of evaluating RWD produced using ML methods to maximize the use of EHR data for research purposes.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cb70/9264846/6ac4908dc5d0/cancers-14-03063-g001.jpg
