Chen Jinying, Yu Hong
Department of Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, MA, United States.
Bedford Veterans Affairs Medical Center, Center for Healthcare Organization and Implementation Research, Bedford, MA, United States; Department of Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, MA, United States.
J Biomed Inform. 2017 Apr;68:121-131. doi: 10.1016/j.jbi.2017.02.016. Epub 2017 Mar 4.
Allowing patients to access their own electronic health record (EHR) notes through online patient portals has the potential to improve patient-centered care. However, EHR notes contain abundant medical jargon that can be difficult for patients to comprehend. One way to help patients is to reduce information overload and help them focus on medical terms that matter most to them. Targeted education can then be developed to improve patient EHR comprehension and the quality of care.
The aim of this work was to develop FIT (Finding Important Terms for patients), an unsupervised natural language processing (NLP) system that ranks medical terms in EHR notes based on their importance to patients.
We built FIT on a new unsupervised ensemble ranking model derived from the biased random walk algorithm to combine heterogeneous information resources for ranking candidate terms from each EHR note. Specifically, FIT integrates four single views (rankers) of term importance: patient use of medical concepts, document-level term salience, word co-occurrence-based term relatedness, and topic coherence. It also incorporates partial information about term importance conveyed by terms' unfamiliarity levels and semantic types. We evaluated FIT on 90 expert-annotated EHR notes and used the four single-view rankers as baselines. In addition, we implemented three benchmark unsupervised ensemble ranking methods as strong baselines.
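The abstract does not give the model's equations, but the core of a biased (personalized) random walk ranker can be sketched as a personalized-PageRank-style iteration over a term relatedness graph with a restart (bias) distribution. The sketch below is a minimal illustration under that assumption, not the authors' FIT model: the relatedness matrix, the prior (standing in, e.g., for term unfamiliarity), and the damping parameter are all hypothetical placeholders; in FIT the bias component would presumably fuse the four single-view scores and the partial prior information rather than a single prior.

```python
import numpy as np

def biased_random_walk_rank(relatedness, bias, damping=0.85, tol=1e-8, max_iter=200):
    """Personalized-PageRank-style biased random walk over a term graph.

    relatedness : (n, n) nonnegative term-term relatedness matrix
    bias        : (n,) nonnegative prior importance scores (the bias/restart vector)
    damping     : probability of following a graph edge vs. restarting at the bias
    """
    n = relatedness.shape[0]
    # Column-normalize relatedness into a stochastic transition matrix.
    col_sums = relatedness.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # avoid division by zero for isolated terms
    transition = relatedness / col_sums
    # Normalize the bias vector into a probability distribution.
    total = bias.sum()
    restart = bias / total if total > 0 else np.full(n, 1.0 / n)
    # Power iteration to the stationary ranking vector.
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_rank = damping * transition.dot(rank) + (1.0 - damping) * restart
        if np.abs(new_rank - rank).sum() < tol:
            rank = new_rank
            break
        rank = new_rank
    return rank

# Hypothetical toy example: three candidate terms, a relatedness graph,
# and a made-up prior (e.g., unfamiliarity scores).
sim = np.array([[0.0, 0.6, 0.1],
                [0.6, 0.0, 0.3],
                [0.1, 0.3, 0.0]])
prior = np.array([0.7, 0.2, 0.1])
print(biased_random_walk_rank(sim, prior))
```

Terms that are strongly connected to other high-scoring terms and that have a high prior end up with the largest stationary scores, which is the intuition behind using such a walk to rank candidate terms within a note.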
FIT achieved 0.885 AUC-ROC for ranking candidate terms from EHR notes to identify important terms. When term identification was included, FIT's performance for identifying important terms from EHR notes was 0.813 AUC-ROC. Both scores significantly exceeded the corresponding scores of the four single-view rankers (P < 0.001). FIT also outperformed the three ensemble rankers on most metrics. Its performance is relatively insensitive to its parameter setting.
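For readers unfamiliar with the metric, AUC-ROC for this task can be computed directly from the expert annotations (important vs. not important) and the system's ranking scores. The snippet below is a generic illustration using scikit-learn's roc_auc_score on made-up labels and scores, not the paper's evaluation code.

```python
from sklearn.metrics import roc_auc_score

# Hypothetical expert labels (1 = important to patients, 0 = not) and
# hypothetical ranking scores produced by a term ranker for one note.
labels = [1, 0, 1, 0, 0, 1, 0]
scores = [0.91, 0.42, 0.78, 0.35, 0.50, 0.66, 0.12]

# AUC-ROC: the probability that a randomly chosen important term is
# ranked above a randomly chosen unimportant one.
print(roc_auc_score(labels, scores))
```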
FIT can automatically identify EHR terms important to patients. It may inform the development of future interventions to improve the quality of care. By using unsupervised learning together with a robust and flexible framework for information fusion, FIT can be readily applied to other domains and applications.