Sezgin Emre, Hussain Syed-Amad, Rust Steve, Huang Yungui
The Abigail Wexner Research Institute at Nationwide Children's Hospital, Columbus, OH, United States.
The Ohio State University College of Medicine, Columbus, OH, United States.
JMIR Form Res. 2023 Mar 7;7:e43014. doi: 10.2196/43014.
Patient-generated health data (PGHD) captured via smart devices or digital health technologies can reflect an individual's health journey. PGHD enables tracking and monitoring of personal health conditions, symptoms, and medications outside the clinic, which is crucial for self-care and shared clinical decisions. In addition to self-reported measures and structured PGHD (eg, self-screening, sensor-based biometric data), free-text and unstructured PGHD (eg, patient care notes, medical diaries) can provide a broader view of a patient's journey and health condition. Natural language processing (NLP) is used to process and analyze unstructured data to create meaningful summaries and insights, showing promise to improve the utilization of PGHD.
Our aim is to understand and demonstrate the feasibility of an NLP pipeline to extract medication and symptom information from real-world patient and caregiver data.
We report a secondary data analysis using a data set collected from 24 parents of children with special health care needs (CSHCN) who were recruited via a nonrandom sampling approach. Participants used a voice-interactive app for 2 weeks, generating free-text patient notes (audio transcription or text entry). We built an NLP pipeline using a zero-shot approach (adaptive to low-resource settings). We used named entity recognition (NER) and medical ontologies (RxNorm and SNOMED CT [Systematized Nomenclature of Medicine Clinical Terms]) to identify medications and symptoms. Sentence-level dependency parse trees and part-of-speech tags were used to extract additional entity information using the syntactic properties of a note. We assessed the data; evaluated the pipeline with the patient notes; and reported the precision, recall, and F scores.
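The idea of matching note text against medical vocabularies without any training data can be illustrated with a minimal sketch. This is not the authors' pipeline: the toy word lists below are hypothetical stand-ins for RxNorm and SNOMED CT lookups, and a regular expression is swapped in for the dependency-parse step that links a quantity and unit to a medication. All names (`MEDICATIONS`, `SYMPTOMS`, `DOSE_PATTERN`, `extract_entities`) are assumptions made for illustration.

```python
import re

# Hypothetical toy lexicons; a real system would query RxNorm / SNOMED CT.
MEDICATIONS = {"ibuprofen", "albuterol", "melatonin"}
SYMPTOMS = {"fever", "cough", "headache"}

# A quantity followed by a unit, e.g. "5 ml" or "200 mg".
# This regex is a simplified substitute for dependency parsing.
DOSE_PATTERN = re.compile(
    r"(\d+(?:\.\d+)?)\s*(mg|ml|mcg|puffs?|tablets?)\b", re.IGNORECASE
)


def extract_entities(note):
    """Return medications, symptoms, and (quantity, unit) pairs found in a note."""
    tokens = re.findall(r"[a-z]+", note.lower())
    medications = [t for t in tokens if t in MEDICATIONS]
    symptoms = [t for t in tokens if t in SYMPTOMS]
    doses = [(float(q), u.lower()) for q, u in DOSE_PATTERN.findall(note)]
    return {"medications": medications, "symptoms": symptoms, "doses": doses}


result = extract_entities("Gave him 5 ml of ibuprofen for his fever this morning")
print(result)
```

Because the lookup relies only on fixed vocabularies and patterns, it needs no labeled training notes, which is the property that makes this style of approach attractive in low-resource settings.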
In total, 87 patient notes were included (audio transcriptions n=78 and text entries n=9) from 24 parents who had at least one CSHCN. The participants were between the ages of 26 and 59 years. The majority were White (n=22, 92%), had more than one child (n=16, 67%), lived in Ohio (n=22, 92%), had mid- or upper-mid household income (n=15, 62.5%), and had higher level education (n=14, 58%). Out of 87 notes, 30 were drug and medication related, and 46 were symptom related. We captured medication instances (medication, unit, quantity, and date) and symptoms satisfactorily (precision >0.65, recall >0.77, F>0.72). These results indicate the potential of using NER and dependency parsing in an NLP pipeline for information extraction from unstructured PGHD.
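For readers unfamiliar with the reported metrics, the following sketch shows how precision, recall, and an F score can be computed by comparing extracted entities against a gold-standard annotation. It assumes the balanced F1 variant and exact set matching; the abstract does not state which F score or matching criterion was used, and the function name and example entities are hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """Compare predicted entities against gold annotations (exact set match)."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: 3 extracted symptoms, 4 annotated symptoms, 2 in common.
p, r, f = precision_recall_f1(
    {"fever", "cough", "rash"},
    {"fever", "cough", "headache", "nausea"},
)
print(round(p, 3), round(r, 3), round(f, 3))
```

Precision penalizes spurious extractions, recall penalizes missed ones, and F1 is their harmonic mean, so a pipeline must do reasonably well on both to score highly.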
The proposed NLP pipeline was found to be feasible for use with real-world unstructured PGHD to accomplish medication and symptom extraction. Unstructured PGHD can be leveraged to inform clinical decision-making, remote monitoring, and self-care, including medication adherence and chronic disease management. With customizable information extraction methods using NER and medical ontologies, NLP models can feasibly extract a broad range of clinical information from unstructured PGHD in low-resource settings (eg, a limited number of patient notes or training data).