Edmond J Safra Program in Parkinson's Disease, University Health Network, University of Toronto, Toronto, Canada.
Grey Matter Technologies, a Wholly Owned Subsidiary of Modality.ai, San Francisco, CA, USA.
J Parkinsons Dis. 2023;13(5):757-767. doi: 10.3233/JPD-225083.
Free-text, verbatim replies in the words of people with Parkinson's disease (PD) have the potential to provide unvarnished information about their feelings and experiences. The difficulty of processing such data at scale is a barrier to analyzing verbatim data collected in large cohorts.
To develop a method for curating responses to the Parkinson's Disease Patient Report of Problems (PD-PROP), a set of open-ended questions that ask people with PD to report their most bothersome problems and the associated functional consequences.
Human curation, natural language processing, and machine learning were used to develop an algorithm to convert verbatim responses to classified symptoms. Nine curators including clinicians, people with PD, and a non-clinician PD expert classified a sample of responses as reporting each symptom or not. Responses to the PD-PROP were collected within the Fox Insight cohort study.
Approximately 3,500 PD-PROP responses were curated by a human team. Subsequently, approximately 1,500 responses were used in the validation phase; the median age of respondents was 67 years, 55% were men, and the median time since PD diagnosis was 3 years. A total of 168,260 verbatim responses were classified by machine. Accuracy of machine classification was 95% on a held-out test set. Sixty-five symptoms were grouped into 14 domains. The most frequently reported symptoms at first report were tremor (by 46% of respondents), gait and balance problems (>39%), and pain/discomfort (33%).
A human-in-the-loop method of curation provides both accuracy and efficiency, permitting a clinically useful analysis of large datasets of verbatim reports about the problems that bother PD patients.