Miotto Riccardo, Weng Chunhua
Department of Biomedical Informatics.
Department of Biomedical Informatics; The Irving Institute for Clinical and Translational Research, Columbia University, New York, NY 10032, USA
J Am Med Inform Assoc. 2015 Apr;22(e1):e141-50. doi: 10.1093/jamia/ocu050. Epub 2015 Mar 13.
To develop a cost-effective, case-based reasoning framework for clinical research eligibility screening that reuses only the electronic health records (EHRs) of a minimal number of enrolled participants to represent the target patient for each trial under consideration.
The EHR data--specifically diagnoses, medications, laboratory results, and clinical notes--of known clinical trial participants were aggregated to profile the "target patient" for a trial, and this profile was used to discover new eligible patients for that trial. The EHR data of unseen patients were matched to the "target patient" to determine their relevance to the trial; the higher the relevance, the more likely the patient was eligible. Relevance scores were a weighted linear combination of cosine similarities computed over the individual EHR data types. For evaluation, we identified 262 participants of 13 diverse clinical trials conducted at Columbia University as our gold standard. We ran a 2-fold cross-validation, with half of the participants used for training and the other half used for testing, along with 30 000 other patients selected at random from our clinical database. We performed binary classification and ranking experiments.
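The scoring step described above can be illustrated with a minimal sketch. It assumes each patient is represented as one sparse feature vector per EHR data type, that the "target patient" is the element-wise mean of the participants' vectors, and that the weights are supplied by hand; the abstract does not specify any of these details, and the function names, feature keys, and weights below are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    # An empty vector has no direction, so treat its similarity as 0.
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def target_profile(participants):
    """Aggregate participants into one vector per data type (assumed: mean)."""
    profile = {}
    for record in participants:
        for dtype, vec in record.items():
            acc = profile.setdefault(dtype, {})
            for key, w in vec.items():
                acc[key] = acc.get(key, 0.0) + w / len(participants)
    return profile


def relevance(patient, profile, weights):
    """Weighted linear combination of per-data-type cosine similarities."""
    return sum(
        weights[d] * cosine(patient.get(d, {}), profile.get(d, {}))
        for d in weights
    )


# Hypothetical toy data: two enrolled participants and two unseen patients.
participants = [
    {"diagnoses": {"E11": 1.0}, "medications": {"metformin": 1.0}},
    {"diagnoses": {"E11": 1.0, "I10": 1.0}, "medications": {"metformin": 1.0}},
]
weights = {"diagnoses": 0.5, "medications": 0.5}  # assumed, not from the paper
profile = target_profile(participants)

similar = {"diagnoses": {"E11": 1.0}, "medications": {"metformin": 1.0}}
unrelated = {"diagnoses": {"J45": 1.0}, "medications": {}}
score_similar = relevance(similar, profile, weights)
score_unrelated = relevance(unrelated, profile, weights)
```

In this toy setting the patient who shares features with the enrolled participants scores higher than the unrelated one, which is the behavior the relevance score is meant to capture.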
The overall area under the ROC curve for classification was 0.95, allowing eligible patients to be highlighted with good precision. Ranking gave satisfactory results, especially at the top of the recommended list: every trial had at least one eligible patient in the top five positions.
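Both reported measures can be computed from a list of relevance scores and gold-standard labels. The sketch below is illustrative only: it uses the standard pairwise (Mann-Whitney) formulation of the area under the ROC curve and a simple top-k check, and the function names and toy numbers are hypothetical rather than taken from the study.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (eligible, ineligible) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one eligible and one ineligible patient")
    # Ties between an eligible and an ineligible patient count as half a win.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def hit_in_top_k(scores, labels, k=5):
    """True if at least one eligible patient is among the k highest scores."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return any(y for _, y in ranked[:k])


# Hypothetical toy example: four patients, two of them eligible (label 1).
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 0, 1, 0]
toy_auc = roc_auc(toy_scores, toy_labels)
toy_hit = hit_in_top_k(toy_scores, toy_labels, k=1)
```

The pairwise form is quadratic in the number of patients, so at the scale of tens of thousands of patients a rank-based computation would be the more practical choice.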
This relevance-based method can potentially identify eligible patients for clinical trials by processing patient EHR data alone, without parsing free-text eligibility criteria, and shows promise for efficient "case-based reasoning" modeled on only a minimal number of trial participants.