Dhayne Houssein, Kilany Rima, Haque Rafiqul, Taher Yehia
Saint Joseph University, Mar Roukos, Beirut, Lebanon.
Intelligencia, 66 Avenue des Champs Elysees, Paris, France.
Comput Ind Eng. 2021 Jun;156:107236. doi: 10.1016/j.cie.2021.107236. Epub 2021 Mar 15.
The human suffering caused by life-threatening viral diseases such as SARS, Ebola, and COVID-19 has motivated many of us to study how data integration can best help clinical researchers curb these diseases. Integrating patient data with clinical trial data is highly promising because it provides a comprehensive knowledge base that speeds up the clinical research response to emerging infectious disease outbreaks. This work introduces EMR2vec, a platform that customises advanced NLP, machine learning, and semantic web techniques to link potential patients to suitable clinical trials. Linking these two different but complementary datasets allows clinicians and researchers to match patients to clinical research opportunities or to automatically select patients for personalized clinical care. The platform derives a 'bag of medical terms' (BoMT) from eligibility criteria by normalizing the extracted entities against the SNOMED-CT ontology. Using the BoMT, an ontological reasoning method is proposed to represent electronic medical records (EMRs) and clinical trials in a vector space model. The platform's matching process reduces vector dimensionality with a neural network and then applies orthogonal projection to measure the similarity between vectors. Finally, the proposed EMR2vec platform is evaluated with an extensible prototype built on big data tools.
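The vector space matching described above can be illustrated with a minimal sketch. It is not the EMR2vec implementation: the abstract does not give the exact similarity formula, so the sketch assumes that "orthogonal projection" similarity means the length of the projection of one vector onto the other, relative to the projected vector's length (which equals the absolute cosine). The neural dimensionality reduction step is omitted, and the names `bomt_vector`, `projection_similarity`, the vocabulary, and the sample patient and trials are all hypothetical placeholders rather than real SNOMED-CT concepts or data.

```python
import math


def bomt_vector(terms, vocabulary):
    """Binary bag-of-medical-terms vector over a fixed, ordered vocabulary."""
    present = set(terms)
    return [1.0 if concept in present else 0.0 for concept in vocabulary]


def projection_similarity(a, b):
    """Length of the orthogonal projection of a onto b, divided by |a|.

    Assumed reading of the abstract's similarity measure; it equals |cos(a, b)|.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty term bag matches nothing
    return abs(dot) / (norm_a * norm_b)


# Hypothetical normalized terms standing in for SNOMED-CT concepts.
vocabulary = ["fever", "cough", "diabetes", "pregnancy"]

# Hypothetical patient record and trial eligibility criteria.
patient = bomt_vector(["fever", "cough", "diabetes"], vocabulary)
trials = {
    "trial_a": bomt_vector(["fever", "cough"], vocabulary),
    "trial_b": bomt_vector(["pregnancy"], vocabulary),
}

# Rank trials by similarity to the patient, best match first.
ranked = sorted(
    trials.items(),
    key=lambda item: projection_similarity(patient, item[1]),
    reverse=True,
)
```

In this toy setting `trial_a` shares two terms with the patient and ranks first, while `trial_b` shares none and scores zero. A real pipeline would also need the entity extraction, ontology normalization, and reasoning steps that the abstract mentions.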