Data Science Platform, Imagine Institute, Université Paris Cité, Inserm UMR 1163, Paris, France.
Inserm, Centre de Recherche des Cordeliers, Sorbonne Université, Université Paris Cité, Paris, France.
Stud Health Technol Inform. 2024 Aug 22;316:1785-1789. doi: 10.3233/SHTI240777.
Rare diseases pose significant challenges due to their heterogeneity and the limited knowledge available about them. This study develops a comprehensive pipeline, interoperable with a document-oriented clinical data warehouse, that integrates cohort characterization, patient clustering, and cluster interpretation. Leveraging natural language processing (NLP), semantic similarity, machine learning, and visualization, the pipeline identifies prevalent phenotype patterns and stratifies patients. To enhance interpretability, it reports the discriminant phenotypes that characterize each cluster. Users can visually test hypotheses by marking patients whose electronic health records (EHRs) contain specific keywords, such as genes, drugs, and procedures. Implemented as a web interface, the pipeline lets clinicians navigate between modules, discover intricate patterns, and generate interpretable insights that may advance the understanding of rare diseases, guide decision-making, and ultimately improve patient outcomes.
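The clustering and interpretation steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the similarity measure or clustering algorithm, so Jaccard distance over phenotype sets stands in for semantic similarity, and a naive average-linkage agglomerative procedure stands in for the clustering module. The toy cohort, the phenotype terms, and the function names (`cluster_patients`, `discriminant_phenotypes`) are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy cohort: patient id -> phenotype terms extracted from notes.
patients = {
    "p1": {"seizures", "microcephaly", "hypotonia"},
    "p2": {"seizures", "microcephaly"},
    "p3": {"seizures", "hypotonia", "microcephaly"},
    "p4": {"renal cysts", "hypertension"},
    "p5": {"renal cysts", "hypertension", "proteinuria"},
}


def jaccard_distance(a, b):
    """Assumed stand-in for semantic similarity: 1 - |A & B| / |A | B|."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def average_linkage(c1, c2, data):
    """Mean pairwise distance between the members of two clusters."""
    dists = [jaccard_distance(data[i], data[j]) for i in c1 for j in c2]
    return sum(dists) / len(dists)


def cluster_patients(data, n_clusters):
    """Naive agglomerative clustering: merge the closest pair until done."""
    clusters = [[pid] for pid in sorted(data)]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]], data),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def discriminant_phenotypes(cluster, data, top=2):
    """Rank terms by (frequency inside cluster) - (frequency outside)."""
    inside = [data[p] for p in cluster]
    outside = [data[p] for p in data if p not in cluster]
    terms = set().union(*data.values())

    def freq(term, group):
        return sum(term in s for s in group) / len(group) if group else 0.0

    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(terms, key=lambda t: (-(freq(t, inside) - freq(t, outside)), t))
    return ranked[:top]


clusters = cluster_patients(patients, n_clusters=2)
for members in clusters:
    print(sorted(members), discriminant_phenotypes(members, patients))
```

In a real setting, the phenotype sets would come from NLP over clinical notes and the distance would likely be ontology-aware; the frequency-difference score is only one simple way to surface terms that distinguish a cluster from the rest of the cohort.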