He Chen, Micallef Luana, Tanoli Zia-Ur-Rehman, Kaski Samuel, Aittokallio Tero, Jacucci Giulio
Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Gustaf Hällströmin katu 2b, Helsinki, 00560, Finland.
Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Konemiehentie 2, Espoo, 02150, Finland.
BMC Bioinformatics. 2017 Sep 13;18(Suppl 10):393. doi: 10.1186/s12859-017-1785-7.
Dispersed biomedical databases limit users' ability to explore data and generate structured knowledge. Linked Data unifies data structures and makes dispersed data easy to search across resources, but it offers little support for the human cognition needed to reach insights. In addition, potential errors in the data are difficult to detect in free-form formats. Devising a visualization that synthesizes multiple sources so that links between data sources are transparent and uncertainties, such as data conflicts, are salient is challenging.
To investigate the requirements and challenges of uncertainty-aware visualizations of linked data, we developed MediSyn, a system that synthesizes medical datasets to support drug treatment selection. It uses a matrix-based layout to visually link drugs, targets (e.g., mutations), and tumor types. Data uncertainties are salient in MediSyn; for example, (i) missing data are exposed in the matrix view of drug-target relations; (ii) inconsistencies between datasets are shown via overlaid layers; and (iii) data credibility is conveyed through links to data provenance.
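As a rough illustration of the three kinds of uncertainty listed above, the following minimal sketch merges two drug-target datasets into a matrix whose cells are marked as missing, single-source, agreeing, or conflicting, and keeps each record's provenance. It is not MediSyn's implementation: the `Record` and `Cell` types, the status labels, the `synthesize` function, and all drug, target, and source names are hypothetical placeholders.

```python
from typing import Dict, List, NamedTuple, Tuple

Pair = Tuple[str, str]  # (drug, target)


class Record(NamedTuple):
    effect: str   # illustrative label, e.g. "responsive" or "resistant"
    source: str   # provenance identifier (hypothetical), e.g. a publication ID


class Cell(NamedTuple):
    status: str             # "missing", "single-source", "agree", or "conflict"
    records: List[Record]   # records kept so provenance can be shown


def synthesize(
    drugs: List[str],
    targets: List[str],
    dataset_a: Dict[Pair, Record],
    dataset_b: Dict[Pair, Record],
) -> Dict[Pair, Cell]:
    """Build a drug x target matrix that exposes gaps and conflicts."""
    matrix: Dict[Pair, Cell] = {}
    for drug in drugs:
        for target in targets:
            key = (drug, target)
            found = [r for r in (dataset_a.get(key), dataset_b.get(key)) if r is not None]
            if not found:
                status = "missing"          # (i) missing data stays visible as an empty cell
            elif len(found) == 1:
                status = "single-source"
            elif found[0].effect == found[1].effect:
                status = "agree"
            else:
                status = "conflict"         # (ii) inconsistency between the two datasets
            matrix[key] = Cell(status, found)  # (iii) sources retained for provenance
    return matrix


# Placeholder data only; these are not real drugs, targets, or publications.
biomarkers = {
    ("drug_A", "target_X"): Record("responsive", "source_1"),
    ("drug_B", "target_X"): Record("resistant", "source_2"),
}
bioactivities = {
    ("drug_A", "target_X"): Record("responsive", "source_3"),
    ("drug_B", "target_X"): Record("responsive", "source_4"),
    ("drug_B", "target_Y"): Record("responsive", "source_5"),
}

matrix = synthesize(["drug_A", "drug_B"], ["target_X", "target_Y"], biomarkers, bioactivities)
for (drug, target), cell in matrix.items():
    sources = ", ".join(r.source for r in cell.records) or "-"
    print(f"{drug:7} {target:9} {cell.status:14} {sources}")
```

Keeping the full record list in each cell, rather than a single merged value, is what allows a view to overlay conflicting entries and link each one back to its source.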
Through the synthesis of two manually curated datasets, cancer treatment biomarkers and drug-target bioactivities, a use case shows how MediSyn effectively supports the discovery of drug-repurposing opportunities. A study with six domain experts indicated that MediSyn aided drug selection and the discovery of data inconsistencies. Although linked publication sources helped users explore further information, the causes of inconsistencies were not easy to find. Additionally, MediSyn could incorporate more patient data to increase its informativeness. We derive design implications from the findings.