Dunn Tim, Cosgun Erdal
Computer Science and Engineering, University of Michigan, Ann Arbor, MI 48109, USA.
Biomedical Platforms and Genomics, Microsoft Research, Redmond, WA 98052, USA.
Bioinform Adv. 2023 Jan 20;3(1):vbac095. doi: 10.1093/bioadv/vbac095. eCollection 2023.
As genome sequencing becomes cheaper and more accurate, it is becoming increasingly viable to merge this data with electronic health information to inform clinical decisions.
In this work, we demonstrate a full pipeline for working with both PacBio sequencing data and clinical FHIR (Fast Healthcare Interoperability Resources) data, from initial data to tertiary analysis. The electronic health records are stored in FHIR format, the current leading standard for healthcare data exchange. For the genomic data, we perform variant calling on long-read PacBio HiFi data using Cromwell on Azure. Both data formats are parsed, processed and merged in a single scalable pipeline that securely performs tertiary analyses using cloud-based Jupyter notebooks. We include three example applications: exporting patient information to a database, clustering patients and performing a simple pharmacogenomic study.
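The step of merging clinical FHIR records with variant calls can be illustrated with a minimal, standard-library-only Python sketch. It assumes that patient demographics arrive as a FHIR JSON Bundle of Patient resources and that each sample column in a multi-sample VCF is named after the corresponding FHIR Patient `id`. That linkage rule, the function names (`parse_fhir_patients`, `parse_vcf_genotypes`, `merge_records`, `alt_allele_count`) and the toy data are hypothetical and are not taken from the authors' notebooks.

```python
import json
from typing import Dict, List

def parse_fhir_patients(bundle: dict) -> Dict[str, dict]:
    """Extract basic demographics from the Patient resources in a FHIR Bundle."""
    patients: Dict[str, dict] = {}
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") != "Patient":
            continue  # skip Observations, Conditions, etc.
        patients[resource["id"]] = {
            "gender": resource.get("gender"),
            "birthDate": resource.get("birthDate"),
        }
    return patients

def parse_vcf_genotypes(vcf_text: str) -> Dict[str, Dict[str, str]]:
    """Map sample name -> {variant key -> GT string} for a multi-sample VCF."""
    samples: List[str] = []
    genotypes: Dict[str, Dict[str, str]] = {}
    for line in vcf_text.splitlines():
        if line.startswith("##") or not line.strip():
            continue  # meta-information lines and blank lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow FORMAT
            genotypes = {s: {} for s in samples}
            continue
        chrom, pos, vid = fields[0], fields[1], fields[2]
        key = vid if vid != "." else f"{chrom}:{pos}"
        gt_index = fields[8].split(":").index("GT")
        for sample, value in zip(samples, fields[9:]):
            genotypes[sample][key] = value.split(":")[gt_index]
    return genotypes

def merge_records(patients: Dict[str, dict],
                  genotypes: Dict[str, Dict[str, str]]) -> List[dict]:
    """Join demographics and genotypes, assuming VCF sample name == FHIR Patient id."""
    rows = []
    for pid, demo in patients.items():
        row = {"patient_id": pid, **demo}
        row.update(genotypes.get(pid, {}))
        rows.append(row)
    return rows

def alt_allele_count(gt: str) -> int:
    """Count non-reference, non-missing alleles in a phased or unphased GT."""
    alleles = gt.replace("|", "/").split("/")
    return sum(1 for a in alleles if a not in ("0", "."))

# Hypothetical toy inputs.
bundle_json = json.dumps({
    "resourceType": "Bundle",
    "entry": [
        {"resource": {"resourceType": "Patient", "id": "p1",
                      "gender": "female", "birthDate": "1980-05-01"}},
        {"resource": {"resourceType": "Patient", "id": "p2",
                      "gender": "male", "birthDate": "1975-11-20"}},
        {"resource": {"resourceType": "Observation", "id": "o1"}},
    ],
})
vcf_text = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tp1\tp2\n"
    "chr1\t1000\tvar1\tG\tA\t50\tPASS\t.\tGT:GQ\t0/1:40\t0/0:45\n"
)

rows = merge_records(parse_fhir_patients(json.loads(bundle_json)),
                     parse_vcf_genotypes(vcf_text))
carriers = [r["patient_id"] for r in rows if alt_allele_count(r.get("var1", "./.")) > 0]
print(rows)
print(carriers)
```

The merged rows form a flat, per-patient table. A table of this kind could be written to a database, used as features for clustering, or grouped by genotype for a pharmacogenomic comparison. In practice the join key between the clinical and genomic records depends on how a given deployment links them.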
https://github.com/microsoft/genomicsnotebook/tree/main/fhirgenomics.
Supplementary data are available at Bioinformatics Advances online.