A Parallel Software Pipeline for DMET Microarray Genotyping Data Analysis.

Authors

Agapito Giuseppe, Guzzi Pietro Hiram, Cannataro Mario

Affiliation

Data Analytics Research Center, Department of Medical and Surgical Sciences, University "Magna Græcia" of Catanzaro, Viale Europa, 88100 Catanzaro, Italy.

Publication

High Throughput. 2018 Jun 14;7(2):17. doi: 10.3390/ht7020017.

PMID:29904017

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6023446/

Abstract

Personalized medicine is one aspect of P4 medicine (predictive, preventive, personalized and participatory), and is based precisely on tailoring care to the medical characteristics of each individual. In personalized medicine, treatments and drugs are developed according to the individual characteristics and needs of each subject, drawing on the study of disease at scales ranging from genotype to phenotype. Achieving this goal requires high-throughput methodologies, such as Next Generation Sequencing (NGS), Genome-Wide Association Studies (GWAS), mass spectrometry or microarrays, which can investigate a single disease from a broader perspective. A side effect of these methodologies is the massive amount of data produced by each experiment, which poses several challenges to bioinformatics software (e.g., long execution times and large memory requirements). A key requirement for modern bioinformatics software is therefore the use of sound software engineering methods and efficient programming techniques able to meet these challenges, including parallel programming and efficient, compact data structures. This paper presents the design and experimental evaluation of a comprehensive software pipeline, named microPipe, for the preprocessing, annotation and analysis of microarray-based Single Nucleotide Polymorphism (SNP) genotyping data. A use case in pharmacogenomics is presented. The main advantages of microPipe are: fewer errors arising when data must be made compatible across different tools; the ability to analyze huge datasets in parallel; and easy annotation and integration of data. microPipe is available under a Creative Commons license and can be downloaded free of charge by academic and not-for-profit institutions.
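The abstract does not describe microPipe's internals. The Python code below is a minimal sketch of the two techniques it names: compact data structures and parallel analysis. It rests on hypothetical assumptions. Genotypes are packed at 2 bits per call (0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternative, 3 = missing). Each SNP is tested independently with a two-sided Fisher exact test on carrier counts between two groups, such as responders and non-responders. The function names (`pack_genotypes`, `carrier_test`, `analyze_snps`), the encoding, and the choice of test are illustrative assumptions. They are not microPipe's API or its statistical method.

```python
"""Hypothetical sketch (not microPipe's code): compact 2-bit genotype storage
plus independent per-SNP association tests dispatched to an executor."""
from __future__ import annotations

import math
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Optional, Sequence

MISSING = 3  # assumed 2-bit code for a failed/missing genotype call


def pack_genotypes(calls: Sequence[int]) -> bytes:
    """Pack codes 0 (hom. ref), 1 (het), 2 (hom. alt), 3 (missing), four per byte."""
    out = bytearray((len(calls) + 3) // 4)
    for i, g in enumerate(calls):
        if not 0 <= g <= 3:
            raise ValueError(f"invalid genotype code: {g}")
        out[i // 4] |= g << (2 * (i % 4))
    return bytes(out)


def unpack_genotype(packed: bytes, i: int) -> int:
    """Return the 2-bit genotype code of sample i."""
    return (packed[i // 4] >> (2 * (i % 4))) & 0b11


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]: the sum
    of hypergeometric probabilities of all tables with the same margins that are
    no more likely than the observed table."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = math.comb(n, col1)

    def prob(x: int) -> float:
        return math.comb(row1, x) * math.comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    p_value = 0.0
    for x in range(max(0, col1 - (n - row1)), min(row1, col1) + 1):
        p = prob(x)
        if p <= p_obs * (1 + 1e-7):  # relative tolerance for floating-point ties
            p_value += p
    return min(1.0, p_value)


def carrier_test(job: tuple[bytes, Sequence[bool]]) -> float:
    """Dominant-model test for one SNP: carriers (code 1 or 2) vs non-carriers
    (code 0) in group A vs group B. Missing calls are skipped."""
    packed, in_group_a = job
    counts = [[0, 0], [0, 0]]  # rows: group A, group B; cols: carrier, non-carrier
    for i, is_a in enumerate(in_group_a):
        g = unpack_genotype(packed, i)
        if g != MISSING:
            counts[0 if is_a else 1][0 if g > 0 else 1] += 1
    (a, b), (c, d) = counts
    return fisher_exact_two_sided(a, b, c, d)


def analyze_snps(snps: Sequence[bytes], in_group_a: Sequence[bool],
                 executor: Optional[Executor] = None) -> list[float]:
    """One independent test per SNP. SNPs are spread over the executor's
    workers if one is given; otherwise they run serially."""
    jobs = [(packed, tuple(in_group_a)) for packed in snps]
    mapper = executor.map if executor is not None else map
    return list(mapper(carrier_test, jobs))


if __name__ == "__main__":
    # Toy data: 6 samples (the first 3 in group A) and 2 SNPs.
    groups = (True, True, True, False, False, False)
    snps = [pack_genotypes([1, 2, 1, 0, 0, 0]),
            pack_genotypes([0, 1, 0, 0, 1, MISSING])]
    # Separate worker processes let CPU-bound tests run in parallel despite the GIL.
    with ProcessPoolExecutor(max_workers=2) as pool:
        for idx, p in enumerate(analyze_snps(snps, groups, pool)):
            print(f"SNP {idx}: p = {p:.4f}")
```

Each per-SNP test depends only on that SNP's packed row and the group labels. The workload is therefore embarrassingly parallel, and `analyze_snps` can accept any `concurrent.futures` executor, or none for a serial run. For large datasets, sending the group labels with every job is wasteful. Passing them once through the pool's `initializer`/`initargs` would be a natural refinement.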
