Denis Marie, Tadesse Mahlet G
UMR AGAP, CIRAD, Montpellier, France; Department of Epidemiology, Harvard School of Public Health, Boston, MA, USA; and Department of Mathematics and Statistics, Georgetown University, Washington, DC, USA.
Bioinformatics. 2016 Mar 1;32(5):738-46. doi: 10.1093/bioinformatics/btv653. Epub 2015 Nov 5.
Advances in high-throughput technologies have led to the acquisition of various types of -omic data on the same biological samples. Each data type gives independent and complementary information that can explain the biological mechanisms of interest. While several studies performing independent analyses of each dataset have led to significant results, a better understanding of complex biological mechanisms requires an integrative analysis of different sources of data.
Flexible modeling approaches, based on penalized likelihood methods and expectation-maximization (EM) algorithms, are studied and tested under various biological relationship scenarios between the different molecular features and their effects on a clinical outcome. The models are applied to genomic datasets from two cancer types in the Cancer Genome Atlas project: glioblastoma multiforme and ovarian serous cystadenocarcinoma. The integrative models lead to improved model fit and predictive performance. They also provide a better understanding of the biological mechanisms underlying patients' survival.
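As a rough illustration of the penalized likelihood idea mentioned above, the following minimal sketch fits a lasso-penalized linear model to features concatenated from two data types. It is not the authors' implementation: the abstract does not specify the penalty, so an L1 (lasso) penalty is assumed here; a continuous outcome stands in for survival; the EM component is omitted; and the names `meth`, `expr`, `soft_threshold` and `lasso_coordinate_descent`, as well as the simulated data, are hypothetical.

```python
import random


def soft_threshold(z, lam):
    """Soft-thresholding operator used in lasso coordinate descent."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X beta, since beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # skip all-zero columns
            # Covariance of feature j with the partial residual
            # (the residual with feature j's own contribution added back).
            rho = sum(
                X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)
            ) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Simulated stand-ins for two data types measured on the same samples
# (hypothetical: three "methylation" and three "expression" features).
random.seed(0)
n = 200
meth = [[random.gauss(0, 1) for _ in range(3)] for _ in range(n)]
expr = [[random.gauss(0, 1) for _ in range(3)] for _ in range(n)]

# Integrative design matrix: both data types enter a single model.
X = [m + e for m, e in zip(meth, expr)]

# Assumed outcome: driven by one feature from each data type plus noise.
y = [2.0 * row[0] - 1.5 * row[3] + random.gauss(0, 0.1) for row in X]

beta = lasso_coordinate_descent(X, y, lam=0.1)
print([round(b, 2) for b in beta])
```

In this toy setting the penalty shrinks the coefficients of the uninformative features toward zero while retaining one feature from each data type, which is the kind of joint selection that separate per-dataset analyses cannot provide.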
Source code implementing the integrative models is freely available at https://github.com/mgt000/IntegrativeAnalysis, along with example datasets and a sample R script that applies the models to these data. The TCGA datasets used for analysis are publicly available at https://tcga-data.nci.nih.gov/tcga/tcgaDownload.jsp
marie.denis@cirad.fr or mgt26@georgetown.edu
Supplementary data are available at Bioinformatics online.