Dellinger Andrew E, Nixon Andrew B, Pang Herbert
Department of Mathematics and Statistics, Elon University, Elon, NC, USA; Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA.
Department of Medicine, Division of Medical Oncology, Duke University School of Medicine, Durham, NC, USA.
Cancer Inform. 2014 Jul 28;13(Suppl 4):1-9. doi: 10.4137/CIN.S13634. eCollection 2014.
Recent methods development has increasingly targeted multi-dimensional genomic data, because algorithms that combine several data types have predicted disease-related clinical phenotypes more accurately. This study is the first to conduct an integrative, pathway-based genomic analysis with a graph-based learning algorithm. The method, graph-based semi-supervised learning, detects pathways that improve prediction of a dichotomous variable, which in this study is cancer stage. The analysis integrates genome-wide gene expression, methylation, and single nucleotide polymorphism (SNP) data in ovarian serous cystadenocarcinoma (OV) and colon adenocarcinoma (COAD). In both COAD and OV, the 10 top-ranked predictive pathways were biologically relevant to the respective cancer stage, and integrating the data types significantly improved prediction accuracy and area under the ROC curve (AUC) compared with single data-type analyses. The method is an effective way to simultaneously predict binary clinical phenotypes and discover their underlying biological mechanisms.
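To make the idea of graph-based semi-supervised learning on several data types concrete, the following is a minimal sketch and not the authors' implementation. It assumes a common formulation in which each data type yields a sample-similarity graph, the normalized graph Laplacians are averaged with equal weights, and label scores f are obtained by solving (I + cL)f = y, where y holds +1/-1 for samples with known stage and 0 for unlabeled samples. The function names, the RBF similarity, the equal weighting, the parameter values, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def rbf_affinity(X, sigma=2.0):
    """Gaussian (RBF) similarity between samples; rows of X are samples."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def normalized_laplacian(W):
    """Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)
    d_inv_sqrt = np.zeros_like(d)
    d_inv_sqrt[d > 0] = 1.0 / np.sqrt(d[d > 0])
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    return np.eye(W.shape[0]) - S


def propagate_labels(laplacians, y, c=1.0):
    """Solve (I + c * L_avg) f = y, where L_avg is the equal-weight
    average of the per-data-type Laplacians (an assumed combination rule)."""
    L = sum(laplacians) / len(laplacians)
    return np.linalg.solve(np.eye(len(y)) + c * L, y)


# Synthetic stand-in for one pathway: 20 samples, 5 features per data type.
# The first 10 samples belong to class +1, the last 10 to class -1.
rng = np.random.default_rng(0)
true_labels = np.array([1.0] * 10 + [-1.0] * 10)
data_types = {
    name: 2.0 * true_labels[:, None] + rng.normal(size=(20, 5))
    for name in ("expression", "methylation", "snp")
}

# One graph per data type, then integrate them through their Laplacians.
laplacians = [normalized_laplacian(rbf_affinity(X)) for X in data_types.values()]

# Only three samples per class have a known label; the rest are set to 0.
labeled = np.array([0, 1, 2, 10, 11, 12])
y = np.zeros(20)
y[labeled] = true_labels[labeled]

scores = propagate_labels(laplacians, y)
unlabeled = np.setdiff1d(np.arange(20), labeled)
predicted = np.sign(scores[unlabeled])
```

In a pathway-based setting, one would presumably repeat this per pathway, restricting each data matrix to the genes in that pathway, and rank pathways by cross-validated accuracy or AUC of the resulting scores; that ranking step is not shown here.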