Reproducibility of parameter learning with missing observations in naive Wnt Bayesian network trained on colorectal cancer samples and doxycycline-treated cell lines.

Author information

Sinha Shriprakash

Affiliation

Netherlands Bioinformatics Centre, 6500 HB, Nijmegen, The Netherlands.

Publication information

Mol Biosyst. 2015 Jul;11(7):1802-19. doi: 10.1039/c5mb00117j.

DOI:10.1039/c5mb00117j

PMID:25961654

Abstract

This manuscript tests the reproducibility of parameter learning with missing observations in a naive Bayesian network and its effect on predictions of Wnt signaling activation in colorectal cancer. The network is trained separately on doxycycline-treated LS174T cell lines (GSE18560) and on normal and adenoma samples (GSE8671). A computational framework for testing the reproducibility of the parameters is designed in order to check the veracity of the prediction results. Detailed experimental analysis suggests that the prediction results are accurate and reproducible, with negligible deviations. Anomalies in the estimated parameters are attributed to representation issues of the Bayesian network model. High prediction accuracies are reported for normal (N), colon-related adenoma (AD), colorectal cancer (CRC), carcinoma (C), adenocarcinoma (ADC) and replication error colorectal cancer (RER CRC) test samples. Test samples from inflammatory bowel diseases (IBD) do not fare well in the prediction test. An interesting case regarding hypothesis testing also arose while establishing the statistical significance of the different design setups of the Bayesian network model. Hypothesis testing was found to be a potentially unsuitable way to check the significance between design setups, especially when the structure of the model is the same, given that the model is trained on a single piece of test data. The significance test does have value when the datasets are independent. Finally, in comparison with biologically inspired models, the naive Bayesian model may give accurate results, but this accuracy comes at the cost of losing crucial biological knowledge that might help reveal hidden relations among the intra/extracellular factors affecting the Wnt pathway.
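The idea of checking whether learned parameters are reproducible when observations are missing can be illustrated with a minimal Python sketch. It is not the framework used in the paper: it assumes binary (discretized) gene features, a binary Wnt on/off class label that is always observed, and values missing completely at random, in which case each conditional probability can be estimated from the cases where the feature was observed. The function names, the synthetic data, the missing rate and the Laplace smoothing constant `alpha` are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_naive_bayes(X, y, alpha=1.0):
    """Estimate P(class) and P(feature = 1 | class) from binary data.

    NaN entries in X mark missing observations; each feature's estimate
    uses only the samples in which that feature was observed.
    alpha is a Laplace smoothing constant (an assumption of this sketch).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    prior = np.array([(np.sum(y == c) + alpha) / (len(y) + 2 * alpha)
                      for c in (0, 1)])
    theta = np.empty((2, X.shape[1]))
    for c in (0, 1):
        Xc = X[y == c]
        n_observed = (~np.isnan(Xc)).sum(axis=0)  # observed cases per feature
        n_ones = np.nansum(Xc, axis=0)            # observed cases equal to 1
        theta[c] = (n_ones + alpha) / (n_observed + 2 * alpha)
    return prior, theta

def predict_proba(x, prior, theta):
    """Posterior P(class = 1 | x); missing features are marginalized out."""
    x = np.asarray(x, dtype=float)
    seen = ~np.isnan(x)
    log_post = np.log(prior)
    for c in (0, 1):
        p = theta[c, seen]
        log_post[c] += np.sum(x[seen] * np.log(p)
                              + (1 - x[seen]) * np.log(1 - p))
    post = np.exp(log_post - log_post.max())
    return post[1] / post.sum()

def parameter_spread(X, y, missing_rate=0.2, runs=50, seed=0):
    """Refit under repeated random masking and return the standard
    deviation of every estimated parameter across runs."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    thetas = []
    for _ in range(runs):
        X_masked = X.copy()
        X_masked[rng.random(X.shape) < missing_rate] = np.nan
        thetas.append(fit_naive_bayes(X_masked, y)[1])
    return np.std(thetas, axis=0)

# Synthetic toy data (not from GSE18560 or GSE8671): 40 samples, 3 genes.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 20)
true_theta = np.array([[0.1, 0.2, 0.8], [0.9, 0.7, 0.3]])
X = (rng.random((40, 3)) < true_theta[y]).astype(float)

prior, theta = fit_naive_bayes(X, y)
spread = parameter_spread(X, y)
print("estimated P(gene = 1 | class):\n", theta)
print("std of estimates across masked runs:\n", spread)
```

Small values in `spread` would indicate that the estimates change little from one pattern of missing entries to the next, which is the kind of reproducibility the abstract refers to. A setting with hidden nodes or unobserved class labels would need an iterative method such as expectation-maximization, which this sketch does not cover.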
