Zomnir Michael G, Lipkin Lev, Pacula Maciej, Meneses Enrique Dominguez, MacLeay Allison, Duraisamy Sekhar, Nadhamuni Nishchal, Al Turki Saeed H, Zheng Zongli, Rivera Miguel, Nardi Valentina, Dias-Santagata Dora, Iafrate A John, Le Long P, Lennerz Jochen K
Massachusetts General Hospital, Boston, MA.
JCO Clin Cancer Inform. 2018;2. doi: 10.1200/CCI.16.00079. Epub 2018 Mar 22.
Next-generation sequencing technologies are actively applied in clinical oncology. Bioinformatics pipeline analysis is an integral part of this process; however, humans cannot yet realize the full potential of the highly complex pipeline output. As a result, the decision to include a variant in the final report during routine clinical sign-out remains challenging.
We used an artificial intelligence approach to capture the collective clinical sign-out experience of six board-certified molecular pathologists to build and validate a decision support tool for variant reporting. We extracted all reviewed and reported variants from our clinical database and tested several machine learning models. We used 10-fold cross-validation for our variant call prediction model, which derives a continuous prediction score from 0 to 1 (no to yes) for clinical reporting.
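The following is a minimal sketch, not the authors' code, of how such a cross-validated prediction score can be produced with scikit-learn: a logistic regression is fit under 10-fold cross-validation and its out-of-fold predicted probabilities serve as the 0-to-1 reporting score. The feature matrix X, labels y, and their dimensions are placeholders standing in for the roughly 500 pipeline features per variant described below.

```python
# Sketch only: 10-fold cross-validated reporting score from a logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 500))      # hypothetical variant-level features
y = rng.integers(0, 2, size=1000)     # hypothetical sign-out labels: 1 = reported, 0 = not reported

model = LogisticRegression(max_iter=1000)
# Out-of-fold predicted probabilities act as the continuous 0-1 prediction score.
scores = cross_val_predict(model, X, y, cv=10, method="predict_proba")[:, 1]
print(scores[:5])
```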
For each of the 19,594 initial training variants, our pipeline generates approximately 500 features, resulting in a matrix of more than 9 million data points. From a comparison of naive Bayes, decision tree, random forest, and logistic regression models, we selected models that allow human interpretation of the prediction score. The logistic regression model demonstrated a 1% false-negative rate and a 2% false-positive rate. The final models' Youden indices were 0.87 and 0.77 for the screening and confirmatory cutoffs, respectively. Retraining on a new assay and performance assessment in 16,123 independent variants validated our approach (Youden index, 0.93). We also derived individual pathologist-centric models (a virtual consensus conference function), and, for clinical implementation, a visual drill-down functionality allows assessment of how underlying features contributed to a particular score or decision branch.
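As an illustration of the cutoff evaluation, the sketch below computes the Youden index (sensitivity + specificity - 1) of the prediction score at two candidate cutoffs and reads the index-maximizing threshold off the ROC curve. The labels, scores, and the 0.2/0.8 screening and confirmatory cutoffs are assumptions for demonstration; the paper's actual thresholds are not reproduced here.

```python
# Sketch only: Youden index at assumed screening and confirmatory cutoffs.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=1000)                                   # hypothetical sign-out labels
scores = np.clip(y_true * 0.4 + rng.normal(0.3, 0.2, size=1000), 0, 1)   # hypothetical prediction scores

def youden_at(y, s, cutoff):
    pred = s >= cutoff
    sensitivity = np.sum(pred & (y == 1)) / np.sum(y == 1)
    specificity = np.sum(~pred & (y == 0)) / np.sum(y == 0)
    return sensitivity + specificity - 1

for name, cutoff in [("screening", 0.2), ("confirmatory", 0.8)]:
    print(name, round(youden_at(y_true, scores, cutoff), 3))

# The cutoff maximizing the Youden index can also be read from the ROC curve.
fpr, tpr, thresholds = roc_curve(y_true, scores)
print("best threshold:", thresholds[np.argmax(tpr - fpr)])
```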
Our decision support tool for variant reporting is a practically relevant artificial intelligence approach to harness the next-generation sequencing bioinformatics pipeline output when the complexity of data interpretation exceeds human capabilities.