Ideafix: a decision tree-based method for the refinement of variants in FFPE DNA sequencing data.

Author information

Tellaetxe-Abete Maitena, Calvo Borja, Lawrie Charles

Affiliations

Molecular Oncology Group, Biodonostia Health Research Institute, Paseo Doctor Begiristain, 20014 Donostia/San Sebastian, Spain.

Intelligent Systems Group, Computer Science Faculty, University of the Basque Country, Paseo Manuel Lardizabal, 20018 Donostia/San Sebastian, Spain.

Publication information

NAR Genom Bioinform. 2021 Oct 27;3(4):lqab092. doi: 10.1093/nargab/lqab092. eCollection 2021 Dec.

DOI:10.1093/nargab/lqab092

PMID:34729472

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8557387/

Abstract

Increasingly, treatment decisions for cancer patients are being made from next-generation sequencing results generated from formalin-fixed and paraffin-embedded (FFPE) biopsies. However, this material is prone to sequence artefacts that cannot be easily identified. In order to address this issue, we designed a machine learning-based algorithm to identify these artefacts using data from >1 600 000 variants from 27 paired FFPE and fresh-frozen breast cancer samples. Using these data, we assembled a series of variant features and evaluated the classification performance of five machine learning algorithms. Using leave-one-sample-out cross-validation, we found that XGBoost (extreme gradient boosting) and random forest obtained AUC (area under the receiver operating characteristic curve) values >0.86. Performance was further tested using two independent datasets that resulted in AUC values of 0.96, whereas a comparison with previously published tools resulted in a maximum AUC value of 0.92. The most discriminating features were read pair orientation bias, genomic context and variant allele frequency. In summary, our results show a promising future for the use of these samples in molecular testing. We built the algorithm into an R package called Ideafix (DEAmination FIXing) that is freely available at https://github.com/mmaitenat/ideafix.
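The evaluation scheme named in the abstract, leave-one-sample-out cross-validation scored by AUC, can be illustrated with a minimal sketch. This is not the Ideafix implementation (which is an R package using XGBoost and random forest); the toy data, the single VAF feature, the assumption that artefacts sit at low VAF, and the helper names `roc_auc`, `leave_one_sample_out`, `fit_centroids`, `artefact_score` and `run_cv` are all hypothetical and chosen only to show how variants from one sample are held out together in each fold.

```python
from collections import defaultdict


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def leave_one_sample_out(sample_ids):
    """Yield (held_out_sample, train_indices, test_indices); every variant
    from the held-out sample goes to the test fold."""
    by_sample = defaultdict(list)
    for i, sid in enumerate(sample_ids):
        by_sample[sid].append(i)
    for held_out, test_idx in by_sample.items():
        train_idx = [i for i, sid in enumerate(sample_ids) if sid != held_out]
        yield held_out, train_idx, test_idx


def fit_centroids(vafs, labels):
    """Toy 'model' (assumption): mean VAF of artefacts and of real variants."""
    art = [v for v, y in zip(vafs, labels) if y == 1]
    real = [v for v, y in zip(vafs, labels) if y == 0]
    return sum(art) / len(art), sum(real) / len(real)


def artefact_score(vaf, centroids):
    """Higher score means the VAF is closer to the artefact centroid."""
    art_mean, real_mean = centroids
    return abs(vaf - real_mean) - abs(vaf - art_mean)


def run_cv(sample_ids, vafs, labels):
    """Return {held_out_sample: AUC} over all leave-one-sample-out folds."""
    fold_aucs = {}
    for held_out, train_idx, test_idx in leave_one_sample_out(sample_ids):
        centroids = fit_centroids([vafs[i] for i in train_idx],
                                  [labels[i] for i in train_idx])
        scores = [artefact_score(vafs[i], centroids) for i in test_idx]
        fold_aucs[held_out] = roc_auc([labels[i] for i in test_idx], scores)
    return fold_aucs


# Hypothetical toy data: label 1 = artefact, 0 = real variant.
sample_ids = ["S1"] * 4 + ["S2"] * 4 + ["S3"] * 4
vafs = [0.02, 0.05, 0.30, 0.45, 0.03, 0.40, 0.04, 0.35, 0.01, 0.50, 0.25, 0.06]
labels = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1]

fold_aucs = run_cv(sample_ids, vafs, labels)
for sample, auc in fold_aucs.items():
    print(sample, auc)
```

Splitting by sample rather than by variant keeps variants from the same biopsy out of both the training and test folds at once, which is the point of the leave-one-sample-out design. A real analysis would replace the centroid rule with a trained classifier and use the full feature set.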

