Ideafix: a decision tree-based method for the refinement of variants in FFPE DNA sequencing data.

Authors

Tellaetxe-Abete Maitena, Calvo Borja, Lawrie Charles

Affiliations

Molecular Oncology Group, Biodonostia Health Research Institute, Paseo Doctor Begiristain, 20014 Donostia/San Sebastian, Spain.

Intelligent Systems Group, Computer Science Faculty, University of the Basque Country, Paseo Manuel Lardizabal, 20018 Donostia/San Sebastian, Spain.

Publication information

NAR Genom Bioinform. 2021 Oct 27;3(4):lqab092. doi: 10.1093/nargab/lqab092. eCollection 2021 Dec.

DOI: 10.1093/nargab/lqab092
PMID: 34729472
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8557387/
Abstract

Increasingly, treatment decisions for cancer patients are being made from next-generation sequencing results generated from formalin-fixed and paraffin-embedded (FFPE) biopsies. However, this material is prone to sequence artefacts that cannot be easily identified. In order to address this issue, we designed a machine learning-based algorithm to identify these artefacts using data from >1 600 000 variants from 27 paired FFPE and fresh-frozen breast cancer samples. Using these data, we assembled a series of variant features and evaluated the classification performance of five machine learning algorithms. Using leave-one-sample-out cross-validation, we found that XGBoost (extreme gradient boosting) and random forest obtained AUC (area under the receiver operating characteristic curve) values >0.86. Performance was further tested using two independent datasets that resulted in AUC values of 0.96, whereas a comparison with previously published tools resulted in a maximum AUC value of 0.92. The most discriminating features were read pair orientation bias, genomic context and variant allele frequency. In summary, our results show a promising future for the use of these samples in molecular testing. We built the algorithm into an R package called Ideafix (DEAmination FIXing) that is freely available at https://github.com/mmaitenat/ideafix.
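The evaluation scheme named in the abstract, leave-one-sample-out cross-validation scored by AUC, can be illustrated with a minimal sketch. This is not the Ideafix implementation: the records, the field names (`sample`, `vaf`, `artefact`), and the nearest-centroid scorer are hypothetical stand-ins for the real variant features and for the XGBoost and random forest models. The sketch only shows how all variants from one sample are held out together, so that no sample contributes to both training and testing.

```python
from collections import defaultdict

def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def leave_one_sample_out(records):
    """Yield (held_out_sample, train, test), holding out one whole sample per fold."""
    by_sample = defaultdict(list)
    for rec in records:
        by_sample[rec["sample"]].append(rec)
    for held_out in sorted(by_sample):
        train = [r for s, rs in by_sample.items() if s != held_out for r in rs]
        yield held_out, train, by_sample[held_out]

def fit_centroids(train):
    """Toy stand-in for a real classifier: mean VAF per class.
    Assumes both classes occur in the training data."""
    totals = {0: [0.0, 0], 1: [0.0, 0]}
    for r in train:
        totals[r["artefact"]][0] += r["vaf"]
        totals[r["artefact"]][1] += 1
    return {label: total / n for label, (total, n) in totals.items()}

def score(centroids, rec):
    """Higher score means the variant lies closer to the artefact centroid."""
    v = rec["vaf"]
    return (v - centroids[0]) ** 2 - (v - centroids[1]) ** 2

# Hypothetical toy data: artefact = 1 marks a variant labelled as an artefact.
records = [
    {"sample": "S1", "vaf": 0.03, "artefact": 1},
    {"sample": "S1", "vaf": 0.05, "artefact": 1},
    {"sample": "S1", "vaf": 0.40, "artefact": 0},
    {"sample": "S1", "vaf": 0.04, "artefact": 0},
    {"sample": "S2", "vaf": 0.02, "artefact": 1},
    {"sample": "S2", "vaf": 0.35, "artefact": 0},
    {"sample": "S2", "vaf": 0.50, "artefact": 0},
    {"sample": "S3", "vaf": 0.06, "artefact": 1},
    {"sample": "S3", "vaf": 0.30, "artefact": 1},
    {"sample": "S3", "vaf": 0.45, "artefact": 0},
]

fold_aucs = {}
for held_out, train, test in leave_one_sample_out(records):
    centroids = fit_centroids(train)
    labels = [r["artefact"] for r in test]
    scores = [score(centroids, r) for r in test]
    fold_aucs[held_out] = auc(labels, scores)

mean_auc = sum(fold_aucs.values()) / len(fold_aucs)
print(fold_aucs, round(mean_auc, 3))
```

Grouping folds by sample rather than by individual variant matters here because variants from the same biopsy share fixation and sequencing conditions; splitting them across training and test sets would likely inflate the estimated AUC.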

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2ea4/8557387/7172c1c6ebe8/lqab092fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2ea4/8557387/8698525c76bb/lqab092fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2ea4/8557387/9b912c5a14e3/lqab092fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2ea4/8557387/9305112f4409/lqab092fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2ea4/8557387/629b23d28d3c/lqab092fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2ea4/8557387/0a818fdd9c15/lqab092fig6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2ea4/8557387/6fc40c042386/lqab092fig7.jpg
