Utilizing biological experimental data and molecular dynamics for the classification of mutational hotspots through machine learning.

Author information

Davies James G, Menzies Georgina E

Affiliation

Molecular Bioscience Division, School of Biosciences, Cardiff University, Cardiff, CF10 3AX, United Kingdom.

Publication information

Bioinform Adv. 2024 Aug 26;4(1):vbae125. doi: 10.1093/bioadv/vbae125. eCollection 2024.

DOI:10.1093/bioadv/vbae125

PMID:39239360

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11377099/

Abstract

MOTIVATION

Benzo[a]pyrene, a notorious DNA-damaging carcinogen, belongs to the family of polycyclic aromatic hydrocarbons commonly found in tobacco smoke. Surprisingly, the nucleotide excision repair (NER) machinery is inefficient at recognizing specific bulky DNA adducts, including those of benzo[a]pyrene diol-epoxide (BPDE), a benzo[a]pyrene metabolite. While sequence context is emerging as the leading factor linking the inadequate NER response to BPDE adducts, the precise structural attributes governing these disparities remain poorly understood. We therefore combined molecular dynamics and machine learning to conduct a comprehensive assessment of the helical distortion caused by BPDE-guanine adducts in multiple gene contexts. Specifically, we implemented a dual approach involving a random forest classification-based analysis and subsequent feature selection to identify precise topological features that may distinguish adduct sites of variable repair capacity. Our models were trained using helical data extracted from duplexes representing both BPDE hotspot and nonhotspot sites within the […] gene, then applied to sites within […], […], and […] genes.
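The two-step idea described above (a random forest classifier followed by feature ranking) could be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes scikit-learn and NumPy, the feature names in `FEATURES` are hypothetical stand-ins for helical descriptors, the data are synthetic, and impurity-based importances are just one possible way to rank features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical descriptor names (assumption); real ones would come from
# the helical analysis of each simulated duplex.
FEATURES = ["twist_var", "roll_mean", "rise_mean", "minor_groove_width"]


def make_synthetic_data(n=400, seed=0):
    """Build synthetic data in which only 'twist_var' separates the classes."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)           # 1 = hotspot, 0 = nonhotspot
    X = rng.normal(size=(n, len(FEATURES)))  # uninformative noise by default
    X[:, 0] += 2.0 * y                       # shift the one informative feature
    return X, y


def rank_features(X, y, seed=0):
    """Fit a random forest and return (held-out accuracy, ranked importances)."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    clf.fit(X_tr, y_tr)
    ranked = sorted(
        zip(FEATURES, clf.feature_importances_),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return clf.score(X_te, y_te), ranked


accuracy, ranked = rank_features(*make_synthetic_data())
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

On real data, the top-ranked descriptors would be the candidates for features that distinguish hotspot from nonhotspot sites; the ranking here only reflects how the synthetic data were constructed.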

RESULTS

We show that our optimized model consistently achieved exceptional performance, with accuracy, precision, and F1 scores exceeding 91%. Our feature selection approach revealed that discernible variance in regional base pair rotation played a pivotal role in informing the decisions of our model. Notably, these disparities were highly conserved among […] and […] duplexes and appeared to be influenced by the regional GC content. As such, our findings suggest that there are indeed conserved topological features distinguishing hotspot and nonhotspot sites, highlighting regional GC content as a potential biomarker for mutation.
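For readers unfamiliar with the quantities reported above, the following sketch shows how accuracy, precision, and F1 are computed from binary predictions, and one simple way to measure GC content in a window around a site. The function names, the `flank` window size, and the example labels and sequence are illustrative assumptions; the abstract does not state how the regional window was defined.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def regional_gc_content(sequence, site, flank=5):
    """Fraction of G/C bases within `flank` positions of a 0-based site.

    The window is clipped at the sequence ends; `flank` is an assumed,
    adjustable window size.
    """
    if not 0 <= site < len(sequence):
        raise ValueError("site must index a base in the sequence")
    window = sequence[max(0, site - flank): site + flank + 1].upper()
    return sum(base in "GC" for base in window) / len(window)


# Toy labels: 1 = hotspot, 0 = nonhotspot (made up for illustration).
metrics = classification_metrics(
    [1, 1, 1, 0, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 1, 1, 0],
)
gc_fraction = regional_gc_content("ATGCGCGTATA", site=4, flank=2)
print(metrics, gc_fraction)
```

Comparing `regional_gc_content` between hotspot and nonhotspot sites is one way to examine the GC-content trend the results describe, though the appropriate window size would need to be chosen from the data.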

AVAILABILITY AND IMPLEMENTATION

Code for comparing machine learning classifiers and evaluating their performance is available at https://github.com/jdavies24/ML-Classifier-Comparison, and code for analysing DNA structure with Curves+ and Canal using Random Forest is available at https://github.com/jdavies24/ML-classification-of-DNA-trajectories.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e82d/11377099/782e526e4cdb/vbae125f1.jpg
