Quantum mechanical electronic and geometric parameters for DNA k-mers as features for machine learning.

Affiliation

MRC WIMM Centre for Computational Biology, MRC Weatherall Institute of Molecular Medicine, Radcliffe Department of Medicine, University of Oxford, Oxford, OX3 9DS, UK.

Publication information

Sci Data. 2024 Aug 22;11(1):911. doi: 10.1038/s41597-024-03772-5.

Abstract

Model development initiatives in genomics that employ high-end machine learning methodologies are increasing steeply. Of particular interest are models that predict certain genomic characteristics based solely on DNA sequence. These models, however, treat DNA as a mere string of four letters (A, T, G and C), dismissing past scientific advances that enable the use of more intricate information from nucleic acid sequences. Here, we provide a comprehensive database of quantum mechanical (QM) and geometric features for all permutations of 7-meric DNA in their representative B, A and Z conformations. The database was generated using the applicable QM methodologies, which are computationally costly and time-consuming. It thus becomes straightforward to associate a wealth of novel molecular features with any DNA sequence, by scanning it with a matching k-meric window and pulling the pre-computed values from our database for further use in modelling. We demonstrate the usefulness of the deposited features through their exclusive use in developing a model for A->C mutation rates.
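The window-scanning lookup described in the abstract could be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `scan_sequence`, the `FeatureTable` type, and the in-memory dictionary layout are all assumptions, and the actual file format and feature names of the deposited database are not given here.

```python
from typing import Dict, List, Optional

K = 7  # window size matching the 7-mer database described in the abstract

# Hypothetical stand-in for the pre-computed table: k-mer -> feature values.
# The real database's layout and feature names are not specified here.
FeatureTable = Dict[str, List[float]]


def scan_sequence(
    seq: str, table: FeatureTable, k: int = K
) -> List[Optional[List[float]]]:
    """Slide a k-wide window over seq and look up each k-mer's features.

    Returns one entry per window position. An entry is None when the k-mer
    is missing from the table (for example, if it contains an ambiguous
    base such as N).
    """
    seq = seq.upper()
    rows: List[Optional[List[float]]] = []
    # range() is empty when the sequence is shorter than k.
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        rows.append(table.get(kmer))
    return rows


# Toy table with placeholder numbers (not real QM values).
toy_table: FeatureTable = {
    "ACGTACG": [0.1, 0.2],
    "CGTACGT": [0.3, 0.4],
}
features = scan_sequence("ACGTACGTA", toy_table)
```

Under this assumed layout, one such table per conformation (B, A and Z) would be scanned separately, and the resulting per-position rows could then be used as model inputs.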

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8232/11341866/b8ba76b3da31/41597_2024_3772_Fig1_HTML.jpg
