
A Novel Computational Method for Detecting DNA Methylation Sites with DNA Sequence Information and Physicochemical Properties.

Affiliations

School of Computer Science and Technology, Tianjin University, Tianjin 300350, China.

Tianjin University Institute of Computational Biology, Tianjin University, Tianjin 300350, China.

Publication information

Int J Mol Sci. 2018 Feb 8;19(2):511. doi: 10.3390/ijms19020511.

DOI:10.3390/ijms19020511
PMID:29419752
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5855733/
Abstract

DNA methylation is an important biochemical process that is closely associated with many types of cancer. Research on DNA methylation can help us understand regulatory mechanisms and epigenetic reprogramming, so recognizing methylation sites in DNA sequences has become very important. Over the past several decades, since high-throughput sequencing technology became widely used in research and industry, many computational methods, especially machine learning methods, have been developed. To accurately identify whether a nucleotide residue is methylated in a specific DNA sequence context, we propose a novel method that overcomes the shortcomings of previous methods for predicting methylation sites. We use k-gram, multivariate mutual information, discrete wavelet transform, and pseudo amino acid composition to extract features, and train a sparse Bayesian learning model to predict DNA methylation. Five criteria are used to evaluate the prediction results of our method: area under the receiver operating characteristic curve (AUC), Matthews correlation coefficient (MCC), accuracy (ACC), sensitivity (SN), and specificity (SP). On the benchmark dataset, our method reached 0.8632 AUC, 0.8017 ACC, 0.5558 MCC, and 0.7268 SN. Additionally, the best AUC results on two scBS-seq-profiled mouse embryonic stem cell datasets were 0.8896 and 0.9511, respectively. Compared with other leading methods, our method achieved higher prediction accuracy, improving AUC by at least 0.0399. For the convenience of other researchers, our code has been uploaded to a file hosting service and can be downloaded from: https://figshare.com/s/0697b692d802861282d3.
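The abstract lists k-gram features among its feature extractors but does not define them. As a minimal sketch, assuming "k-gram" means the normalized frequency of every length-k nucleotide substring for k = 1..k_max over A/C/G/T, a sequence window could be turned into a fixed-length vector as below. The function name `kgram_features` and the parameter `k_max` are hypothetical, not taken from the authors' released code.

```python
from itertools import product


def kgram_features(seq: str, k_max: int = 3) -> list[float]:
    """Normalized k-mer frequencies for k = 1..k_max over the alphabet ACGT.

    Hypothetical helper: returns 4 + 16 + ... + 4**k_max values, ordered
    lexicographically within each k. Windows containing other characters
    (e.g. 'N') are skipped and do not count toward the total.
    """
    seq = seq.upper()
    features: list[float] = []
    for k in range(1, k_max + 1):
        kmers = ["".join(p) for p in product("ACGT", repeat=k)]
        counts = dict.fromkeys(kmers, 0)
        total = 0
        for i in range(len(seq) - k + 1):
            window = seq[i:i + k]
            if window in counts:  # ignore windows with non-ACGT characters
                counts[window] += 1
                total += 1
        # Frequencies for this k sum to 1 whenever at least one valid window exists.
        features.extend((counts[m] / total if total else 0.0) for m in kmers)
    return features
```

For example, `kgram_features("ACGCGT", k_max=2)` yields 20 values; the dinucleotide `CG` occurs in 2 of the 5 valid windows, so its entry is 0.4. Normalizing per k keeps windows of different lengths comparable.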
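The discrete wavelet transform step can be illustrated in the same hedged way. The sketch below assumes each nucleotide has already been mapped to a single numeric property value, then applies a multi-level Haar DWT and summarizes each coefficient band by mean, standard deviation, minimum and maximum. The Haar wavelet, the 3 levels, the summary statistics, the helper names, and the `HYPOTHETICAL_PROPERTY` table are all illustrative assumptions, not details from the paper; the table's numbers are placeholders, not real physicochemical data.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT; returns (approximation, detail).

    An odd-length input is padded by repeating its last sample.
    """
    x = list(signal)
    if len(x) % 2:
        x.append(x[-1])
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail


def _summary(values):
    """Mean, population standard deviation, min and max of a coefficient list."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [mean, sd, min(values), max(values)]


def dwt_features(signal, levels=3):
    """Summaries of the detail band at each level, then of the final approximation."""
    features = []
    approx = list(signal)
    for _ in range(levels):
        if len(approx) < 2:  # nothing left to decompose
            break
        approx, detail = haar_dwt(approx)
        features.extend(_summary(detail))
    features.extend(_summary(approx))
    return features


# Placeholder values for illustration only, NOT real physicochemical data.
HYPOTHETICAL_PROPERTY = {"A": 0.5, "C": 1.0, "G": -0.5, "T": -1.0}


def encode(seq, table=HYPOTHETICAL_PROPERTY):
    """Map each nucleotide to one number; raises KeyError for characters not in the table."""
    return [table[base] for base in seq.upper()]
```

With `levels=3`, `dwt_features(encode(window))` returns 16 numbers for any window of at least 5 nucleotides, so windows of different lengths still give vectors of equal size. The orthonormal scaling preserves the signal's energy, which keeps coefficient magnitudes comparable across levels.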
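The five evaluation criteria named in the abstract have standard definitions. A minimal sketch computing SN, SP, ACC and MCC from confusion-matrix counts, plus AUC through its rank (Mann-Whitney) interpretation, might look like this. The function names and the toy inputs are illustrative only and do not reproduce the paper's numbers.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """SN, SP, ACC and MCC from confusion-matrix counts (methylated = positive).

    Assumes both classes are present, i.e. tp + fn > 0 and tn + fp > 0.
    """
    sn = tp / (tp + fn)                      # sensitivity: recall on methylated sites
    sp = tn / (tn + fp)                      # specificity: recall on unmethylated sites
    acc = (tp + tn) / (tp + fp + tn + fn)    # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: MCC is 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SN": sn, "SP": sp, "ACC": acc, "MCC": mcc}


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as one half. O(n_pos * n_neg), fine for illustration.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, `classification_metrics(tp=40, fp=10, tn=45, fn=5)` gives ACC = 0.85, and `roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])` gives 0.75, because three of the four positive/negative pairs are ranked correctly. MCC is reported alongside ACC because it stays informative when methylated and unmethylated sites are imbalanced.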

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9fd1/5855733/3d696e476525/ijms-19-00511-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9fd1/5855733/5b1a4eca6e55/ijms-19-00511-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9fd1/5855733/4c2f7180bd53/ijms-19-00511-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9fd1/5855733/b16722dca4e3/ijms-19-00511-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9fd1/5855733/d10752505cf6/ijms-19-00511-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9fd1/5855733/6f9795d4a44b/ijms-19-00511-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9fd1/5855733/f23f426c0af8/ijms-19-00511-g007.jpg

Similar articles

1
A Novel Computational Method for Detecting DNA Methylation Sites with DNA Sequence Information and Physicochemical Properties.
Int J Mol Sci. 2018 Feb 8;19(2):511. doi: 10.3390/ijms19020511.
2
iLM-2L: A two-level predictor for identifying protein lysine methylation sites and their methylation degrees by incorporating K-gap amino acid pairs into Chou's general PseAAC.
J Theor Biol. 2015 Nov 21;385:50-7. doi: 10.1016/j.jtbi.2015.07.030. Epub 2015 Aug 4.
3
Computational methods for ubiquitination site prediction using physicochemical properties of protein sequences.
BMC Bioinformatics. 2016 Mar 3;17:116. doi: 10.1186/s12859-016-0959-z.
4
Identification of DNA-binding and protein-binding proteins using enhanced graph wavelet features.
IEEE/ACM Trans Comput Biol Bioinform. 2013 Jul-Aug;10(4):1017-31. doi: 10.1109/TCBB.2013.117.
5
Detecting Succinylation sites from protein sequences using ensemble support vector machine.
BMC Bioinformatics. 2018 Jun 25;19(1):237. doi: 10.1186/s12859-018-2249-4.
6
DPP-PseAAC: A DNA-binding protein prediction model using Chou's general PseAAC.
J Theor Biol. 2018 Sep 7;452:22-34. doi: 10.1016/j.jtbi.2018.05.006. Epub 2018 May 16.
7
Sequence-based prediction of DNA-binding residues in proteins with conservation and correlation information.
IEEE/ACM Trans Comput Biol Bioinform. 2012 Nov-Dec;9(6):1766-75. doi: 10.1109/TCBB.2012.106.
8
Combining pseudo dinucleotide composition with the Z curve method to improve the accuracy of predicting DNA elements: a case study in recombination spots.
Mol Biosyst. 2016 Aug 16;12(9):2893-900. doi: 10.1039/c6mb00374e.
9
Predicting methylation status of human DNA sequences by pseudo-trinucleotide composition.
Talanta. 2011 Aug 15;85(2):1143-7. doi: 10.1016/j.talanta.2011.05.043. Epub 2011 May 27.
10
Prediction of lysine propionylation sites using biased SVM and incorporating four different sequence features into Chou's PseAAC.
J Mol Graph Model. 2017 Sep;76:356-363. doi: 10.1016/j.jmgm.2017.07.022. Epub 2017 Jul 25.

Cited by

1
Single-Cell Transcriptomic Approaches for Decoding Non-Coding RNA Mechanisms in Colorectal Cancer.
Noncoding RNA. 2025 Mar 10;11(2):24. doi: 10.3390/ncrna11020024.
2
Advancing epigenetic profiling in cervical cancer: machine learning techniques for classifying DNA methylation patterns.
3 Biotech. 2024 Nov;14(11):264. doi: 10.1007/s13205-024-04107-2. Epub 2024 Oct 9.
3
Methods in DNA methylation array dataset analysis: A review.
Comput Struct Biotechnol J. 2024 May 17;23:2304-2325. doi: 10.1016/j.csbj.2024.05.015. eCollection 2024 Dec.
4
iCpG-Pos: an accurate computational approach for identification of CpG sites using positional features on single-cell whole genome sequence data.
Bioinformatics. 2023 Aug 1;39(8). doi: 10.1093/bioinformatics/btad474.
5
Using the Random Forest for Identifying Key Physicochemical Properties of Amino Acids to Discriminate Anticancer and Non-Anticancer Peptides.
Int J Mol Sci. 2023 Jun 29;24(13):10854. doi: 10.3390/ijms241310854.
6
DNA Methylation and Non-Coding RNAs during Tissue-Injury Associated Pain.
Int J Mol Sci. 2022 Jan 11;23(2):752. doi: 10.3390/ijms23020752.
7
Artificial Intelligence, Machine Learning and Deep Learning in Ion Channel Bioinformatics.
Membranes (Basel). 2021 Aug 31;11(9):672. doi: 10.3390/membranes11090672.
8
Identification of Genome Sequences of Polyphosphate-Accumulating Organisms by Machine Learning.
Front Cell Dev Biol. 2021 Jan 18;8:626221. doi: 10.3389/fcell.2020.626221. eCollection 2020.
9
ncPro-ML: An integrated computational tool for identifying non-coding RNA promoters in multiple species.
Comput Struct Biotechnol J. 2020 Sep 10;18:2445-2452. doi: 10.1016/j.csbj.2020.09.001. eCollection 2020.
10
DNC4mC-Deep: Identification and Analysis of DNA N4-Methylcytosine Sites Based on Different Encoding Schemes By Using Deep Learning.
Cells. 2020 Jul 22;9(8):1756. doi: 10.3390/cells9081756.