iCpG-Pos: an accurate computational approach for identification of CpG sites using positional features on single-cell whole genome sequence data.

Affiliations

Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea.

College of Information Technology in the United Arab Emirates University (UAEU), Abu Dhabi 15551, UAE.

Publication information

Bioinformatics. 2023 Aug 1;39(8). doi: 10.1093/bioinformatics/btad474.

Abstract

MOTIVATION

The investigation of DNA methylation can shed light on the processes underlying human well-being and help determine overall human health. However, insufficient coverage makes it challenging to implement single-stranded DNA methylation sequencing technologies, highlighting the need for an efficient prediction model. Models are required to create an understanding of the underlying biological systems and to project single-cell (methylated) data accurately.

RESULTS

In this study, we developed positional features for predicting CpG sites. Positional characteristics of the sequence are derived using data from CpG regions and the separation between nearby CpG sites. Multiple optimized classifiers and different ensemble learning approaches are evaluated. The OPTUNA framework is used to optimize the algorithms. The CatBoost algorithm followed by the stacking algorithm outperformed existing DNA methylation identifiers.
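As an illustration of what "separation between nearby CpG sites" can mean in practice, the following is a minimal Python sketch, not the authors' implementation. It assumes a CpG site is any C immediately followed by G on a single strand, and it computes the distance from each site to its previous and next neighbor. The function names `find_cpg_sites` and `neighbor_distances` and the toy sequence are hypothetical; the actual iCpG-Pos feature definitions may differ.

```python
from typing import List, Optional, Tuple


def find_cpg_sites(sequence: str) -> List[int]:
    """Return 0-based positions of every C that is immediately followed by G."""
    seq = sequence.upper()
    return [i for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]


def neighbor_distances(
    sites: List[int],
) -> List[Tuple[int, Optional[int], Optional[int]]]:
    """For each site, return (position, gap to previous site, gap to next site).

    A gap is None when the site has no neighbor on that side.
    This is an assumed, simplified stand-in for a positional feature.
    """
    features = []
    for k, pos in enumerate(sites):
        prev_gap = pos - sites[k - 1] if k > 0 else None
        next_gap = sites[k + 1] - pos if k < len(sites) - 1 else None
        features.append((pos, prev_gap, next_gap))
    return features


# Toy sequence (hypothetical, for illustration only).
toy_sequence = "ACGTTCGACCGG"
sites = find_cpg_sites(toy_sequence)
features = neighbor_distances(sites)

for position, prev_gap, next_gap in features:
    print(position, prev_gap, next_gap)
```

Features of this kind are cheap to compute because they need only the genomic coordinates of CpG sites, which is consistent with the abstract's claim that a purely positional model reduces computational cost.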

AVAILABILITY AND IMPLEMENTATION

The data and methods used in this study are openly accessible to the research community, including the positional features and algorithms used to predict CpG site methylation patterns. Because iCpG-Pos uses only positional features, it substantially reduces computational complexity compared with other known approaches for detecting CpG site methylation patterns. By focusing on positional features, the model offers both accuracy and efficiency, making it a promising tool for advancing DNA methylation research and its applications in human health and well-being.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c106/10444964/ae42c47186d4/btad474f1.jpg
