Notos - a galaxy tool to analyze CpN observed expected ratios for inferring DNA methylation types.

Affiliations

Institut für Mathematik und Informatik, Universität Greifswald, Walther-Rathenau-Str. 47, Greifswald, 17487, Germany.

Theoretical Biology and Biophysics, Group T-6, Los Alamos National Laboratory, Los Alamos, New Mexico, USA.

Publication information

BMC Bioinformatics. 2018 Mar 27;19(1):105. doi: 10.1186/s12859-018-2115-4.

Abstract

BACKGROUND

DNA methylation patterns store epigenetic information in the vast majority of eukaryotic species. However, the relatively high costs and technical challenges associated with detecting DNA methylation have biased methylation studies towards model organisms. Consequently, it remains challenging to infer kingdom-wide general rules about the functions and evolutionary conservation of DNA methylation. Methylated cytosine is often found in specific CpN dinucleotides, and because methylated CpG is more mutable, the frequency distributions of, for instance, CpG observed/expected (CpG o/e) ratios have been used to infer DNA methylation types.
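The CpG o/e ratio compares the observed number of CpG dinucleotides with the number expected from the sequence's C and G content. Because methylated cytosine tends to deaminate to thymine, sequences that are methylated in the germline over evolutionary time tend to show depleted CpG o/e ratios. The following is a minimal sketch, assuming the common definition o/e = count(CpN) × L / (count(C) × count(N)) with L the sequence length; Notos may normalise differently, and `cpn_oe` is a hypothetical helper name.

```python
import math


def cpn_oe(seq: str, n: str = "G") -> float:
    """Observed/expected ratio of the dinucleotide C followed by `n`.

    Assumes the common definition o/e = count(CpN) * L / (count(C) * count(N)),
    where L is the sequence length (not necessarily Notos's exact formula).
    Returns NaN when C or N is absent, since the ratio is then undefined.
    """
    n = n.upper()
    if len(n) != 1 or n not in "ACGT":
        raise ValueError("n must be one of A, C, G, T")
    s = seq.upper()
    length = len(s)
    c_count = s.count("C")
    n_count = s.count(n)
    if c_count == 0 or n_count == 0:
        return math.nan
    # Count overlapping dinucleotides, so that "CCC" contains two CpC.
    observed = sum(1 for i in range(length - 1) if s[i] == "C" and s[i + 1] == n)
    return observed * length / (c_count * n_count)


if __name__ == "__main__":
    for base in "ACGT":
        print(f"Cp{base} o/e = {cpn_oe('ACGTTCGACCAGCTGA', base):.3f}")
```

Computing the ratio for all four CpN dinucleotides, not just CpG, mirrors the abstract's CpN framing; overlapping counting matters mainly for CpC.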

RESULTS

Currently, predominantly model-based approaches, essentially founded on mixtures of Gaussian distributions, are used to investigate the number and position of modes of CpG o/e ratios. These approaches require selecting an appropriate criterion for determining the best model, and they fail if the empirical distributions are complex or even only moderately skewed. We use a technique based on kernel density estimation (KDE) to characterize complex CpN o/e distributions robustly and precisely, without a priori assumptions about the underlying distributions.
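Unlike a Gaussian mixture, a KDE places a small kernel on every observed ratio and sums them, so the estimated shape follows the data rather than a fixed number of components. The following is a minimal sketch, assuming a Gaussian kernel and the same rule-of-thumb bandwidth as R's default `bw.nrd0` (0.9 × min(sd, IQR/1.34) × n^(-1/5)); the kernel and bandwidth selector actually used by Notos may differ, and `silverman_bandwidth` and `gaussian_kde` are hypothetical names.

```python
import math
import statistics


def silverman_bandwidth(xs):
    """Rule-of-thumb bandwidth: 0.9 * min(sd, IQR/1.34) * n^(-1/5)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    sd = statistics.stdev(xs)
    # method="inclusive" matches R's default (type 7) quantiles.
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    if spread <= 0:
        raise ValueError("observations have zero spread")
    return 0.9 * spread * n ** (-0.2)


def gaussian_kde(xs, bandwidth=None):
    """Return a function x -> estimated density using a Gaussian kernel."""
    data = list(xs)
    h = silverman_bandwidth(data) if bandwidth is None else bandwidth
    if h <= 0:
        raise ValueError("bandwidth must be positive")
    norm = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum of one Gaussian bump of width h centred on each observation.
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return density


if __name__ == "__main__":
    ratios = [0.2, 0.25, 0.3, 0.35, 0.9, 0.95, 1.0, 1.05]  # toy CpG o/e values
    f = gaussian_kde(ratios)
    for x in (0.3, 0.625, 1.0):
        print(f"density at {x:.3f}: {f(x):.3f}")
```

The bandwidth is the main tuning knob: too small and sampling noise appears as spurious modes, too large and genuinely separate modes merge.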

CONCLUSIONS

We show that KDE delivers robust descriptions of CpN o/e distributions. For straightforward processing, we have developed a Galaxy tool called Notos, available at the ToolShed, that calculates these ratios for input FASTA files and fits a density to their empirical distribution. Based on the estimated density, the number and shape of the modes of the distribution are determined, providing a rationale for predicting the number and types of different methylation classes. Notos is written in R and Perl.
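Once a density has been estimated, its modes correspond to local maxima. The following is a minimal sketch, assuming the density has already been evaluated on an evenly spaced grid (for example, from a KDE) and treating each strict interior local maximum as a mode. `find_modes` and its `min_rel_height` noise filter are hypothetical illustrations, not Notos's actual mode criterion, and the position and height of each peak serve only as a rough description of mode shape.

```python
import math


def find_modes(grid, dens, min_rel_height=0.0):
    """Return (position, height) for each strict interior local maximum.

    Hypothetical helper: peaks lower than min_rel_height * (highest value)
    are discarded as likely noise. Endpoints are never reported, so the
    grid should extend beyond the range of the data.
    """
    if len(grid) != len(dens) or len(dens) < 3:
        raise ValueError("grid and dens must have equal length >= 3")
    top = max(dens)
    modes = []
    for i in range(1, len(dens) - 1):
        is_peak = dens[i] > dens[i - 1] and dens[i] > dens[i + 1]
        if is_peak and dens[i] >= min_rel_height * top:
            modes.append((grid[i], dens[i]))
    return modes


def normal_pdf(x, mu, sd):
    """Density of a normal distribution, used here only to build test data."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    # Synthetic bimodal density standing in for an estimated CpG o/e density.
    grid = [i * 0.01 for i in range(151)]  # 0.00 .. 1.50
    dens = [0.6 * normal_pdf(x, 0.3, 0.1) + 0.4 * normal_pdf(x, 1.0, 0.1) for x in grid]
    for pos, height in find_modes(grid, dens):
        print(f"mode at o/e = {pos:.2f}, density = {height:.3f}")
```

Strict inequalities on both sides avoid counting a shoulder as a mode; exact plateaus are rare in floating-point density estimates.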

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4817/5870242/71608a622530/12859_2018_2115_Fig1_HTML.jpg