

Notos - a galaxy tool to analyze CpN observed expected ratios for inferring DNA methylation types.

Affiliations

Institut für Mathematik und Informatik, Universität Greifswald, Walther-Rathenau-Str. 47, Greifswald, 17487, Germany.

Theoretical Biology and Biophysics, Group T-6, Los Alamos National Laboratory, Los Alamos, New Mexico, USA.

Publication information

BMC Bioinformatics. 2018 Mar 27;19(1):105. doi: 10.1186/s12859-018-2115-4.

Abstract

BACKGROUND

DNA methylation patterns store epigenetic information in the vast majority of eukaryotic species. However, the relatively high costs and technical challenges associated with detecting DNA methylation have biased the number of methylation studies towards model organisms. Consequently, it remains challenging to infer kingdom-wide general rules about the functions and evolutionary conservation of DNA methylation. Methylated cytosine is often found in specific CpN dinucleotides, and the frequency distributions of, for instance, CpG observed/expected (CpG o/e) ratios have been used to infer DNA methylation types, based on the higher mutability of methylated CpG.
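The abstract does not spell out how the ratio is computed. As an illustration only, a commonly used definition divides the observed CpG dinucleotide frequency by the frequency expected if C and G occurred independently. The symbols below are assumptions introduced here, not taken from the abstract: l is the sequence length, n_C and n_G are the counts of C and G, and n_CG is the count of CpG dinucleotides.

```latex
\[
\mathrm{CpG\ o/e}
  \;=\; \frac{n_{CG}/(l-1)}{(n_C/l)\,(n_G/l)}
  \;=\; \frac{n_{CG}}{n_C\,n_G}\cdot\frac{l^{2}}{l-1}
\]
```

For example, a sequence with l = 1000, n_C = 200, n_G = 250 and n_CG = 20 gives (20 / 50000) × (1000000 / 999) ≈ 0.40. A ratio well below 1 indicates CpG depletion, which is what the higher mutability of methylated CpG would be expected to produce over evolutionary time.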

RESULTS

Questions about the number and position of modes of CpG o/e ratios are currently investigated predominantly with model-based approaches, most of them founded on mixtures of Gaussian distributions. These approaches require the selection of an appropriate criterion for determining the best model, and they fail if empirical distributions are complex or even merely moderately skewed. We use a kernel density estimation (KDE) based technique for robust and precise characterization of complex CpN o/e distributions without a priori assumptions about the underlying distributions.
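The idea of reading modes off a kernel density estimate can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the Notos implementation: the Gaussian kernel, the Silverman rule-of-thumb bandwidth, the grid size, and the function names `gaussian_kde` and `find_modes` are all assumptions made for this example.

```python
import math
import statistics


def gaussian_kde(sample, bandwidth=None):
    """Return a density function estimated from `sample` with a Gaussian kernel."""
    n = len(sample)
    if bandwidth is None:
        # Assumption: Silverman's rule of thumb; other bandwidth rules are possible.
        sd = statistics.stdev(sample)
        q1, _, q3 = statistics.quantiles(sample, n=4)
        iqr = q3 - q1
        spread = min(sd, iqr / 1.34) if iqr > 0 else sd
        bandwidth = 0.9 * spread * n ** (-0.2)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        )

    return density


def find_modes(sample, grid_size=512, bandwidth=None):
    """Locate interior local maxima of the estimated density on an even grid."""
    density = gaussian_kde(sample, bandwidth)
    lo, hi = min(sample), max(sample)
    step = (hi - lo) / (grid_size - 1)
    xs = [lo + i * step for i in range(grid_size)]
    ys = [density(x) for x in xs]
    return [
        xs[i]
        for i in range(1, grid_size - 1)
        if ys[i - 1] < ys[i] >= ys[i + 1]
    ]


# Synthetic, deterministic example: two groups of ratios centred at 0.4 and 1.0.
_z = [statistics.NormalDist().inv_cdf((i + 0.5) / 100) for i in range(100)]
example = [0.4 + 0.08 * z for z in _z] + [1.0 + 0.08 * z for z in _z]
modes = find_modes(example)
```

No number of components has to be chosen in advance here; the number of modes falls out of the estimated density. The result does depend on the bandwidth, so a real analysis would need to check how stable the detected modes are.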

CONCLUSIONS

We show that KDE delivers robust descriptions of CpN o/e distributions. For straightforward processing, we have developed a Galaxy tool called Notos, available at the ToolShed, that calculates these ratios for input FASTA files and fits a density to their empirical distribution. Based on the estimated density, the number and shape of the modes of the distribution are determined, providing a rationale for predicting the number and types of different methylation classes. Notos is written in R and Perl.
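The first step of such a workflow, turning FASTA records into one ratio per sequence, might look like the following. Notos itself is written in R and Perl, so this Python version is only a sketch. The helper names `read_fasta`, `cpn_oe` and `fasta_ratios` are hypothetical, and the ratio is computed as observed dinucleotide frequency over the product of the single-nucleotide frequencies, which is an assumed definition rather than one stated in the abstract.

```python
import math


def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def cpn_oe(seq, second="G"):
    """Observed/expected ratio of the dinucleotide C followed by `second`."""
    length = len(seq)
    n_c, n_n = seq.count("C"), seq.count(second)
    if length < 2 or n_c == 0 or n_n == 0:
        return math.nan  # ratio is undefined for such sequences
    # Count every position, so overlapping pairs such as CC in "CCC" are included.
    n_dinuc = sum(
        1 for i in range(length - 1) if seq[i] == "C" and seq[i + 1] == second
    )
    observed = n_dinuc / (length - 1)
    expected = (n_c / length) * (n_n / length)
    return observed / expected


def fasta_ratios(text, second="G"):
    """Map each FASTA header to the CpN o/e ratio of its sequence."""
    return {header: cpn_oe(seq, second) for header, seq in read_fasta(text)}


example_fasta = ">a\nACGT\n>b\nAC\nGT\n>c\nAAAA\n"
ratios = fasta_ratios(example_fasta)
```

Sequences with an undefined ratio come back as NaN and would need to be filtered out before any density is fitted to the remaining values.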


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4817/5870242/71608a622530/12859_2018_2115_Fig1_HTML.jpg
