Penalized logistic regression based on L1/2 penalty for high-dimensional DNA methylation data.

Publication information

Technol Health Care. 2020;28(S1):161-171. doi: 10.3233/THC-209016.

Abstract

BACKGROUND

DNA methylation is a molecular modification of DNA that plays a vital role in gene expression. In cancer tissues, 5'-C-phosphate-G-3' (CpG)-rich regions are abnormally hypermethylated or hypomethylated, so it is useful to identify disease-associated CpG sites with dedicated methods. CpG sites within the same gene or the same CpG island are highly correlated with each other.

OBJECTIVE

Based on this group effect (the correlation among CpG sites within the same gene or CpG island), we propose an efficient and accurate method for selecting pathogenic CpG sites.

METHODS

Our method combines an L1/2-regularized solver with a central-node fully connected network to penalize a group-constrained logistic regression model. As a result, both sparsity and a group effect are imposed on the correlated regression coefficients.
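
As a rough illustration of how a penalty of this kind could be assembled, the Python sketch below evaluates a logistic loss plus an L1/2 sparsity term and a network smoothness term. The abstract does not state the exact objective, so the graph-Laplacian quadratic form, the star-shaped graph in which the first index of each group is treated as the central node, and the names `star_laplacian`, `lam_sparse`, and `lam_group` are all assumptions made for illustration, not the authors' formulation.

```python
import numpy as np


def logistic_nll(beta, X, y):
    """Mean negative log-likelihood of a logistic model, labels y in {0, 1}."""
    z = X @ beta
    # log(1 + exp(z)) evaluated stably with logaddexp
    return float(np.mean(np.logaddexp(0.0, z) - y * z))


def l_half_penalty(beta):
    """L1/2 penalty: sum over coefficients of |beta_j| ** 0.5."""
    return float(np.sum(np.sqrt(np.abs(beta))))


def star_laplacian(groups, p):
    """Graph Laplacian of an assumed 'central node' network.

    Within each group (e.g. CpG sites of one gene), the first index is a
    hypothetical hub linked to every other member of that group.
    """
    L = np.zeros((p, p))
    for members in groups:
        hub = members[0]
        for j in members[1:]:
            L[hub, hub] += 1.0
            L[j, j] += 1.0
            L[hub, j] -= 1.0
            L[j, hub] -= 1.0
    return L


def objective(beta, X, y, L, lam_sparse, lam_group):
    """Assumed objective: loss + L1/2 sparsity term + network smoothness term."""
    # beta' L beta equals the sum over edges of (beta_hub - beta_j) ** 2
    smooth = float(beta @ L @ beta)
    return (logistic_nll(beta, X, y)
            + lam_sparse * l_half_penalty(beta)
            + lam_group * smooth)
```

With `groups = [[0, 1, 2], [3, 4]]`, a coefficient vector that is constant within each group incurs no smoothness penalty, while one that differs from its hub is penalized by the squared difference on each edge. This is what would pull correlated coefficients toward each other in such a formulation, while the L1/2 term pushes small coefficients toward zero.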

RESULTS

Extensive simulation studies compared the proposed approach with existing mainstream regularization methods in terms of classification accuracy and stability. The simulations show that the proposed method attains higher predictive accuracy than previous methods. The method was also applied to more than 20,000 CpG sites and verified on ovarian cancer data generated with the Illumina Infinium HumanMethylation 27K BeadChip. On this real dataset, the indicators of predictive accuracy are higher than those of previous methods, and more of the genes containing the selected CpG sites are confirmed to be pathogenic. In addition, the method selects fewer CpG sites in total than the other methods while achieving higher accuracy on both the simulated and the DNA methylation data.

CONCLUSION

The proposed method gives DNA methylation researchers an advanced and potentially powerful tool for identifying pathogenic CpG sites.
