Comprehensive evaluation of the implementation of episignatures for diagnosis of neurodevelopmental disorders (NDDs).

Affiliations

Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, Brussels, Belgium.

Center for Human Genetics, Hôpital Erasme, Hôpital Universitaire de Bruxelles, Université Libre de Bruxelles, Brussels, Belgium.

Publication information

Hum Genet. 2023 Dec;142(12):1721-1735. doi: 10.1007/s00439-023-02609-2. Epub 2023 Oct 27.

Abstract

Episignatures are popular tools for the diagnosis of rare neurodevelopmental disorders. They are commonly based on a set of differentially methylated CpGs used in combination with a support vector machine model. DNA methylation (DNAm) data often include missing values due to changes in data generation technology and to batch effects. While many normalization methods exist for DNAm data, their impact on episignature performance has never been assessed. In addition, technologies to quantify DNAm evolve quickly, which may lead to poor transposition of existing episignatures generated on deprecated array versions to new ones. Indeed, probe removal between array versions or technologies, or during preprocessing, leads to missing values. Thus, the effect of missing data on episignature performance must also be carefully evaluated and addressed, either through imputation or through an innovative approach to episignature design. In this paper, we used data from patients with Kabuki syndrome and Sotos syndrome to evaluate the influence of normalization methods, classification models and missing data on the prediction performance of two existing episignatures. We compare how six popular normalization methods for methylation array data affect episignature classification performance in Kabuki and Sotos syndromes and provide best-practice suggestions for building new episignatures. In this setting, we show that the Illumina, Noob and Funnorm normalization methods achieved higher classification performance on the testing sets than the Quantile, Raw and Swan normalization methods. We further show that penalized logistic regression and support vector machines perform best in the classification of Kabuki and Sotos syndrome patients. Then, we describe a new paradigm to build episignatures based on the detection of differentially methylated regions (DMRs) and evaluate their performance against classical episignatures based on differentially methylated cytosines (DMCs) in the presence of missing data.
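The abstract names imputation and penalized logistic regression as ways to keep a DMC-based episignature usable when probes go missing. The sketch below illustrates that combination under stated assumptions; it is not the authors' pipeline. Each sample is assumed to be a vector of beta values at a fixed set of DMCs, with missing probes encoded as `None`. Missing values are filled with the per-CpG training mean, and the penalty is assumed to be a plain L2 (ridge) term fitted by batch gradient descent. The helper names and toy beta values are hypothetical, and the SVM alternative is not shown.

```python
import math
from typing import List, Optional, Sequence, Tuple

Beta = Optional[float]  # beta value in [0, 1], or None for a missing probe


def column_means(rows: Sequence[Sequence[Beta]]) -> List[float]:
    """Per-CpG mean over training samples, ignoring missing values."""
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        # Fall back to 0.5 if a CpG is missing in every training sample.
        means.append(sum(observed) / len(observed) if observed else 0.5)
    return means


def impute(row: Sequence[Beta], means: Sequence[float]) -> List[float]:
    """Replace each missing CpG with its training-set mean (hypothetical helper)."""
    return [m if v is None else v for v, m in zip(row, means)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_l2_logistic(X: List[List[float]], y: List[int], lam: float = 0.1,
                    lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Batch gradient descent on the mean logistic loss plus (lam/2)*||w||^2.

    The intercept b is not penalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(p):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w: List[float], b: float, x: List[float]) -> float:
    """Probability that sample x carries the episignature (class 1)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy beta values at three hypothetical DMCs; affected samples are hypomethylated.
train = [[0.20, 0.25, 0.15], [0.22, 0.18, 0.20], [0.18, 0.21, None],
         [0.80, 0.75, 0.85], [0.78, None, 0.82], [0.83, 0.79, 0.80]]
labels = [1, 1, 1, 0, 0, 0]

means = column_means(train)  # learned on training data only
X = [impute(r, means) for r in train]
w, b = fit_l2_logistic(X, labels)

# A new sample whose second probe is absent (e.g. dropped on a newer array).
new_case = impute([0.19, None, 0.22], means)
print(f"P(episignature) = {predict_proba(w, b, new_case):.2f}")
```

Mean imputation is only one option. The design point is that a DMC model fitted on a full probe set needs a value for every probe it was trained on, so its predictions depend on how well the imputed value stands in for the missing measurement.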
We show that the performance of classical DMC-based episignatures is more affected by missing data than that of the DMR-based approach. We present a comprehensive evaluation of how the normalization of DNA methylation data affects episignature performance, using three popular classification models. We further evaluate how missing data affect those models' predictions. Finally, we propose a novel methodology to develop episignatures based on the identification of differentially methylated regions and show that this method slightly outperforms classical episignatures in the presence of missing data.
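The contrast between DMC-based and DMR-based episignatures can be made concrete with region-level features. The abstract does not specify how the CpGs within a DMR are summarized. In this minimal sketch, averaging the observed beta values of each region, and requiring a minimum number of surviving CpGs (`min_cpgs`), are assumptions made for illustration. The DMR definitions and probe IDs are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional

# Hypothetical DMR definitions: region name -> CpG probe IDs it spans.
DMRS: Dict[str, List[str]] = {
    "dmr_1": ["cg01", "cg02", "cg03", "cg04"],
    "dmr_2": ["cg10", "cg11", "cg12"],
}


def dmr_features(sample: Dict[str, Optional[float]],
                 dmrs: Dict[str, List[str]] = DMRS,
                 min_cpgs: int = 2) -> Dict[str, Optional[float]]:
    """Summarize each DMR as the mean beta value of its observed CpGs.

    Probes absent from the sample (e.g. removed on a newer array) or set to
    None (e.g. filtered during preprocessing) are skipped. A region is only
    reported as missing (None) if fewer than `min_cpgs` CpGs remain.
    """
    features: Dict[str, Optional[float]] = {}
    for name, probes in dmrs.items():
        observed = [sample[p] for p in probes if sample.get(p) is not None]
        features[name] = mean(observed) if len(observed) >= min_cpgs else None
    return features


full = {"cg01": 0.20, "cg02": 0.24, "cg03": 0.18, "cg04": 0.22,
        "cg10": 0.70, "cg11": 0.74, "cg12": 0.72}
# The same sample profiled on a hypothetical array version lacking cg02 and cg11.
reduced = {k: v for k, v in full.items() if k not in {"cg02", "cg11"}}

print(dmr_features(full))
print(dmr_features(reduced))
```

With cg02 and cg11 dropped, both regions still yield a value close to the full-data one without any imputation step, whereas a DMC-based model would have to impute those two probes. The resulting region-level vectors could then be passed to any classifier, such as the penalized logistic regression or support vector machine models mentioned in the abstract.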


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1102/10676303/a1f41c809616/439_2023_2609_Fig1_HTML.jpg
