

Heavy-Tailed Noise Suppression and Derivative Wavelet Scalogram for Detecting DNA Copy Number Aberrations.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2018 Sep-Oct;15(5):1625-1635. doi: 10.1109/TCBB.2017.2723884. Epub 2017 Jul 6.

Abstract

Most existing array comparative genomic hybridization (array CGH) data processing methods and evaluation models assume that the probability density function (pdf) of noise in array CGH data is Gaussian. In practice, however, the noise distribution is peaked and heavy-tailed. A Gaussian pdf is therefore inadequate for approximating the noise in array CGH data; it leads to false detections of chromosomal aberrations and to misunderstanding of disease pathogenesis. A more accurate model of noise in array CGH data is necessary and benefits the detection of DNA copy number variations. We analyze real array CGH data from different platforms and show that the distribution of noise in array CGH data is fitted very well by a generalized Gaussian distribution (GGD). Based on our new noise model, we propose a novel array CGH processing method that combines the advantages of both the smoothing and segmentation approaches. The new method uses a generalized Gaussian bivariate shrinkage function and a one-directional derivative wavelet scalogram under generalized Gaussian noise. In the smoothing step, using the new generalized Gaussian noise model, we derive a heavy-tailed noise suppression algorithm in the stationary wavelet domain. In the segmentation step, the 1D Gaussian derivative wavelet scalogram is employed to detect break points. Both real and simulated array CGH data with different noise types (Gaussian noise, GGD noise, and real noise) are used in our experiments. We demonstrate that our new method outperforms other state-of-the-art methods in terms of both root mean squared error and receiver operating characteristic curves.
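To make the noise-model argument concrete, the sketch below evaluates the standard generalized Gaussian density, f(x) = β / (2αΓ(1/β)) · exp(−(|x − μ|/α)^β), and compares it with a Gaussian of the same variance. It is only an illustration of why a shape parameter β < 2 gives a sharper peak and heavier tails; it is not the paper's estimation procedure or its shrinkage function, which the abstract does not spell out. The function names and the choice β = 1 are assumptions made for this example.

```python
import math


def ggd_pdf(x, beta, alpha=1.0, mu=0.0):
    """Generalized Gaussian density with shape beta, scale alpha, location mu."""
    norm = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return norm * math.exp(-((abs(x - mu) / alpha) ** beta))


def ggd_unit_variance_scale(beta):
    """Scale alpha that gives the GGD unit variance.

    Uses Var = alpha^2 * Gamma(3/beta) / Gamma(1/beta).
    """
    return math.sqrt(math.gamma(1.0 / beta) / math.gamma(3.0 / beta))


def ggd_excess_kurtosis(beta):
    """Excess kurtosis of the GGD; 0 for beta = 2 (Gaussian), positive for beta < 2."""
    g = math.gamma
    return g(5.0 / beta) * g(1.0 / beta) / g(3.0 / beta) ** 2 - 3.0


def gaussian_pdf(x, sigma=1.0, mu=0.0):
    """Ordinary Gaussian density, used as the reference model."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    beta = 1.0  # hypothetical shape value; beta = 1 is the Laplace case
    alpha = ggd_unit_variance_scale(beta)
    for x in (0.0, 1.0, 2.0, 4.0):
        # Compare both unit-variance densities at the centre and in the tail.
        print(x, ggd_pdf(x, beta, alpha), gaussian_pdf(x))
    print("excess kurtosis:", ggd_excess_kurtosis(beta))
```

With both densities scaled to unit variance, the β = 1 curve is higher at the centre and higher far from it than the Gaussian, which is the peaked, heavy-tailed shape the abstract describes. Setting β = 2 with α = √2 recovers the Gaussian exactly.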

