Machine Learning Classifiers for Endometriosis Using Transcriptomics and Methylomics Data.

Author information

Akter Sadia, Xu Dong, Nagel Susan C, Bromfield John J, Pelch Katherine, Wilshire Gilbert B, Joshi Trupti

Affiliations

Informatics Institute, University of Missouri, Columbia, MO, United States.

Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, United States.

Publication information

Front Genet. 2019 Sep 4;10:766. doi: 10.3389/fgene.2019.00766. eCollection 2019.

Abstract

Endometriosis is a common, complex, and poorly understood gynecological disorder that affects about 176 million women worldwide, significantly reducing their quality of life and imposing an economic burden. Neither a definitive clinical symptom nor a minimally invasive diagnostic method is available, which leads to an average diagnostic latency of 4 to 11 years. Over the last several decades, machine learning tools have advanced the discovery of relevant biological patterns from microarray expression and next generation sequencing (NGS) data. We performed machine learning analysis using 38 RNA-seq and 80 enrichment-based DNA methylation (MBD-seq) datasets. We evaluated how well various supervised machine learning methods, namely decision tree, partial least squares discriminant analysis (PLSDA), support vector machine, and random forest, classify endometriosis samples against control samples when trained on transcriptomics and on methylomics data. We assessed two ways of improving classification performance: a) the effect of three different normalization techniques and b) the effect of differential analysis using the generalized linear model (GLM). Multiple machine learning experiments identified several candidate biomarker genes in both the transcriptomics and the methylomics data analyses. We concluded that an appropriate machine learning diagnostic pipeline for endometriosis should use TMM normalization for transcriptomics data, quantile or voom normalization for methylomics data, and GLM for feature space reduction and for maximizing classification performance.
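To make one of the normalization steps named in the abstract concrete, the following is a minimal sketch of quantile normalization on a small features-by-samples matrix. It is an illustration only, not the authors' pipeline: the function name `quantile_normalize`, the toy matrix, and the choice to break ties by row position are all assumptions made here.

```python
from statistics import mean


def quantile_normalize(matrix):
    """Quantile-normalize a features-by-samples matrix (list of rows).

    Hypothetical helper for illustration; ties within a sample are broken
    by row position rather than averaged.
    """
    n_features = len(matrix)
    n_samples = len(matrix[0])
    # Work column-wise: one column per sample.
    columns = [[row[j] for row in matrix] for j in range(n_samples)]
    # Reference distribution: mean of the k-th smallest value across samples.
    sorted_cols = [sorted(col) for col in columns]
    reference = [mean(col[k] for col in sorted_cols) for k in range(n_features)]
    normalized = [[0.0] * n_samples for _ in range(n_features)]
    for j, col in enumerate(columns):
        # Row indices of this sample ordered from smallest to largest value.
        order = sorted(range(n_features), key=lambda i: col[i])
        for rank, i in enumerate(order):
            normalized[i][j] = reference[rank]
    return normalized


# Toy data: 4 features (rows) measured in 3 samples (columns).
example = [
    [5, 4, 3],
    [2, 1, 4],
    [3, 4, 6],
    [4, 2, 8],
]
normalized = quantile_normalize(example)
for row in normalized:
    print(row)
```

After this step every sample shares the same set of values, so differences between samples reflect only the rank of each feature within its sample. A real analysis would more likely rely on an established implementation that also handles ties and missing values.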

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b11f/6737999/37c4ccd6c776/fgene-10-00766-g001.jpg
