A multi-omics data simulator for complex disease studies and its application to evaluate multi-omics data analysis methods for disease classification.

Author information

Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, No. 35, Keyan Road, Zhunan, 350, Taiwan.

Publication information

Gigascience. 2019 May 1;8(5). doi: 10.1093/gigascience/giz045.

DOI:10.1093/gigascience/giz045

PMID:31029063

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6486474/

Abstract

BACKGROUND

An integrative multi-omics analysis approach that combines multiple types of omics data, including genomics, epigenomics, transcriptomics, proteomics, metabolomics, and microbiomics, has become increasingly popular for understanding the pathophysiology of complex diseases. Although many multi-omics analysis methods have been developed for complex disease studies, only a few simulation tools are available that simulate multiple types of omics data and model their relationships with disease status, and these tools have limitations in simulating multi-omics data.

RESULTS

We developed the multi-omics data simulator OmicsSIMLA, which simulates genomics (i.e., single-nucleotide polymorphisms [SNPs] and copy number variations), epigenomics (i.e., bisulphite sequencing), transcriptomics (i.e., RNA sequencing), and proteomics (i.e., normalized reverse phase protein array) data at the whole-genome level. Furthermore, the relationships between different types of omics data, such as methylation quantitative trait loci (SNPs influencing methylation), expression quantitative trait loci (SNPs influencing gene expression), and expression quantitative trait methylations (methylations influencing gene expression), were modeled. More importantly, the relationships between these multi-omics data and the disease status were modeled as well. We used OmicsSIMLA to simulate a multi-omics dataset for breast cancer under a hypothetical disease model and used the data to compare the performance among existing multi-omics analysis methods in terms of disease classification accuracy and runtime. We also used OmicsSIMLA to simulate a multi-omics dataset with a scale similar to an ovarian cancer multi-omics dataset. The neural network-based multi-omics analysis method ATHENA was applied to both the real and simulated data and the results were compared. Our results demonstrated that complex disease mechanisms can be simulated by OmicsSIMLA, and ATHENA showed the highest prediction accuracy when the effects of multi-omics features (e.g., SNPs, copy number variations, and gene expression levels) on the disease were strong. Furthermore, similar results can be obtained from ATHENA when analyzing the simulated and real ovarian multi-omics data.
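The chain of relationships described above (SNPs influencing methylation, SNPs and methylation influencing gene expression, and omics features influencing disease status) can be illustrated with a toy generative model. The following is a minimal sketch only, not OmicsSIMLA's actual algorithm or interface: the function names, the single-SNP and single-gene setup, the logistic disease model, and every effect size are assumptions chosen for illustration.

```python
import math
import random


def simulate_subject(rng, maf=0.3, beta_meqtl=-0.8, beta_eqtl=0.5,
                     beta_eqtm=-0.6, beta_disease=1.0, intercept=-1.0):
    """Simulate one subject: SNP -> methylation -> expression -> disease.

    All parameter names and default values are hypothetical.
    """
    # Genotype: count of minor alleles (0, 1, or 2), two independent draws.
    snp = sum(rng.random() < maf for _ in range(2))
    # meQTL: the SNP shifts methylation on the logit scale; the logistic
    # transform maps it to a methylation proportion in (0, 1).
    meth_logit = beta_meqtl * snp + rng.gauss(0.0, 1.0)
    methylation = 1.0 / (1.0 + math.exp(-meth_logit))
    # eQTL and eQTM: expression depends on the SNP and on methylation.
    expression = (beta_eqtl * snp
                  + beta_eqtm * (methylation - 0.5)
                  + rng.gauss(0.0, 1.0))
    # Disease status: Bernoulli draw from a logistic model on expression.
    risk = 1.0 / (1.0 + math.exp(-(intercept + beta_disease * expression)))
    disease = int(rng.random() < risk)
    return {"snp": snp, "methylation": methylation,
            "expression": expression, "disease": disease}


def simulate_cohort(n, seed=0, **effects):
    """Simulate n independent subjects with a reproducible seed."""
    rng = random.Random(seed)
    return [simulate_subject(rng, **effects) for _ in range(n)]


if __name__ == "__main__":
    cohort = simulate_cohort(1000, seed=42)
    cases = sum(s["disease"] for s in cohort)
    print(f"{cases} cases out of {len(cohort)} simulated subjects")
```

Setting an effect parameter to zero (for example `beta_disease=0.0`) gives a null dataset, which is how a simulated benchmark of this kind can separate true signal from false positives when comparing analysis methods.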

CONCLUSIONS

OmicsSIMLA will be useful for evaluating the performance of different multi-omics analysis methods. Sample sizes and power can also be calculated by OmicsSIMLA when planning a new multi-omics disease study.
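The general idea of estimating power and sample size by simulation can be sketched as follows. This is a generic Monte Carlo illustration under stated assumptions, not the procedure OmicsSIMLA implements: it assumes a single normally distributed feature, a case-control mean shift given in standard deviation units, and a two-sample z-test at a two-sided significance level of roughly 0.05. The function names and the candidate sample sizes are hypothetical.

```python
import math
import random
import statistics


def simulated_power(n_per_group, effect_size, n_reps=500, z_crit=1.96, seed=0):
    """Fraction of simulated case-control datasets in which a two-sample
    z-test detects a mean shift of `effect_size` (in SD units)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        controls = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        cases = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        diff = statistics.fmean(cases) - statistics.fmean(controls)
        # Standard error of the difference in means, using sample variances.
        se = math.sqrt(statistics.variance(cases) / n_per_group
                       + statistics.variance(controls) / n_per_group)
        if abs(diff / se) > z_crit:
            hits += 1
    return hits / n_reps


def smallest_n_for_power(effect_size, target=0.8,
                         candidates=(25, 50, 100, 200, 400)):
    """Return the first candidate per-group sample size whose simulated
    power reaches `target`, or None if no candidate does."""
    for n in candidates:
        if simulated_power(n, effect_size) >= target:
            return n
    return None


if __name__ == "__main__":
    for n in (25, 50, 100):
        print(n, simulated_power(n, effect_size=0.5))
```

In a real study plan, the simple normal draws here would be replaced by datasets from a full multi-omics simulator, and the z-test by the analysis method under evaluation.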

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c83f/6486474/8addebabf68b/giz045fig1.jpg
