A multi-view genomic data simulator.

Author information

Fratello Michele, Serra Angela, Fortino Vittorio, Raiconi Giancarlo, Tagliaferri Roberto, Greco Dario

Affiliations

Department of Medical, Surgical, Neurological, Metabolic and Ageing Sciences, Second University of Napoli, Napoli, Italy.

Department of Computer Science, Fisciano, Italy.

Publication information

BMC Bioinformatics. 2015 May 12;16:151. doi: 10.1186/s12859-015-0577-1.

DOI: 10.1186/s12859-015-0577-1

PMID: 25962835

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4448275/

Abstract

BACKGROUND

Omics technologies make it possible to assay the state of a large number of different features (e.g., mRNA expression, miRNA expression, copy number variation, DNA methylation) from the same samples. The objective of these experiments is usually to find a reduced set of significant features that can be used to differentiate the conditions assayed. Developing novel computational methods for feature selection is challenging because fully annotated biological datasets suitable for benchmarking are lacking. One way to tackle this problem is to generate appropriate synthetic datasets whose composition and behaviour are fully controlled and known a priori.

RESULTS

Here we propose a novel method centred on the generation of networks of interactions among different biological molecules, especially those involved in regulating gene expression. Synthetic datasets are obtained from models based on ordinary differential equations with known parameters. Our results show that the generated datasets mimic the behaviour of real data well, since popular data analysis methods are able to selectively identify the existing interactions.
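To make the idea of ODE-based synthetic data with a known ground truth concrete, the following is a minimal sketch and not the MVBioDataSim implementation. It assumes a single hypothetical rate equation in which an miRNA level increases the decay of its target mRNA; the parameter names (`alpha`, `delta`, `gamma`), the forward Euler integrator, the noise model, and the Pearson correlation check are all illustrative assumptions. It produces two views (miRNA and mRNA) from the same samples, with one true interaction and one non-interacting control gene.

```python
import random


def simulate_mrna(mirna, alpha=1.0, delta=0.5, gamma=0.0, x0=0.0, dt=0.05, steps=1000):
    """Integrate dx/dt = alpha - (delta + gamma * mirna) * x with forward Euler.

    Assumed toy model: alpha is the transcription rate, delta the basal decay
    rate, and gamma the strength of miRNA-mediated decay (0 means no interaction).
    Returns the mRNA level after steps * dt time units, close to steady state.
    """
    x = x0
    for _ in range(steps):
        x += dt * (alpha - (delta + gamma * mirna) * x)
    return x


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (sx * sy)


def make_dataset(n_samples=40, noise_sd=0.05, seed=0):
    """Generate a miRNA view and two mRNA profiles measured on the same samples.

    The 'target' gene is regulated by the miRNA (gamma = 0.5); the 'control'
    gene is not (gamma = 0). This ground truth is known by construction.
    """
    rng = random.Random(seed)
    mirna_view, target_view, control_view = [], [], []
    for _ in range(n_samples):
        m = rng.uniform(0.0, 2.0)  # per-sample miRNA level (assumed range)
        mirna_view.append(m)
        # Additive Gaussian measurement noise on the simulated mRNA levels.
        target_view.append(simulate_mrna(m, gamma=0.5) + rng.gauss(0.0, noise_sd))
        control_view.append(simulate_mrna(m, gamma=0.0) + rng.gauss(0.0, noise_sd))
    return mirna_view, target_view, control_view


if __name__ == "__main__":
    mirna, target, control = make_dataset()
    print("miRNA vs target gene :", round(pearson(mirna, target), 3))
    print("miRNA vs control gene:", round(pearson(mirna, control), 3))
```

Because the interaction is set by `gamma`, any analysis method run on these views can be scored against the truth: here a simple correlation should be strongly negative for the regulated gene and weak for the control. A realistic simulator would use a full interaction network and a more robust integrator.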

CONCLUSIONS

The proposed method can be used in conjunction with real biological datasets in the assessment of data mining techniques. Its main strength is full control over the simulated data while retaining coherence with real biological processes. The R package MVBioDataSim is freely available to the scientific community at http://neuronelab.unisa.it/?p=1722.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a1a0/4448275/e33a1d762de9/12859_2015_577_Fig1_HTML.jpg
