A multi-view genomic data simulator.

Authors

Fratello Michele, Serra Angela, Fortino Vittorio, Raiconi Giancarlo, Tagliaferri Roberto, Greco Dario

Affiliations

Department of Medical, Surgical, Neurological, Metabolic and Ageing Sciences, Second University of Napoli, Napoli, Italy.

Department of Computer Science, Fisciano, Italy.

Publication information

BMC Bioinformatics. 2015 May 12;16:151. doi: 10.1186/s12859-015-0577-1.

Abstract

BACKGROUND

OMICs technologies make it possible to assay the state of a large number of different features (e.g., mRNA expression, miRNA expression, copy number variation, DNA methylation) from the same samples. The objective of these experiments is usually to find a reduced set of significant features that can be used to differentiate the conditions assayed. Developing novel computational methods for feature selection is challenging because fully annotated biological datasets suitable for benchmarking are lacking. One way to tackle this problem is to generate appropriate synthetic datasets whose composition and behaviour are fully controlled and known a priori.

RESULTS

Here we propose a novel method centred on generating networks of interactions among different biological molecules, especially those involved in regulating gene expression. Synthetic datasets are obtained from models based on ordinary differential equations with known parameters. Our results show that the generated datasets mimic the behaviour of real data well, since popular data analysis methods are able to selectively identify the existing interactions.
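As a rough illustration of the idea of drawing multi-view data from an ODE model with known parameters, the sketch below simulates a toy two-view system in which one miRNA represses one target mRNA, plus an unregulated decoy mRNA. This is not the model implemented in MVBioDataSim: the equations, the rate constants, the noise level, and the names `simulate_sample`, `simulate_dataset` and `pearson` are all assumptions chosen for illustration. Because the interaction is built in by construction, a simple correlation analysis can be checked against the known ground truth.

```python
import math
import random


def simulate_sample(rng, k_mirna, steps=2000, dt=0.01):
    """Euler-integrate a toy regulatory model until it is close to steady state.

    Assumed (hypothetical) equations:
        d(mirna)/dt  = k_mirna - d * mirna
        d(target)/dt = k_mrna / (1 + mirna / K) - d * target   (repressed by miRNA)
        d(decoy)/dt  = k_mrna - d * decoy                      (not regulated)
    """
    d, k_mrna, K = 1.0, 2.0, 0.5  # illustrative constants, not fitted to data
    mirna = target = decoy = 0.0
    for _ in range(steps):
        d_mirna = k_mirna - d * mirna
        d_target = k_mrna / (1.0 + mirna / K) - d * target
        d_decoy = k_mrna - d * decoy
        mirna += dt * d_mirna
        target += dt * d_target
        decoy += dt * d_decoy
    # Add Gaussian measurement noise to each simulated readout.
    noise = lambda: rng.gauss(0.0, 0.02)
    return mirna + noise(), target + noise(), decoy + noise()


def simulate_dataset(n_per_group=30, seed=0):
    """Simulate two conditions that differ only in the miRNA production rate."""
    rng = random.Random(seed)
    data = {"condition": [], "mirna": [], "target_mrna": [], "decoy_mrna": []}
    for condition, base_rate in (("A", 0.5), ("B", 2.0)):
        for _ in range(n_per_group):
            # Small per-sample variation around the condition's base rate.
            k_mirna = base_rate * rng.uniform(0.9, 1.1)
            mirna, target, decoy = simulate_sample(rng, k_mirna)
            data["condition"].append(condition)
            data["mirna"].append(mirna)
            data["target_mrna"].append(target)
            data["decoy_mrna"].append(decoy)
    return data


def pearson(x, y):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    dataset = simulate_dataset()
    print("miRNA vs target mRNA:", pearson(dataset["mirna"], dataset["target_mrna"]))
    print("miRNA vs decoy mRNA: ", pearson(dataset["mirna"], dataset["decoy_mrna"]))
```

In this toy setting the miRNA and its target should be strongly anticorrelated while the decoy should not, which mirrors, in a very reduced form, the kind of check described above: an analysis method is run on simulated data and its findings are compared with the interactions that were actually simulated.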

CONCLUSIONS

The proposed method can be used in conjunction with real biological datasets to assess data mining techniques. Its main strength is full control over the simulated data while retaining coherence with real biological processes. The R package MVBioDataSim is freely available to the scientific community at http://neuronelab.unisa.it/?p=1722.


