Key Laboratory of Computational Biology, MPG-CAS PICB, Shanghai, PR China.
BMC Bioinformatics. 2011 Mar 30;12:86. doi: 10.1186/1471-2105-12-86.
Advances in biotechnology offer a fast-growing variety of high-throughput data for screening molecular activity at the genomic, transcriptional, post-transcriptional and translational levels. However, to date, most computational and algorithmic efforts have been directed at mining data from each of these molecular levels (genomic, transcriptional, etc.) separately. In view of the rapid advances in technology (new-generation sequencing, high-throughput proteomics), it is important to address the problem of analyzing these data as a whole, i.e. preserving the emergent properties that appear in the cellular system when all molecular levels interact. We analyzed one of the (currently) few datasets that provide both transcriptional and post-transcriptional data for the same samples to investigate whether a joint analysis approach can extract more information.
We use Factor Analysis coupled with pre-established knowledge as a theoretical base to achieve this goal. Our intention is to identify structures that contain information from both mRNAs and miRNAs and that can explain the complexity of the data. Despite the small sample size, we show that this approach permits the identification of meaningful structures, in particular two polycistronic miRNA genes that are related to transcriptional activity and are likely to be relevant for discriminating between gliosarcomas and other brain tumors.
These results suggest the need to develop methodologies that mine information simultaneously from different levels of biological organization, rather than linking separate analyses performed in parallel.