MicroDiscovery GmbH, Marienburger Str, 1, 10405 Berlin, Germany.
BMC Bioinformatics. 2011 May 9;12:140. doi: 10.1186/1471-2105-12-140.
Diabetes like many diseases and biological processes is not mono-causal. On the one hand multi-factorial studies with complex experimental design are required for its comprehensive analysis. On the other hand, the data from these studies often include a substantial amount of redundancy such as proteins that are typically represented by a multitude of peptides. Coping simultaneously with both complexities (experimental and technological) makes data analysis a challenge for Bioinformatics.
We present a comprehensive work-flow tailored for analyzing complex data including data from multi-factorial studies. The developed approach aims at revealing effects caused by a distinct combination of experimental factors, in our case genotype and diet. Applying the developed work-flow to the analysis of an established polygenic mouse model for diet-induced type 2 diabetes, we found peptides with significant fold changes exclusively for the combination of a particular strain and diet. Exploitation of redundancy enables the visualization of peptide correlation and provides a natural way of feature selection for classification and prediction. Classification based on the features selected using our approach performs similar to classifications based on more complex feature selection methods.
The combination of ANOVA and redundancy exploitation allows for identification of biomarker candidates in multi-dimensional MALDI-TOF MS profiling studies with complex experimental design. With respect to feature selection our method provides a fast and intuitive alternative to global optimization strategies with comparable performance. The method is implemented in R and the scripts are available by contacting the corresponding author.
糖尿病与许多疾病和生物过程一样,不是单一原因引起的。一方面,需要进行多因素研究,并采用复杂的实验设计来进行全面分析。另一方面,这些研究的数据通常包含大量的冗余信息,例如通常由多种肽代表的蛋白质。同时应对这两个复杂性(实验和技术)使得生物信息学的数据分析成为一个挑战。
我们提出了一个全面的工作流程,专门用于分析复杂数据,包括多因素研究的数据。所开发的方法旨在揭示由实验因素的独特组合引起的影响,在我们的情况下是基因型和饮食。将所开发的工作流程应用于对已建立的多基因小鼠模型进行饮食诱导的 2 型糖尿病的分析,我们发现了仅针对特定菌株和饮食组合的具有显著倍数变化的肽。利用冗余性可以可视化肽相关性,并为分类和预测提供自然的特征选择方式。基于我们方法选择的特征进行的分类与基于更复杂特征选择方法的分类性能相当。
ANOVA 与冗余性利用的结合可用于鉴定具有复杂实验设计的多维 MALDI-TOF MS 分析研究中的生物标志物候选物。在特征选择方面,我们的方法提供了一种快速直观的替代方案,与全局优化策略具有相当的性能。该方法已在 R 中实现,脚本可通过联系相应作者获得。