Classifying short gene expression time-courses with Bayesian estimation of piecewise constant functions.

Author information

Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany.

Publication information

Bioinformatics. 2011 Apr 1;27(7):946-52. doi: 10.1093/bioinformatics/btr037. Epub 2011 Jan 25.

Abstract

MOTIVATION

Analyzing short time-courses is a frequent and relevant problem in molecular biology; for example, 90% of gene expression time-course experiments span at most nine time-points. The biological or clinical questions addressed include elucidating gene regulation by identifying co-expressed genes, predicting response to treatment in clinical trial-like settings, and classifying novel toxic compounds by the similarity of their gene expression time-courses to those of known toxic compounds. The last problem is characterized by irregular and infrequent sampling times and a total lack of prior assumptions about the incoming query. This is in stark contrast to clinical settings and requires implicitly performing a local, gapped alignment of time series. The current state-of-the-art method (SCOW) uses a variant of dynamic time warping and models time series as higher-order polynomials (splines).

RESULTS

We propose modeling time-courses that monitor the response to toxins as piecewise constant functions, represented as left-right Hidden Markov Models. A Bayesian approach to parameter estimation and inference helps to cope with the short but highly multivariate time-courses. We improve prediction accuracy by 7% and 4% when classifying toxicology and stress response data, respectively. We also reduce running times by at least a factor of 140; reasonable running times are crucial when classifying response to toxins. In conclusion, we demonstrate that an appropriate reduction of model complexity can yield substantial improvements in both classification performance and running time.
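
To illustrate how a left-right Hidden Markov Model yields a piecewise constant description of a time-course, the following is a minimal sketch and not the authors' implementation. It assumes a univariate series, Gaussian emissions with a shared standard deviation, and state means that are already known; the paper instead estimates parameters in a Bayesian fashion for multivariate data. The function names `gaussian_logpdf` and `viterbi_left_right` and the parameters `sigma` and `stay_prob` are hypothetical choices made for this example.

```python
import math


def gaussian_logpdf(x, mean, sigma):
    """Log density of a univariate normal distribution."""
    return (-0.5 * math.log(2 * math.pi * sigma ** 2)
            - (x - mean) ** 2 / (2 * sigma ** 2))


def viterbi_left_right(series, means, sigma=1.0, stay_prob=0.5):
    """Most likely state path through a left-right HMM.

    State k emits N(means[k], sigma^2). From state k the chain either
    stays (stay_prob) or advances to k + 1 (1 - stay_prob); the last
    state can only loop. The chain always starts in state 0.
    """
    n_states = len(means)
    log_stay = math.log(stay_prob)
    log_move = math.log(1.0 - stay_prob)
    neg_inf = float("-inf")

    # score[k]: best log-probability of any path that ends in state k
    score = [neg_inf] * n_states
    score[0] = gaussian_logpdf(series[0], means[0], sigma)
    backpointers = []

    for x in series[1:]:
        new_score = [neg_inf] * n_states
        pointers = [0] * n_states
        for k in range(n_states):
            # The last state loops with probability 1 (log-probability 0).
            stay = score[k] + (0.0 if k == n_states - 1 else log_stay)
            move = score[k - 1] + log_move if k > 0 else neg_inf
            if move > stay:
                new_score[k], pointers[k] = move, k - 1
            else:
                new_score[k], pointers[k] = stay, k
            new_score[k] += gaussian_logpdf(x, means[k], sigma)
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical seven-point expression profile with three plateaus.
series = [0.1, -0.2, 0.0, 2.1, 1.9, 4.2, 3.8]
means = [0.0, 2.0, 4.0]
path = viterbi_left_right(series, means, sigma=0.5)
levels = [means[k] for k in path]
print(path)    # [0, 0, 0, 1, 1, 2, 2]
print(levels)  # [0.0, 0.0, 0.0, 2.0, 2.0, 4.0, 4.0]
```

Because transitions only move forward, the decoded state sequence is non-decreasing, so mapping each state to its mean gives a step function with at most as many levels as there are states. Two time-courses sampled at different, irregular times can then be compared through the states they occupy rather than point by point.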

AVAILABILITY

A Python package implementing the methods described is freely available under the GPL from http://bioinformatics.rutgers.edu/Software/MVQueries/.
