Pan Wei, Wei Peng, Khodursky Arkady
Division of Biostatistics, School of Public Health, University of Minnesota, USA.
Pac Symp Biocomput. 2008:465-76.
This paper concerns predicting the regulatory targets of a transcription factor (TF). We propose and study a joint model that simultaneously combines DNA-protein binding, gene expression, and DNA sequence data. A parametric mixture model is used for unsupervised learning, although it can also be extended to semi-supervised learning. We applied the methods to an E. coli dataset to identify the target genes of LexA; this application, together with applications to simulated data, demonstrated the potential gains of jointly modeling multiple types of data over using only one type.
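As an illustration of the general idea of a parametric mixture model fitted without labels across several data types, the following is a minimal sketch and not the authors' model. It assumes, purely for illustration, that each gene has one numeric score per data type (binding, expression, sequence), that the scores are Gaussian and conditionally independent given whether the gene is a target, and that the data are simulated. The function name `fit_two_component_mixture` and all parameter values are hypothetical.

```python
import numpy as np


def fit_two_component_mixture(X, n_iter=200, tol=1e-8):
    """EM for a two-component Gaussian mixture with diagonal covariance.

    X has shape (n_genes, n_data_types), one column per data type.
    Returns (pi, means, variances, resp), where resp[i] is the estimated
    posterior probability that gene i belongs to component 1 ("target").
    The Gaussian, conditional-independence form is an assumption of this
    sketch, not something stated in the abstract.
    """
    X = np.asarray(X, dtype=float)
    n, _ = X.shape
    # Initialise responsibilities: genes in the top 10% of summed
    # standardised scores start in the "target" component.
    z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    score = z.sum(axis=1)
    resp = (score > np.quantile(score, 0.9)).astype(float)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # M-step: weighted proportions, means, and variances per component.
        w = np.stack([1.0 - resp, resp], axis=1)          # (n, 2)
        nk = w.sum(axis=0) + 1e-12
        pi = nk / n
        means = (w.T @ X) / nk[:, None]                   # (2, d)
        second = (w.T @ X**2) / nk[:, None]
        variances = np.maximum(second - means**2, 0.0) + 1e-6
        # E-step: log density of each gene under each component,
        # summed over data types (conditional independence).
        diff2 = (X[:, None, :] - means[None, :, :]) ** 2
        log_dens = -0.5 * (
            diff2 / variances[None, :, :]
            + np.log(2.0 * np.pi * variances[None, :, :])
        ).sum(axis=2)                                     # (n, 2)
        log_joint = log_dens + np.log(pi)[None, :]
        m = log_joint.max(axis=1, keepdims=True)
        log_norm = m[:, 0] + np.log(np.exp(log_joint - m).sum(axis=1))
        resp = np.exp(log_joint[:, 1] - log_norm)
        ll = log_norm.sum()
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return pi, means, variances, resp


# Simulated data (hypothetical): 500 genes, the first 50 are "targets"
# whose scores are shifted upward in all three data types.
rng = np.random.default_rng(0)
truth = np.zeros(500, dtype=bool)
truth[:50] = True
X_joint = rng.normal(size=(500, 3))
X_joint[truth] += 3.0

_, _, _, resp_joint = fit_two_component_mixture(X_joint)
_, _, _, resp_single = fit_two_component_mixture(X_joint[:, :1])

acc_joint = ((resp_joint > 0.5) == truth).mean()
acc_single = ((resp_single > 0.5) == truth).mean()
print("accuracy, three data types:", acc_joint)
print("accuracy, one data type:   ", acc_single)
```

Comparing the two printed accuracies gives a rough sense of how pooling several data types in one mixture model can sharpen the separation between targets and non-targets relative to using a single data type; the size of any gain depends entirely on the simulation settings chosen here.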