Oja Merja, Peltonen Jaakko, Blomberg Jonas, Kaski Samuel
Department of Computer Science, University of Helsinki, Finland.
BMC Bioinformatics. 2007 May 3;8 Suppl 2(Suppl 2):S11. doi: 10.1186/1471-2105-8-S2-S11.
Human endogenous retroviruses (HERVs) are surviving traces of ancient retrovirus infections that now reside in human DNA. Recently, HERV expression has been detected both in normal tissues and in patients with disease. However, the activities (expression levels) of individual HERV sequences are mostly unknown.
We introduce a generative mixture model, based on Hidden Markov Models, for estimating the activities of the individual HERV sequences from EST (expressed sequence tag) databases. We use the model to estimate the relative activities of 181 HERVs. We also empirically justify a faster heuristic method for HERV activity estimation and use it to estimate the activities of 2450 HERVs. The majority of the HERV activities were previously unknown.
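The idea of reading activities off a mixture model can be illustrated with a minimal sketch. It assumes, for illustration only, that the likelihood of each EST under each HERV's sequence model has already been computed (in the paper these would come from the HMMs; here they are plain numbers), and it estimates only the mixture weights with a standard EM iteration. The function name `estimate_activities` and the toy input are hypothetical and are not taken from the paper.

```python
def estimate_activities(likelihoods, n_iter=200, tol=1e-10):
    """Estimate mixture weights (relative activities) by EM.

    likelihoods[i][k] is ASSUMED to be p(EST_i | HERV_k), computed
    elsewhere (e.g. by a per-HERV sequence model). Only the mixture
    weights are learned here; the component models stay fixed.
    """
    n_components = len(likelihoods[0])
    weights = [1.0 / n_components] * n_components  # uniform start

    for _ in range(n_iter):
        # E-step: accumulate each HERV's share of responsibility per EST.
        counts = [0.0] * n_components
        for row in likelihoods:
            joint = [w * l for w, l in zip(weights, row)]
            total = sum(joint)
            if total == 0.0:
                continue  # EST is impossible under every HERV; ignore it
            for k, j in enumerate(joint):
                counts[k] += j / total

        norm = sum(counts)
        if norm == 0.0:
            raise ValueError("no EST has non-zero likelihood under any HERV")

        # M-step: new weights are the normalized expected counts.
        new_weights = [c / norm for c in counts]
        change = max(abs(a - b) for a, b in zip(new_weights, weights))
        weights = new_weights
        if change < tol:
            break
    return weights


# Toy input: four ESTs, two HERVs, each EST compatible with exactly one HERV.
toy = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
activities = estimate_activities(toy)
```

In this toy case three of the four ESTs can only come from the first HERV, so the estimated weights are 0.75 and 0.25. With ambiguous ESTs (non-zero likelihood under several HERVs), the E-step splits each EST fractionally among the candidates instead.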
(i) In experiments on simulated data, our methods estimate activity accurately. (ii) Our estimates on real data indicate that 7% of the HERVs are active. The active HERVs are distributed unevenly across HERV groups but relatively uniformly with respect to estimated age. HERVs with the retroviral env gene are more often active than HERVs without env. Few of the active HERVs have open reading frames for retroviral proteins.