Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, British Columbia, Canada.
PLoS One. 2013 Aug 6;8(8):e71971. doi: 10.1371/journal.pone.0071971. Print 2013.
When endogenous retroviruses (ERVs) or other transposable elements (TEs) insert into an intron, the effect on gene transcription can range from negligible to complete ablation of normal transcripts. As sequencing technology advances, a growing number of insertionally polymorphic or private TE insertions are being identified in humans and mice, some of which could have a significant impact on host gene expression. Nevertheless, an efficient, low-cost approach for prioritizing these insertions by their potential effect on gene transcription has been lacking. By building a computational model based on artificial neural networks (ANN), we demonstrate the feasibility of using machine-learning approaches to predict the likelihood that intronic ERV insertions will have major effects on gene transcription. We focus on two ERV families, Intracisternal A-type Particle (IAP) and Early Transposon (ETn)/MusD elements, which are responsible for the majority of ERV-induced mutations in mice. We trained the ANN model using properties associated with ERVs of these families known to cause germ-line mutations (positive cases) and properties associated with likely neutral ERVs of the same families (negative cases), and derived a set of prediction plots that visualize the likelihood that an ERV insertion affects gene transcription. Our results show that the model has highly reliable predictive power, and they offer a potential approach to computationally screen for other types of TE insertions that may affect gene transcription or even cause disease.
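The training setup described above (positive cases versus negative cases fed to an ANN that outputs a likelihood) can be illustrated with a minimal sketch. This is not the authors' model: the abstract does not list the input properties, the network architecture, or the training procedure, so the two input features, the hidden-layer size, the class name `TinyANN`, the helper `make_synthetic`, and all data below are hypothetical placeholders chosen only to show the general shape of such a classifier.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyANN:
    """Hypothetical one-hidden-layer network with a sigmoid output unit."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def predict(self, x):
        """Return a score in (0, 1); higher means 'more like a positive case'."""
        return self._forward(x)[1]

    def loss(self, X, y):
        """Mean binary cross-entropy over a dataset."""
        eps = 1e-12
        total = 0.0
        for x, t in zip(X, y):
            p = self.predict(x)
            total -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
        return total / len(X)

    def train(self, X, y, epochs=500, lr=0.1):
        """Plain per-sample gradient descent with backpropagation."""
        for _ in range(epochs):
            for x, t in zip(X, y):
                hidden, p = self._forward(x)
                d_out = p - t  # cross-entropy gradient at the output pre-activation
                for j, h in enumerate(hidden):
                    # Use the pre-update output weight when backpropagating.
                    d_hidden = d_out * self.w2[j] * (1 - h * h)
                    self.w2[j] -= lr * d_out * h
                    self.b1[j] -= lr * d_hidden
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * d_hidden * xi
                self.b2 -= lr * d_out


def make_synthetic(n_per_class=50, seed=1):
    """Synthetic stand-in data: two made-up features per insertion.

    Label 1 = 'positive case', label 0 = 'negative case'. The values carry
    no biological meaning; they only give the network something separable.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_per_class):
        X.append([rng.gauss(0.8, 0.1), rng.gauss(0.8, 0.1)])
        y.append(1)
        X.append([rng.gauss(0.2, 0.1), rng.gauss(0.2, 0.1)])
        y.append(0)
    return X, y


X, y = make_synthetic()
model = TinyANN(n_inputs=2, n_hidden=3)
initial_loss = model.loss(X, y)
model.train(X, y)
final_loss = model.loss(X, y)

# Score two hypothetical insertions, one near each synthetic cluster.
score_pos_like = model.predict([0.8, 0.8])
score_neg_like = model.predict([0.2, 0.2])
print(f"loss: {initial_loss:.3f} -> {final_loss:.3f}")
print(f"scores: {score_pos_like:.3f} (positive-like), {score_neg_like:.3f} (negative-like)")
```

Evaluating `model.predict` over a grid of feature values would give a surface comparable in spirit to the prediction plots mentioned above, though the actual plots in the paper are built from the authors' own features.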