Department of Computer Science, University of Chicago, Chicago, 60637, USA.
Computing Environment and Life Sciences Division, Argonne National Laboratory, Lemont, 60439, USA.
Sci Rep. 2021 Jan 22;11(1):2124. doi: 10.1038/s41598-021-81169-9.
Contiguous genes in prokaryotes are often arranged into operons. Detecting operons plays a critical role in inferring gene functionality and regulatory networks. Human experts annotate operons by visually inspecting gene neighborhoods across pileups of related genomes. These visual representations capture the intergenic distance, strand direction, gene size, functional relatedness, and gene neighborhood conservation, which are the most prominent operon features mentioned in the literature. By studying these features, an expert can then decide whether a genomic region is part of an operon. We propose a deep learning-based method named Operon Hunter that uses visual representations of genomic fragments to make operon predictions. Transfer learning and data augmentation let us reuse powerful neural networks pretrained on image datasets by retraining them on a smaller dataset of extensively validated operons. Our method outperforms the previously reported state-of-the-art tools, particularly in accurately predicting full operons and their boundaries. Furthermore, our approach makes it possible to visualize the features that influence the network's decisions, which human experts can then cross-check.
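As an illustration of three of the features named above (intergenic distance, strand direction, and gene size), the following minimal Python sketch computes them for adjacent gene pairs. It is not the Operon Hunter implementation, which works on images of gene neighborhoods; the `Gene` record, the function names, and the coordinates are hypothetical, and coordinates are assumed to be 1-based and inclusive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; coordinates assumed 1-based, inclusive."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def pair_features(a: Gene, b: Gene) -> dict:
    """Features for two neighboring genes, with `a` preceding `b` by coordinate."""
    return {
        # Number of bases strictly between the two genes; negative if they overlap.
        "intergenic_distance": b.start - a.end - 1,
        # Genes in the same operon are transcribed from the same strand.
        "same_strand": a.strand == b.strand,
        # Gene sizes in base pairs.
        "sizes": (a.end - a.start + 1, b.end - b.start + 1),
    }


def adjacent_pair_features(genes: list) -> list:
    """Sort genes by start coordinate and compute features for each adjacent pair."""
    ordered = sorted(genes, key=lambda g: g.start)
    return [(a.name, b.name, pair_features(a, b))
            for a, b in zip(ordered, ordered[1:])]


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    example = [
        Gene("geneA", 100, 400, "+"),
        Gene("geneB", 410, 900, "+"),
        Gene("geneC", 1500, 2000, "-"),
    ]
    for first, second, feats in adjacent_pair_features(example):
        print(first, second, feats)
```

In this made-up example, `geneA` and `geneB` lie close together on the same strand, while `geneC` is distant and on the opposite strand; a short same-strand gap is the kind of signal an expert would weigh alongside functional relatedness and neighborhood conservation, which this sketch does not model.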