Gnocis: An integrated system for interactive and reproducible analysis and modelling of cis-regulatory elements in Python 3.

Affiliations

Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway.

Department of Biology, Humboldt-Universität zu Berlin, Berlin, Germany.

Publication information

PLoS One. 2022 Sep 9;17(9):e0274338. doi: 10.1371/journal.pone.0274338. eCollection 2022.

DOI: 10.1371/journal.pone.0274338
PMID: 36084008
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9462789/
Abstract

Gene expression is regulated through cis-regulatory elements (CREs), among which are promoters, enhancers, Polycomb/Trithorax Response Elements (PREs), silencers and insulators. Computational prediction of CREs can be achieved using a variety of statistical and machine learning methods combined with different feature space formulations. Although Python packages for DNA sequence feature sets and for machine learning are available, no existing package facilitates the combination of DNA sequence feature sets with machine learning methods for the genome-wide prediction of candidate CREs. We here present Gnocis, a Python package that streamlines the analysis and the modelling of CRE sequences by providing extensible APIs and implementing the glue required for combining feature sets and models for genome-wide prediction. Gnocis implements a variety of base feature sets, including motif pair occurrence frequencies and the k-spectrum mismatch kernel. It integrates with Scikit-learn and TensorFlow for state-of-the-art machine learning. Gnocis additionally implements a broad suite of tools for the handling and preparation of sequence, region and curve data, which can be useful for general DNA bioinformatics in Python. We also present Deep-MOCCA, a neural network architecture inspired by SVM-MOCCA that achieves moderate to high generalization without prior motif knowledge. To demonstrate the use of Gnocis, we applied multiple machine learning methods to the modelling of D. melanogaster PREs, including a Convolutional Neural Network (CNN), making this the first study to model PREs with CNNs. The models are readily adapted to new CRE modelling problems and to other organisms. In order to produce a high-performance, compiled package for Python 3, we implemented Gnocis in Cython. Gnocis can be installed using the PyPI package manager by running 'pip install gnocis'. The source code is available on GitHub, at https://github.com/bjornbredesen/gnocis.
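The k-spectrum mismatch kernel named in the abstract can be illustrated with a short, standalone sketch. This is plain Python and does not use or reproduce the Gnocis API; the function names `mismatch_spectrum` and `mismatch_kernel`, and the defaults `k=3` and `m=1`, are assumptions chosen for illustration. Each length-k window of a sequence contributes a count to every k-mer within `m` mismatches of it, and the kernel value for two sequences is the dot product of their feature vectors.

```python
from itertools import product

ALPHABET = "ACGT"


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def mismatch_spectrum(seq, k=3, m=1):
    """Map every possible k-mer to a count over the windows of seq.

    A window contributes to each k-mer that lies within m mismatches of it.
    Illustrative brute-force version: cost grows with 4**k, so keep k small.
    """
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # Skip windows containing ambiguous bases such as N.
        if any(c not in ALPHABET for c in window):
            continue
        for kmer in kmers:
            if hamming(window, kmer) <= m:
                counts[kmer] += 1
    return counts


def mismatch_kernel(seq_a, seq_b, k=3, m=1):
    """Dot product of the two mismatch-spectrum feature vectors."""
    fa = mismatch_spectrum(seq_a, k, m)
    fb = mismatch_spectrum(seq_b, k, m)
    return sum(fa[kmer] * fb[kmer] for kmer in fa)


if __name__ == "__main__":
    print(mismatch_kernel("ACGTACGT", "ACGTTCGT", k=3, m=1))
```

With `m=0` this reduces to the ordinary k-spectrum (exact k-mer counts). A feature vector of this kind could be passed to a classifier such as a Scikit-learn SVM, but the package itself should be consulted for its actual feature-set and model interfaces.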

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7aa1/9462789/3d2e1997cba3/pone.0274338.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7aa1/9462789/2a3102e61b27/pone.0274338.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7aa1/9462789/cef1d932fd01/pone.0274338.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7aa1/9462789/99f59e77e268/pone.0274338.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7aa1/9462789/6ee3e1ea19dd/pone.0274338.g005.jpg
