
MIDGET: Detecting differential gene expression on microarray data.

Affiliation

Department of Automatic Control and Industrial Informatics, Faculty of Automatic Control and Computer Science, University "Politehnica" of Bucharest, Splaiul Independentei nr. 313, Sector 6, Bucuresti, 060042, Romania.

Publication information

Comput Methods Programs Biomed. 2021 Nov;211:106418. doi: 10.1016/j.cmpb.2021.106418. Epub 2021 Sep 16.

DOI: 10.1016/j.cmpb.2021.106418
PMID: 34555591
Abstract

Background and Objective: Detecting differentially expressed genes is an important step in genome-wide analysis and expression profiling. A wide array of algorithms based on statistical approaches is used in today's research. Although these algorithms work, they sometimes mispredict, and no framework is available for measuring their quality. Newer machine learning methods, such as gradient boosting and deep neural networks, have not been used to solve this problem. The Gene-Bench open-source Python package addresses these issues by providing an evaluation and data-handling system for algorithms that detect differentially expressed genes on microarray data. We also provide MIDGET, a new group of algorithms based on state-of-the-art machine learning approaches.

Methods: The Gene-Bench package provides data collected from real experiments, consisting of 73 transcription-factor perturbation experiments with validation data from ChIP-seq experiments and 129 drug perturbation experiments, together with synthetic data generated with our own method and three evaluation metrics (Kolmogorov, F1 and AUC/ROC). Besides the data and metrics, Gene-Bench also contains well-known algorithms and a new method for identifying differentially expressed genes, called MIDGET (Machine learning Identification Differential Gene Expression Tool), which uses big-data and machine learning methods. The two new groups of machine learning algorithms provided in our package use extreme gradient boosting and deep neural networks to achieve their results.

Results: The Gene-Bench package is highly flexible, allows fast prototyping and evaluation of new and existing algorithms, and provides multiple new machine learning algorithms (called MIDGET) that perform better on all evaluation metrics than all the other tested alternatives. Everything provided in Gene-Bench is algorithm independent, and although the package is written in Python, users can also use algorithms implemented in the R language.

Conclusions: The Gene-Bench package fills a gap in evaluating and benchmarking differential gene detection algorithms. It also provides machine learning methods that perform detection with higher accuracy on all tested metrics. It is available at https://github.com/raduangelescu/GeneBench/ and can be installed directly from the Python Package Index using pip install genebench.
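To make two of the three evaluation metrics named in the abstract concrete, here is a minimal sketch of how F1 and ROC AUC could be computed for a detection method against a validated gene set. It does not use the Gene-Bench API. The function names, gene identifiers, scores, and the 0.5 cutoff are all hypothetical and chosen only for illustration, and the package's own metric implementations may differ.

```python
def f1_score(predicted, truth):
    """F1 between a predicted gene set and a validated gene set."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


def roc_auc(scores, truth):
    """ROC AUC via the pairwise (Mann-Whitney) formulation.

    scores: dict mapping gene -> score, where a higher score means the
            method considers the gene more likely differentially expressed
    truth:  set of genes validated as differentially expressed
    """
    pos = [s for g, s in scores.items() if g in truth]
    neg = [s for g, s in scores.items() if g not in truth]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    # Fraction of (positive, negative) pairs ranked correctly; ties count half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy, made-up data (not taken from the paper or the package).
scores = {"G1": 0.9, "G2": 0.8, "G3": 0.4, "G4": 0.3, "G5": 0.1}
validated = {"G1", "G3"}
called = [g for g, s in scores.items() if s >= 0.5]  # hypothetical cutoff

f1 = f1_score(called, validated)
auc = roc_auc(scores, validated)
print(f"F1 = {f1:.3f}, ROC AUC = {auc:.3f}")
```

F1 depends on where the score cutoff is placed, whereas ROC AUC summarizes the ranking across all cutoffs, which is one reason a benchmark might report both.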


