PONYTA: prioritization of phenotype-related genes from mouse KO events using PU learning on a biological network.

Author information

Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul 08826, Republic of Korea.

Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Republic of Korea.

Publication information

Bioinformatics. 2024 Nov 1;40(11). doi: 10.1093/bioinformatics/btae634.

DOI:10.1093/bioinformatics/btae634

PMID:39432684

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11561041/

Abstract

MOTIVATION

Transcriptome data from gene knock-out (KO) experiments in mice provide crucial insights into the intricate interactions between genotype and phenotype. Differentially expressed gene (DEG) analysis and network propagation (NP) are well-established methods for analysing transcriptome data. To identify genes related to phenotype changes in a KO experiment, each method requires choosing a cutoff value for its selection criterion. A rigorous cutoff for DEG analysis and NP selects mostly genes that are truly related to the phenotype, but many related genes are rejected as false negatives. A loose cutoff for either method, on the other hand, tends to include many genes that are not phenotype-related, which are false positives. The research problem is therefore how to handle the trade-off between false negatives and false positives.
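
The cutoff trade-off described above can be illustrated with a small Python sketch. Everything in it is hypothetical: the gene names, the scores (standing in for a DEG or NP statistic), the ground-truth set, and the two cutoff values are made up for illustration and are not taken from the paper.

```python
# Toy illustration only: gene names, scores, and ground truth are invented.
scores = {"g1": 0.95, "g2": 0.80, "g3": 0.55, "g4": 0.40, "g5": 0.30, "g6": 0.10}
truly_related = {"g1", "g2", "g3", "g5"}  # assumed known here; unknown in practice


def confusion_at(cutoff):
    """Return (selected, false positives, false negatives) for a score cutoff."""
    selected = {gene for gene, score in scores.items() if score >= cutoff}
    false_pos = selected - truly_related   # selected but not phenotype-related
    false_neg = truly_related - selected   # phenotype-related but rejected
    return selected, false_pos, false_neg


strict = confusion_at(0.75)  # rigorous cutoff: clean selection, misses g3 and g5
loose = confusion_at(0.25)   # loose cutoff: recovers g3 and g5, admits g4

for name, (selected, fp, fn) in {"strict": strict, "loose": loose}.items():
    print(name, sorted(selected), "FP:", sorted(fp), "FN:", sorted(fn))
```

In this toy setting the strict cutoff yields no false positives but two false negatives, while the loose cutoff removes the false negatives at the cost of one false positive.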

RESULTS

We propose a novel framework called PONYTA for gene prioritization via positive-unlabeled (PU) learning on biological networks. Beginning with the selection of true phenotype-related genes using a rigorous cutoff value for DEG analysis and NP, we address the issue of handling false negatives by rescuing them through PU learning. Evaluations on transcriptome data from multiple studies show that our approach has superior gene prioritization ability compared to benchmark models. Therefore, PONYTA effectively prioritizes genes related to phenotypes derived from gene KO events and guides in vitro and in vivo gene KO experiments for increased efficiency.
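
The general idea of starting from strictly selected positives and rescuing false negatives among the remaining genes can be sketched as follows. This is not PONYTA's implementation: it is a minimal bagging-style PU learning example with a nearest-centroid base classifier, and the network, the scores, the 0.8 cutoff, the two features, and all function names are assumptions chosen for illustration.

```python
import random

# Hypothetical toy network (adjacency lists) and per-gene scores; not from the paper.
network = {
    "A": ["B", "C"], "B": ["A", "C", "D"], "C": ["A", "B"], "D": ["B", "E"],
    "E": ["D", "F"], "F": ["E", "G"], "G": ["F"],
}
deg_score = {"A": 0.9, "B": 0.85, "C": 0.5, "D": 0.3, "E": 0.1, "F": 0.05, "G": 0.1}

# Step 1: a strict cutoff gives a small, reliable positive set; the rest is unlabeled.
positives = {g for g, s in deg_score.items() if s >= 0.8}
unlabeled = sorted(set(network) - positives)


def features(gene):
    """Assumed features: the gene's own score and its fraction of positive neighbours."""
    nbrs = network[gene]
    return (deg_score[gene], sum(n in positives for n in nbrs) / len(nbrs))


def centroid(genes):
    vecs = [features(g) for g in genes]
    return tuple(sum(col) / len(vecs) for col in zip(*vecs))


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def bagging_pu(n_rounds=200, sample_size=2, seed=0):
    """Step 2: repeatedly treat a random unlabeled subset as pseudo-negatives,
    classify the out-of-bag unlabeled genes, and average their positive votes."""
    rng = random.Random(seed)
    pos_c = centroid(sorted(positives))
    votes = {g: 0.0 for g in unlabeled}
    counts = {g: 0 for g in unlabeled}
    for _ in range(n_rounds):
        pseudo_neg = rng.sample(unlabeled, sample_size)
        neg_c = centroid(pseudo_neg)
        for g in unlabeled:
            if g in pseudo_neg:
                continue  # only score genes left out of this round's negatives
            x = features(g)
            votes[g] += 1.0 if sq_dist(x, pos_c) < sq_dist(x, neg_c) else 0.0
            counts[g] += 1
    return {g: votes[g] / counts[g] for g in unlabeled if counts[g]}


pu_scores = bagging_pu()
ranking = sorted(pu_scores, key=pu_scores.get, reverse=True)
print(ranking[0], pu_scores)
```

In this toy network, gene C falls below the strict cutoff but is connected only to positive genes, so it ranks first among the unlabeled genes. That is the kind of false negative a PU step is meant to recover.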

AVAILABILITY AND IMPLEMENTATION

The source code of PONYTA is available at https://github.com/Jun-Hyeong-Kim/PONYTA.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/586c/11561041/7aaf34a017b5/btae634f1.jpg
