Dimensional reduction of phenotypes from 53 000 mouse models reveals a diverse landscape of gene function.

Authors

Konopka Tomasz, Vestito Letizia, Smedley Damian

Affiliations

William Harvey Research Institute, Queen Mary University of London, EC1M 6BQ London, UK.

Ear Institute, University College London, WC1X 8EE London, UK.

Publication information

Bioinform Adv. 2021 Oct 11;1(1):vbab026. doi: 10.1093/bioadv/vbab026. eCollection 2021.

DOI:10.1093/bioadv/vbab026
PMID:34870209
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8633315/
Abstract

Animal models have long been used to study gene function and the impact of genetic mutations on phenotype. Through the research efforts of thousands of research groups, systematic curation of published literature and high-throughput phenotyping screens, the collective body of knowledge for the mouse now covers the majority of protein-coding genes. We here collected data for over 53 000 mouse models with mutations in over 15 000 genomic markers and characterized by more than 254 000 annotations using more than 9000 distinct ontology terms. We investigated dimensional reduction and embedding techniques as means to facilitate access to this diverse and high-dimensional information. Our analyses provide the first visual maps of the landscape of mouse phenotypic diversity. We also summarize some of the difficulties in producing and interpreting embeddings of sparse phenotypic data. In particular, we show that data preprocessing, filtering and encoding have as much impact on the final embeddings as the process of dimensional reduction. Nonetheless, techniques developed in the context of dimensional reduction create opportunities for explorative analysis of this large pool of public data, including for searching for mouse models suited to study human diseases.
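The point about preprocessing can be illustrated with a minimal sketch. It is not the authors' pipeline: the model names and phenotype term ids are hypothetical toy data, the helper functions are invented for illustration, and plain PCA (computed with NumPy's SVD) stands in for whatever embedding methods the paper evaluates. It encodes annotations as a binary model-by-term matrix, optionally filters rare terms, and embeds both versions in two dimensions.

```python
import numpy as np

# Hypothetical toy annotations: model id -> set of phenotype term ids.
# These ids are made up for illustration and are not from the paper's data.
annotations = {
    "model_A": {"MP:1", "MP:2"},
    "model_B": {"MP:1", "MP:2", "MP:3"},
    "model_C": {"MP:4"},
    "model_D": {"MP:4", "MP:5"},
}

def encode_binary(annotations):
    """Encode annotations as a models-by-terms 0/1 matrix."""
    models = sorted(annotations)
    terms = sorted({t for ts in annotations.values() for t in ts})
    col = {t: j for j, t in enumerate(terms)}
    X = np.zeros((len(models), len(terms)))
    for i, m in enumerate(models):
        for t in annotations[m]:
            X[i, col[t]] = 1.0
    return models, terms, X

def filter_terms(X, terms, min_models=2):
    """Keep only terms annotated to at least `min_models` models."""
    keep = X.sum(axis=0) >= min_models
    return X[:, keep], [t for t, k in zip(terms, keep) if k]

def pca_embed(X, n_components=2):
    """Project rows of X onto the top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

models, terms, X = encode_binary(annotations)
emb_all = pca_embed(X)                      # embedding of the unfiltered encoding

X_filt, terms_filt = filter_terms(X, terms, min_models=2)
emb_filt = pca_embed(X_filt)                # embedding after dropping rare terms

for m, a, f in zip(models, emb_all, emb_filt):
    print(m, np.round(a, 3), np.round(f, 3))
```

In this toy case, filtering out terms seen in only one model makes model_A and model_B indistinguishable in the embedding, although the reduction step itself is unchanged. This is a small-scale analogue of the sensitivity to filtering and encoding that the abstract describes.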

AVAILABILITY AND IMPLEMENTATION

Source code for analysis scripts is available on GitHub at https://github.com/tkonopka/mouse-embeddings. The data underlying this article are available in Zenodo at https://doi.org/10.5281/zenodo.4916171.

CONTACT

t.konopka@qmul.ac.uk.

SUPPLEMENTARY INFORMATION

Supplementary data are available online.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6882/9710685/bf338c33bcaa/vbab026f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6882/9710685/6032112d0452/vbab026f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6882/9710685/cb96787517ba/vbab026f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6882/9710685/01b293b5f8bd/vbab026f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6882/9710685/8e62f37f2330/vbab026f5.jpg
