KOMPUTE: imputing summary statistics of missing phenotypes in high-throughput model organism data.

Authors

Warkentin Coby, O'Connell Michael J, Lee Donghyung

Affiliations

Department of Statistics, Miami University, Oxford, OH 45056, United States.

InfoWorks, Inc., Nashville, TN 37205, United States.

Publication information

Bioinform Adv. 2023 Aug 1;3(1):vbad100. doi: 10.1093/bioadv/vbad100. eCollection 2023.

DOI:10.1093/bioadv/vbad100
PMID:37565237
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10409646/
Abstract

MOTIVATION

The International Mouse Phenotyping Consortium (IMPC) is striving to build a comprehensive functional catalog of mammalian protein-coding genes by systematically producing and phenotyping gene-knockout mice for almost every protein-coding gene in the mouse genome and by testing associations between gene loss-of-function and phenotype. To date, the IMPC has identified over 90 000 gene-phenotype associations, but many phenotypes have not yet been measured for each gene, resulting in largely incomplete data; ∼75.6% of association summary statistics are still missing in the latest IMPC summary statistics dataset (IMPC release version 16).

RESULTS

To overcome these challenges, we propose KOMPUTE, a novel method for imputing missing summary statistics in the IMPC dataset. Using conditional distribution properties of multivariate normal, KOMPUTE estimates the association Z-scores of unmeasured phenotypes for a particular gene as a conditional expectation given the Z-scores of measured phenotypes. Our evaluation of the method using simulated and real-world datasets demonstrates its superiority over the singular value decomposition matrix completion method in various scenarios.
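
The conditional-expectation idea can be illustrated with a small sketch. It assumes the Z-scores for one gene follow a zero-mean multivariate normal whose covariance is a known phenotype correlation matrix, so the imputed value is Sigma_um Sigma_mm^{-1} z_m with conditional variance Sigma_uu - Sigma_um Sigma_mm^{-1} Sigma_mu. The function names (`solve`, `impute_z`) and the toy correlation values are hypothetical and are not the API of the KOMPUTE R package, which may apply further steps not shown here.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("correlation sub-matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def impute_z(corr, z, measured, missing):
    """Return {phenotype index: (conditional mean, conditional variance)}.

    Assumption (illustrative): Z-scores are zero-mean multivariate normal
    with covariance equal to the phenotype correlation matrix `corr`.
    """
    sigma_mm = [[corr[i][j] for j in measured] for i in measured]
    z_m = [z[i] for i in measured]
    weights = solve(sigma_mm, z_m)  # Sigma_mm^{-1} z_m
    out = {}
    for u in missing:
        sigma_um = [corr[u][j] for j in measured]
        mean = sum(s * w for s, w in zip(sigma_um, weights))
        v = solve(sigma_mm, sigma_um)  # Sigma_mm^{-1} Sigma_mu
        var = corr[u][u] - sum(s * vi for s, vi in zip(sigma_um, v))
        out[u] = (mean, var)
    return out


# Toy example: three phenotypes, the third one unmeasured for this gene.
corr = [
    [1.0, 0.5, 0.3],
    [0.5, 1.0, 0.4],
    [0.3, 0.4, 1.0],
]
z = [2.0, 1.0, None]
imputed = impute_z(corr, z, measured=[0, 1], missing=[2])
mean_z3, var_z3 = imputed[2]
print(f"imputed Z = {mean_z3:.3f}, conditional variance = {var_z3:.3f}")
```

A conditional variance close to 1 means the measured phenotypes carry little information about the missing one, so the imputed Z-score should be treated with correspondingly little weight.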

AVAILABILITY AND IMPLEMENTATION

An R package for KOMPUTE is publicly available at https://github.com/statsleelab/kompute, along with usage examples and results for different phenotype domains at https://statsleelab.github.io/komputeExamples.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e13b/10409646/0ab98433e986/vbad100f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e13b/10409646/90d00a966f9a/vbad100f1.jpg
