
An autoencoder-based deep learning method for genotype imputation.

Authors

Song Meng, Greenbaum Jonathan, Luttrell Joseph, Zhou Weihua, Wu Chong, Luo Zhe, Qiu Chuan, Zhao Lan Juan, Su Kuan-Jui, Tian Qing, Shen Hui, Hong Huixiao, Gong Ping, Shi Xinghua, Deng Hong-Wen, Zhang Chaoyang

Affiliations

School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, United States.

Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States.

Publication information

Front Artif Intell. 2022 Nov 3;5:1028978. doi: 10.3389/frai.2022.1028978. eCollection 2022.

DOI: 10.3389/frai.2022.1028978
PMID: 36406474
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9671213/

Abstract

Genotype imputation has a wide range of applications in genome-wide association studies (GWAS), including increasing the statistical power of association tests, discovering trait-associated loci in meta-analyses, and prioritizing causal variants with fine-mapping. In recent years, deep learning (DL)-based methods, such as the sparse convolutional denoising autoencoder (SCDA), have been developed for genotype imputation. However, optimizing the learning process in DL-based methods to achieve high imputation accuracy remains challenging. To address this challenge, we developed a convolutional autoencoder (AE) model for genotype imputation and implemented a customized training loop that modifies the training process to use a single batch loss rather than the average loss over batches. This modified AE imputation model was evaluated using a yeast dataset, the human leukocyte antigen (HLA) data from the 1000 Genomes Project (1KGP), and our in-house genotype data from the Louisiana Osteoporosis Study (LOS). In all three datasets, the modified AE imputation model achieved comparable or better performance than the existing SCDA model on evaluation metrics such as the concordance rate (CR), the Hellinger score, the scaled Euclidean norm (SEN) score, and the imputation quality score (IQS). Taking the imputation results from the HLA data as an example, the AE model achieved an average CR of 0.9468 and 0.9459, Hellinger score of 0.9765 and 0.9518, SEN score of 0.9977 and 0.9953, and IQS of 0.9515 and 0.9044 at missing ratios of 10% and 20%, respectively. On the LOS data, it achieved an average CR of 0.9005, Hellinger score of 0.9384, SEN score of 0.9940, and IQS of 0.8681 at a missing ratio of 20%. In summary, our proposed method for genotype imputation has great potential to increase the statistical power of GWAS and improve downstream post-GWAS analyses.
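
The abstract reports results at fixed missing ratios using metrics such as CR and IQS, but does not spell out how they are computed. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: it assumes genotypes are coded as 0/1/2, that CR is the fraction of imputed calls matching the true calls, and that IQS is the chance-corrected agreement (Po - Pc) / (1 - Pc). The function names `mask_genotypes`, `concordance_rate`, and `imputation_quality_score` are hypothetical, and the "imputed" values are made up for illustration.

```python
import random
from collections import Counter


def mask_genotypes(genotypes, missing_ratio, seed=0):
    """Hide a random fraction of calls; return the masked copy and hidden indices."""
    rng = random.Random(seed)
    n_missing = round(len(genotypes) * missing_ratio)
    hidden = sorted(rng.sample(range(len(genotypes)), n_missing))
    masked = list(genotypes)
    for i in hidden:
        masked[i] = None  # None marks a missing genotype
    return masked, hidden


def concordance_rate(truth, imputed):
    """Assumed definition: fraction of positions where imputed == true genotype."""
    if not truth or len(truth) != len(imputed):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(truth, imputed)) / len(truth)


def imputation_quality_score(truth, imputed):
    """Assumed definition: agreement corrected for chance, (Po - Pc) / (1 - Pc)."""
    n = len(truth)
    p_obs = concordance_rate(truth, imputed)
    true_counts = Counter(truth)
    imputed_counts = Counter(imputed)
    # Chance agreement: sum over genotype classes of the product of marginals.
    p_chance = sum(true_counts[g] * imputed_counts[g] for g in true_counts) / (n * n)
    if p_chance == 1.0:
        return float("nan")  # undefined when only one genotype class is present
    return (p_obs - p_chance) / (1.0 - p_chance)


# Toy data (illustrative only): true calls and a hypothetical model's output.
truth = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
imputed = [0, 0, 0, 1, 1, 1, 1, 2, 2, 0]

masked, hidden = mask_genotypes(truth, missing_ratio=0.2)
print(len(hidden))                                         # 2 positions hidden
print(concordance_rate(truth, imputed))                    # 0.8
print(round(imputation_quality_score(truth, imputed), 4))  # 0.697
```

In this toy case 8 of 10 calls agree, so CR is 0.8, while IQS is lower (about 0.697) because some of that agreement would be expected by chance given the genotype frequencies. This is consistent with IQS falling below CR in the figures the abstract reports.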

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e855/9671213/4697ca2570b3/frai-05-1028978-g0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e855/9671213/304cad277555/frai-05-1028978-g0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e855/9671213/32e0801c9503/frai-05-1028978-g0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e855/9671213/de7c792a0654/frai-05-1028978-g0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e855/9671213/7ecccf965756/frai-05-1028978-g0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e855/9671213/fec6d7b9441d/frai-05-1028978-g0006.jpg
