Using topic modeling via non-negative matrix factorization to identify relationships between genetic variants and disease phenotypes: A case study of Lipoprotein(a) (LPA).

Affiliations

Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States of America.

Division of Clinical Pharmacology, Vanderbilt University Medical Center, Nashville, TN, United States of America.

Publication information

PLoS One. 2019 Feb 13;14(2):e0212112. doi: 10.1371/journal.pone.0212112. eCollection 2019.

DOI:10.1371/journal.pone.0212112

PMID:30759150

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6374022/

Abstract

Genome-wide and phenome-wide association studies are commonly used to identify important relationships between genetic variants and phenotypes. Most studies have treated diseases as independent variables and suffered from the burden of multiple adjustment due to the large number of genetic variants and disease phenotypes. In this study, we used topic modeling via non-negative matrix factorization (NMF) for identifying associations between disease phenotypes and genetic variants. Topic modeling is an unsupervised machine learning approach that can be used to learn patterns from electronic health record data. We chose the single nucleotide polymorphism (SNP) rs10455872 in LPA as the predictor since it has been shown to be associated with increased risk of hyperlipidemia and cardiovascular diseases (CVD). Using data of 12,759 individuals with electronic health records (EHR) and linked DNA samples at Vanderbilt University Medical Center, we trained a topic model using NMF from 1,853 distinct phenotypes and identified six topics. We tested their associations with rs10455872 in LPA. Topics enriched for CVD and hyperlipidemia had positive correlations with rs10455872 (P < 0.001), replicating a previous finding. We also identified a negative correlation between LPA and a topic enriched for lung cancer (P < 0.001) which was not previously identified via phenome-wide scanning. We were able to replicate the top finding in a separate dataset. Our results demonstrate the applicability of topic modeling in exploring the relationship between genetic variants and clinical diseases.
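The workflow described in the abstract (factorize a patient-by-phenotype matrix into topics, then test each topic against a genotype) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, the matrix sizes and the number of topics are arbitrary, the NMF solver is the standard multiplicative-update rule rather than whatever implementation the authors used, and Pearson correlation stands in for the association test, which the abstract does not specify.

```python
import numpy as np


def nmf(V, k, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n x m) into W (n x k) and H (k x m).

    Uses Lee-Seung multiplicative updates for the Frobenius objective.
    Rows of H play the role of topics (weights over phenotype codes);
    rows of W give each patient's loading on every topic.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Synthetic cohort (hypothetical): 200 patients, 12 phenotype codes.
rng = np.random.default_rng(42)
n_patients, n_codes = 200, 12

# Hypothetical genotype: risk-allele count (0, 1, or 2) per patient.
genotype = rng.binomial(2, 0.3, size=n_patients)

# Every code has a 10% baseline probability; the first four codes
# become more likely with each copy of the risk allele.
prob = np.full((n_patients, n_codes), 0.1)
prob[:, :4] += 0.3 * genotype[:, None]
V = (rng.random((n_patients, n_codes)) < prob).astype(float)

# Learn two topics and correlate each patient-level topic loading
# with the genotype.
W, H = nmf(V, k=2)
for topic in range(W.shape[1]):
    r = np.corrcoef(W[:, topic], genotype)[0, 1]
    print(f"topic {topic}: correlation with genotype = {r:.3f}")
```

Testing a handful of topics instead of every phenotype separately is what reduces the multiple-testing burden the abstract mentions: here two correlations replace twelve per-code tests.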

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9474/6374022/c01cfa062ec2/pone.0212112.g001.jpg

Similar articles

LPA Variants Are Associated With Residual Cardiovascular Risk in Patients Receiving Statins.

Circulation. 2018 Oct 23;138(17):1839-1849. doi: 10.1161/CIRCULATIONAHA.117.031356.

Association of LPA Variants With Aortic Stenosis: A Large-Scale Study Using Diagnostic and Procedural Codes From Electronic Health Records.

JAMA Cardiol. 2018 Jan 1;3(1):18-23. doi: 10.1001/jamacardio.2017.4266.

Investigation of LPA sequence variants rs6415084, rs3798220 with conventional coronary artery disease in Iranian CAD patients.

Hum Antibodies. 2019;27(2):99-104. doi: 10.3233/HAB-180353.

Testing population-specific quantitative trait associations for clinical outcome relevance in a biorepository linked to electronic health records: LPA and myocardial infarction in African Americans.

Pac Symp Biocomput. 2016;21:96-107.

Association between lipoprotein(a) (Lp(a)) levels and Lp(a) genetic variants with coronary artery calcification.

BMC Med Genet. 2020 Mar 27;21(1):62. doi: 10.1186/s12881-020-01003-3.

Evidence for several independent genetic variants affecting lipoprotein (a) cholesterol levels.

Hum Mol Genet. 2015 Apr 15;24(8):2390-400. doi: 10.1093/hmg/ddu731. Epub 2015 Jan 9.

Loci identified by a genome-wide association study of carotid artery stenosis in the eMERGE network.

Genet Epidemiol. 2021 Feb;45(1):4-15. doi: 10.1002/gepi.22360. Epub 2020 Sep 22.

Genetic variants associated with Lp(a) lipoprotein level and coronary disease.

N Engl J Med. 2009 Dec 24;361(26):2518-28. doi: 10.1056/NEJMoa0902604.

Relations between lipoprotein(a) concentrations, LPA genetic variants, and the risk of mortality in patients with established coronary heart disease: a molecular and genetic association study.

Lancet Diabetes Endocrinol. 2017 Jul;5(7):534-543. doi: 10.1016/S2213-8587(17)30096-7. Epub 2017 May 26.

Cited by

Pretreatment Lipoprotein(a) as a Biomarker for EGFR Mutation and Prognosis in Lung Adenocarcinoma.

Int J Gen Med. 2024 Dec 27;17:6465-6478. doi: 10.2147/IJGM.S501401. eCollection 2024.

Topic modeling identifies novel genetic loci associated with multimorbidities in UK Biobank.

Cell Genom. 2023 Aug 1;3(8):100371. doi: 10.1016/j.xgen.2023.100371. eCollection 2023 Aug 9.

Improving Diagnostics with Deep Forest Applied to Electronic Health Records.

Sensors (Basel). 2023 Jul 21;23(14):6571. doi: 10.3390/s23146571.

Polygenic Risk Score in African populations: progress and challenges.

F1000Res. 2023 Apr 11;11:175. doi: 10.12688/f1000research.76218.2. eCollection 2022.

Common genetic variation associated with Mendelian disease severity revealed through cryptic phenotype analysis.

Nat Commun. 2022 Jun 27;13(1):3675. doi: 10.1038/s41467-022-31030-y.

TASTE: Temporal and Static Tensor Factorization for Phenotyping Electronic Health Records.

Proc ACM Conf Health Inference Learn (2020). 2020 Apr;2020:193-203. doi: 10.1145/3368555.3384464.

Artificial Intelligence Pipeline to Bridge the Gap between Bench Researchers and Clinical Researchers in Precision Medicine.

Med One. 2020 Jan 10;5. doi: 10.20900/mo20200001.

Mapping ICD-10 and ICD-10-CM Codes to Phecodes: Workflow Development and Initial Evaluation.

JMIR Med Inform. 2019 Nov 29;7(4):e14325. doi: 10.2196/14325.

Detecting time-evolving phenotypic topics via tensor factorization on electronic health records: Cardiovascular disease case study.

J Biomed Inform. 2019 Oct;98:103270. doi: 10.1016/j.jbi.2019.103270. Epub 2019 Aug 22.

References

FUN-LDA: A Latent Dirichlet Allocation Model for Predicting Tissue-Specific Functional Effects of Noncoding Variation: Methods and Applications.

Am J Hum Genet. 2018 May 3;102(5):920-942. doi: 10.1016/j.ajhg.2018.03.026.

LPA Variants Are Associated With Residual Cardiovascular Risk in Patients Receiving Statins.

Circulation. 2018 Oct 23;138(17):1839-1849. doi: 10.1161/CIRCULATIONAHA.117.031356.

The Influence of Big (Clinical) Data and Genomics on Precision Medicine and Drug Development.

Clin Pharmacol Ther. 2018 Mar;103(3):409-418. doi: 10.1002/cpt.951. Epub 2018 Feb 5.

Polygenic loading for major depression is associated with specific medical comorbidity.

Transl Psychiatry. 2017 Sep 19;7(9):e1238. doi: 10.1038/tp.2017.201.

Efficient genome-wide association in biobanks using topic modeling identifies multiple novel disease loci.

Mol Med. 2017 Nov;23:285-294. doi: 10.2119/molmed.2017.00100. Epub 2017 Aug 31.

Evaluating phecodes, clinical classification software, and ICD-9-CM codes for phenome-wide association studies in the electronic health record.

PLoS One. 2017 Jul 7;12(7):e0175508. doi: 10.1371/journal.pone.0175508. eCollection 2017.

Genetic Risk, Adherence to a Healthy Lifestyle, and Coronary Disease.

N Engl J Med. 2016 Dec 15;375(24):2349-2358. doi: 10.1056/NEJMoa1605086. Epub 2016 Nov 13.

An overview of topic modeling and its current applications in bioinformatics.

Springerplus. 2016 Sep 20;5(1):1608. doi: 10.1186/s40064-016-3252-8. eCollection 2016.

Next-generation genotype imputation service and methods.

Nat Genet. 2016 Oct;48(10):1284-1287. doi: 10.1038/ng.3656. Epub 2016 Aug 29.

A reference panel of 64,976 haplotypes for genotype imputation.

Nat Genet. 2016 Oct;48(10):1279-83. doi: 10.1038/ng.3643. Epub 2016 Aug 22.
