Using topic modeling via non-negative matrix factorization to identify relationships between genetic variants and disease phenotypes: A case study of Lipoprotein(a) (LPA).

Affiliations

Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States of America.

Division of Clinical Pharmacology, Vanderbilt University Medical Center, Nashville, TN, United States of America.

Publication information

PLoS One. 2019 Feb 13;14(2):e0212112. doi: 10.1371/journal.pone.0212112. eCollection 2019.


DOI:10.1371/journal.pone.0212112
PMID:30759150
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6374022/
Abstract

Genome-wide and phenome-wide association studies are commonly used to identify important relationships between genetic variants and phenotypes. Most studies have treated diseases as independent variables and suffered from the burden of multiple adjustment due to the large number of genetic variants and disease phenotypes. In this study, we used topic modeling via non-negative matrix factorization (NMF) for identifying associations between disease phenotypes and genetic variants. Topic modeling is an unsupervised machine learning approach that can be used to learn patterns from electronic health record data. We chose the single nucleotide polymorphism (SNP) rs10455872 in LPA as the predictor since it has been shown to be associated with increased risk of hyperlipidemia and cardiovascular diseases (CVD). Using data of 12,759 individuals with electronic health records (EHR) and linked DNA samples at Vanderbilt University Medical Center, we trained a topic model using NMF from 1,853 distinct phenotypes and identified six topics. We tested their associations with rs10455872 in LPA. Topics enriched for CVD and hyperlipidemia had positive correlations with rs10455872 (P < 0.001), replicating a previous finding. We also identified a negative correlation between LPA and a topic enriched for lung cancer (P < 0.001) which was not previously identified via phenome-wide scanning. We were able to replicate the top finding in a separate dataset. Our results demonstrate the applicability of topic modeling in exploring the relationship between genetic variants and clinical diseases.
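
The workflow the abstract describes (factor an individual-by-phenotype matrix into topics, then test each topic against a genotype) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy matrix and allele counts are synthetic, the NMF uses plain multiplicative updates, and Pearson correlation stands in for whatever association test the authors applied, since the abstract names neither the algorithm variant nor the test.

```python
import random

def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]

def nmf(v, k, iters=1000, seed=0, eps=1e-9):
    """Factor non-negative V (n x m) into W (n x k) and H (k x m) so that V ~ W H.

    Uses multiplicative updates for the squared-error objective
    (an illustrative choice, not necessarily the study's algorithm).
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 1e-3 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 1e-3 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[t][j] * num[t][j] / (den[t][j] + eps) for j in range(m)]
             for t in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][t] * num[i][t] / (den[i][t] + eps) for t in range(k)]
             for i in range(n)]
    return w, h

def frobenius_error(v, w, h):
    """Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((a - b) ** 2 for rv, rw in zip(v, wh) for a, b in zip(rv, rw)) ** 0.5

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

# Synthetic toy data (NOT from the study): rows are individuals,
# columns are phenotype codes (1 = code present in the record).
phenotypes = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical risk-allele counts (0, 1 or 2) for a single SNP, one per individual.
dosage = [1, 1, 2, 0, 0, 0]

w, h = nmf(phenotypes, k=2)
err = frobenius_error(phenotypes, w, h)
# Correlate each topic's per-individual weight (a column of W) with the allele count.
corrs = [pearson([row[t] for row in w], dosage) for t in range(2)]

print("reconstruction error:", round(err, 4))
for t, r in enumerate(corrs):
    print("topic", t, "correlation with dosage:", round(r, 3))
```

Each row of H lists how strongly each phenotype loads on a topic, and each column of W gives one topic's weight per individual, so testing a handful of topic weights replaces testing every phenotype separately. For data at the scale reported in the abstract, a library implementation such as scikit-learn's `sklearn.decomposition.NMF` would be a more practical choice than this pure-Python loop.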

Figures

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9474/6374022/c01cfa062ec2/pone.0212112.g001.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9474/6374022/b41393cd82b1/pone.0212112.g002.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9474/6374022/1c3c64d9d3a6/pone.0212112.g003.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9474/6374022/c0f1c2a93f18/pone.0212112.g004.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9474/6374022/bd4317e08a17/pone.0212112.g005.jpg
