
Genetic classification of populations using supervised learning.

Affiliation

Astrophysics Group, Cavendish Laboratory, Cambridge, United Kingdom.

Publication

PLoS One. 2011 May 12;6(5):e14802. doi: 10.1371/journal.pone.0014802.

DOI: 10.1371/journal.pone.0014802
PMID: 21589856
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3093382/
Abstract

There are many instances in genetics in which we wish to determine whether two candidate populations are distinguishable on the basis of their genetic structure. Examples include geographically separated populations, case-control studies, and quality control (when participants in a study have been genotyped at different laboratories). This last application is particularly important in the era of large-scale genome-wide association studies, when collections of individuals genotyped at different locations are merged to increase statistical power. The traditional method for detecting structure within a population is some form of exploratory technique such as principal components analysis. Such methods, which do not use our prior knowledge of the membership of the candidate populations, are termed unsupervised. Supervised methods, on the other hand, can use this prior knowledge when it is available.

In this paper we demonstrate that in such cases modern supervised approaches are a more appropriate tool for detecting genetic differences between populations. We apply two such methods (neural networks and support vector machines) to the classification of three populations (two from Scotland and one from Bulgaria). The sensitivity exhibited by both methods is considerably higher than that attained by principal components analysis, and in fact comfortably exceeds a recently conjectured theoretical limit on the sensitivity of unsupervised methods. In particular, our methods can distinguish between the two Scottish populations, where principal components analysis cannot. On the basis of our results, we suggest that a supervised learning approach should be the method of choice when classifying individuals into predefined populations, particularly in quality control for large-scale genome-wide association studies.
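To make the supervised/unsupervised distinction concrete, here is a minimal Python sketch on simulated, hypothetical genotype data. It does not reproduce the paper's neural networks or support vector machines; it swaps in a nearest-centroid classifier, about the simplest method that, like them, learns from individuals whose population membership is already known. The allele frequencies, the `shift` parameter, the sample sizes, and every function name are illustrative assumptions, and genotypes are assumed to be coded as 0/1/2 allele counts under Hardy-Weinberg equilibrium.

```python
import random


def simulate_population(n_individuals, allele_freqs, rng):
    """Hypothetical genotypes: 0/1/2 copies of one allele per SNP, drawn as
    two independent allele draws (Hardy-Weinberg equilibrium is assumed)."""
    return [
        [int(rng.random() < p) + int(rng.random() < p) for p in allele_freqs]
        for _ in range(n_individuals)
    ]


def centroid(genotypes):
    """Mean allele count per SNP across a group of individuals."""
    n = len(genotypes)
    return [sum(column) / n for column in zip(*genotypes)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_nearest_centroid(labelled):
    """Supervised step: uses the known population label of every training
    individual. (PCA, by contrast, would never see these labels.)"""
    return {label: centroid(rows) for label, rows in labelled.items()}


def predict(model, genotype):
    """Assign an individual to the population whose centroid is closest."""
    return min(model, key=lambda label: squared_distance(model[label], genotype))


def accuracy(model, labelled):
    """Fraction of labelled individuals assigned to their true population."""
    total = correct = 0
    for label, rows in labelled.items():
        for row in rows:
            total += 1
            correct += predict(model, row) == label
    return correct / total


def demo(n_snps=300, shift=0.15, n_train=150, n_test=50, seed=1):
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    # Hypothetical allele frequencies: population B differs from A by +/- shift
    # at every SNP, clipped to stay strictly inside (0, 1).
    freqs_a = [rng.uniform(0.15, 0.85) for _ in range(n_snps)]
    freqs_b = [min(0.99, max(0.01, p + rng.choice((-shift, shift)))) for p in freqs_a]
    train = {
        "A": simulate_population(n_train, freqs_a, rng),
        "B": simulate_population(n_train, freqs_b, rng),
    }
    test = {
        "A": simulate_population(n_test, freqs_a, rng),
        "B": simulate_population(n_test, freqs_b, rng),
    }
    model = train_nearest_centroid(train)
    # Accuracy is measured only on held-out individuals.
    return accuracy(model, test)


if __name__ == "__main__":
    print(f"held-out accuracy: {demo():.2f}")
```

Calling `demo()` trains on labelled individuals and scores the model on individuals it has not seen. Lowering `shift` toward 0 makes the simulated populations harder to tell apart; an unsupervised method such as principal components analysis would instead search for the dominant axes of variation without ever using the labels.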


Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7044/3093382/0f3caebf6e61/pone.0014802.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7044/3093382/804bf9177162/pone.0014802.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7044/3093382/0012b663d6ca/pone.0014802.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7044/3093382/356ca8752c50/pone.0014802.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7044/3093382/ae405ca633de/pone.0014802.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7044/3093382/6fd9700acb15/pone.0014802.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7044/3093382/c6a66f24ce6d/pone.0014802.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7044/3093382/750cbd603b37/pone.0014802.g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7044/3093382/61b505af1607/pone.0014802.g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7044/3093382/103b9245237f/pone.0014802.g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7044/3093382/663b2b1794de/pone.0014802.g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7044/3093382/6665436bef3e/pone.0014802.g012.jpg

Similar articles

1
Genetic classification of populations using supervised learning.
PLoS One. 2011 May 12;6(5):e14802. doi: 10.1371/journal.pone.0014802.
2
KLFDAPC: a supervised machine learning approach for spatial genetic structure analysis.
Brief Bioinform. 2022 Jul 18;23(4). doi: 10.1093/bib/bbac202.
3
Searching for disease susceptibility variants in structured populations.
Genomics. 2009 Jan;93(1):1-4. doi: 10.1016/j.ygeno.2008.04.004. Epub 2008 Jun 2.
4
A Hybrid Supervised Approach to Human Population Identification Using Genomics Data.
IEEE/ACM Trans Comput Biol Bioinform. 2021 Mar-Apr;18(2):443-454. doi: 10.1109/TCBB.2019.2919501. Epub 2021 Apr 6.
5
Choice of population structure informative principal components for adjustment in a case-control study.
BMC Genet. 2011 Jul 19;12:64. doi: 10.1186/1471-2156-12-64.
6
Assessing the power of principal components and wright's fixation index analyzes applied to reveal the genome-wide genetic differences between herds of Holstein cows.
BMC Genet. 2020 Apr 28;21(1):47. doi: 10.1186/s12863-020-00848-0.
7
Translational Metabolomics of Head Injury: Exploring Dysfunctional Cerebral Metabolism with Ex Vivo NMR Spectroscopy-Based Metabolite Quantification
8
PCA-correlated SNPs for structure identification in worldwide human populations.
PLoS Genet. 2007 Sep;3(9):1672-86. doi: 10.1371/journal.pgen.0030160.
9
Applications of machine learning and data mining methods to detect associations of rare and common variants with complex traits.
Genet Epidemiol. 2014 Sep;38 Suppl 1:S81-5. doi: 10.1002/gepi.21830.
10
Genome wide analysis reveals genetic divergence between Goldsinny wrasse populations.
BMC Genet. 2020 Oct 9;21(1):118. doi: 10.1186/s12863-020-00921-8.

Cited by

1
Machine learning for genetic prediction of psychiatric disorders: a systematic review.
Mol Psychiatry. 2021 Jan;26(1):70-79. doi: 10.1038/s41380-020-0825-2. Epub 2020 Jun 26.
2
Predictive modeling of schizophrenia from genomic data: Comparison of polygenic risk score with kernel support vector machines approach.
Am J Med Genet B Neuropsychiatr Genet. 2019 Jan;180(1):80-85. doi: 10.1002/ajmg.b.32705. Epub 2018 Dec 4.
3
Prediction of treatment response in rheumatoid arthritis patients using genome-wide SNP data.
Genet Epidemiol. 2018 Dec;42(8):754-771. doi: 10.1002/gepi.22159. Epub 2018 Oct 12.
4
Detecting responses to treatment with fenofibrate in pedigrees.
BMC Genet. 2018 Sep 17;19(Suppl 1):64. doi: 10.1186/s12863-018-0652-5.
5
From genes to behavior: placing cognitive models in the context of biological pathways.
Front Neurosci. 2014 Nov 4;8:336. doi: 10.3389/fnins.2014.00336. eCollection 2014.
6
The genomic psychiatry cohort: partners in discovery.
Am J Med Genet B Neuropsychiatr Genet. 2013 Jun;162B(4):306-12. doi: 10.1002/ajmg.b.32160. Epub 2013 May 3.

References

1
Reconstructing Indian population history.
Nature. 2009 Sep 24;461(7263):489-94. doi: 10.1038/nature08365.
2
Common polygenic variation contributes to risk of schizophrenia and bipolar disorder.
Nature. 2009 Aug 6;460(7256):748-52. doi: 10.1038/nature08185. Epub 2009 Jul 1.
3
Genetic structure of Europeans: a view from the North-East.
PLoS One. 2009;4(5):e5472. doi: 10.1371/journal.pone.0005472. Epub 2009 May 8.
4
Neural networks for genetic epidemiology: past, present, and future.
BioData Min. 2008 Jul 17;1(1):3. doi: 10.1186/1756-0381-1-3.
5
Correlation between genetic and geographic structure in Europe.
Curr Biol. 2008 Aug 26;18(16):1241-8. doi: 10.1016/j.cub.2008.07.049. Epub 2008 Aug 7.
6
PLINK: a tool set for whole-genome association and population-based linkage analyses.
Am J Hum Genet. 2007 Sep;81(3):559-75. doi: 10.1086/519795. Epub 2007 Jul 25.
7
Comparison of artificial neural network analysis with other multimarker methods for detecting genetic association.
BMC Genet. 2007 Jul 18;8:49. doi: 10.1186/1471-2156-8-49.
8
Population structure and eigenanalysis.
PLoS Genet. 2006 Dec;2(12):e190. doi: 10.1371/journal.pgen.0020190.
9
Assessment of the role of genetic polymorphism in venous thrombosis through artificial neural networks.
Ann Hum Genet. 2005 Nov;69(Pt 6):693-706. doi: 10.1111/j.1529-8817.2005.00206.x.
10
Neural network analysis in pharmacogenetics of mood disorders.
BMC Med Genet. 2004 Dec 9;5:27. doi: 10.1186/1471-2350-5-27.