
Highlighting nonlinear patterns in population genetics datasets.

Authors

Alanis-Lobato Gregorio, Cannistraci Carlo Vittorio, Eriksson Anders, Manica Andrea, Ravasi Timothy

Affiliations

[1] Integrative Systems Biology Laboratory, Biological and Environmental Sciences and Engineering Division, Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Ibn Al Haytham Bldg. 2, Level 4, Thuwal 23955-6900, Kingdom of Saudi Arabia.
[2] Division of Medical Genetics, Department of Medicine, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA.

Biomedical Cybernetics Group, Biotechnology Center (BIOTEC), Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany.

Publication details

Sci Rep. 2015 Jan 30;5:8140. doi: 10.1038/srep08140.

PMID:25633916
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4311249/
Abstract

Detecting structure in population genetics and case-control studies is important, as it exposes phenomena such as ecoclines, admixture and stratification. Principal Component Analysis (PCA) is a linear dimension-reduction technique commonly used for this purpose, but it struggles to reveal complex, nonlinear data patterns. In this paper we introduce non-centred Minimum Curvilinear Embedding (ncMCE), a nonlinear method to overcome this problem. Our analyses show that ncMCE can separate individuals into ethnic groups in cases in which PCA fails to reveal any clear structure. This increased discrimination power arises from ncMCE's ability to better capture the phylogenetic signal in the samples, whereas PCA better reflects their geographic relation. We also demonstrate how ncMCE can discover interesting patterns, even when the data has been poorly pre-processed. The juxtaposition of PCA and ncMCE visualisations provides a new standard of analysis with utility for discovering and validating significant linear/nonlinear complementary patterns in genetic data.
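To make the PCA versus ncMCE contrast concrete, here is a minimal Python/NumPy sketch. PCA is the standard SVD of the centred data matrix. For ncMCE, the sketch assumes the following pipeline: pairwise Euclidean distances, then a minimum spanning tree, then pairwise path lengths along that tree (the "minimum curvilinear" distances), then an SVD of that distance matrix with no double-centring step. The helper names (`pca_embed`, `euclidean_distances`, `mst_edges`, `tree_path_distances`, `ncmce_embed`) are hypothetical. The exact kernel, coordinate scaling and retained dimensions of the published method may differ, so treat this as an illustration rather than the authors' implementation.

```python
import numpy as np


def pca_embed(X, k=2):
    """Linear baseline: scores of the centred samples on the top-k principal axes."""
    Xc = X - X.mean(axis=0)                     # centre each feature (e.g. each SNP column)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * S[:k]                     # equals Xc @ V[:, :k]


def euclidean_distances(X):
    """Dense matrix of pairwise Euclidean distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def mst_edges(D):
    """Prim's algorithm on a dense distance matrix; returns (parent, child, weight) edges."""
    n = D.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = D[0].astype(float)                   # cheapest known link from the tree to each node
    parent = np.zeros(n, dtype=int)
    edges = []
    for _ in range(n - 1):
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((int(parent[j]), j, float(best[j])))
        in_tree[j] = True
        closer = D[j] < best                    # does the new node offer a shorter link?
        best = np.where(closer, D[j], best)
        parent = np.where(closer, j, parent)
    return edges


def tree_path_distances(n, edges):
    """All-pairs path lengths along the tree: the minimum curvilinear (MC) distances."""
    adj = [[] for _ in range(n)]
    for i, j, w in edges:
        adj[i].append((j, w))
        adj[j].append((i, w))
    M = np.zeros((n, n))
    for src in range(n):
        stack = [(src, -1, 0.0)]                # depth-first walk; a tree has no cycles
        while stack:
            node, prev, dist = stack.pop()
            M[src, node] = dist
            for nb, w in adj[node]:
                if nb != prev:
                    stack.append((nb, node, dist + w))
    return M


def ncmce_embed(X, k=2):
    """Sketch of ncMCE (assumed form): SVD of the MC-distance matrix without centring."""
    M = tree_path_distances(len(X), mst_edges(euclidean_distances(X)))
    U, S, _ = np.linalg.svd(M)                  # no centring step: the "non-centred" part
    return U[:, :k] * np.sqrt(S[:k])            # this coordinate scaling is an assumption
```

Consider nine points evenly spaced on a semicircle of radius 1. `euclidean_distances` gives 2 between the two ends. The tree path instead follows the arc, so `tree_path_distances` gives about 3.12 (eight chords of length 2·sin(π/16)). A straight-line projection such as PCA can flatten exactly this kind of curved structure.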

Similar articles

1. Evaluation of methods for adjusting population stratification in genome-wide association studies: Standard versus categorical principal component analysis.
   Ann Hum Genet. 2019 Nov;83(6):454-464. doi: 10.1111/ahg.12339. Epub 2019 Jul 19.
2. Nonlinear dimensionality reduction of gene expression data for visualization and clustering analysis of cancer tissue samples.
   Comput Biol Med. 2010 Aug;40(8):723-32. doi: 10.1016/j.compbiomed.2010.06.007. Epub 2010 Jul 16.
3. Inferring the population structure and admixture history of three Hmong-Mien-speaking Miao tribes from southwest China based on genome-wide SNP genotyping.
   Ann Hum Biol. 2021 Aug;48(5):418-429. doi: 10.1080/03014460.2021.2005825.
4. Study of large and highly stratified population datasets by combining iterative pruning principal component analysis and structure.
   BMC Bioinformatics. 2011 Jun 23;12:255. doi: 10.1186/1471-2105-12-255.
5. Tracing sub-structure in the European American population with PCA-informative markers.
   PLoS Genet. 2008 Jul 4;4(7):e1000114. doi: 10.1371/journal.pgen.1000114.
6. Nonlinear Dimensionality Reduction by Minimum Curvilinearity for Unsupervised Discovery of Patterns in Multidimensional Proteomic Data.
   Methods Mol Biol. 2016;1384:289-98. doi: 10.1007/978-1-4939-3255-9_16.
7. A Likelihood-Free Estimator of Population Structure Bridging Admixture Models and Principal Components Analysis.
   Genetics. 2019 Aug;212(4):1009-1029. doi: 10.1534/genetics.119.302159. Epub 2019 Apr 26.
8. Complex-valued neural networks for nonlinear complex principal component analysis.
   Neural Netw. 2005 Jan;18(1):61-9. doi: 10.1016/j.neunet.2004.08.002.
9. KLFDAPC: a supervised machine learning approach for spatial genetic structure analysis.
   Brief Bioinform. 2022 Jul 18;23(4). doi: 10.1093/bib/bbac202.

Cited by

1. Accurate Identification of Native Asian Honey Bee Populations in Jilong (Xizang, China) by Population Genomics and Deep Learning.
   Insects. 2025 Jul 31;16(8):788. doi: 10.3390/insects16080788.
2. An Improved Kernel Entropy Component Analysis for Damage Detection Under Environmental and Operational Variations.
   Sensors (Basel). 2025 Feb 21;25(5):1332. doi: 10.3390/s25051332.
3. Simplicity within biological complexity.
   Bioinform Adv. 2025 Feb 6;5(1):vbae164. doi: 10.1093/bioadv/vbae164. eCollection 2025.
4. Computing linkage disequilibrium aware genome embeddings using autoencoders.
   Bioinformatics. 2024 Jun 3;40(6). doi: 10.1093/bioinformatics/btae326.
5. Integration of pan-cancer multi-omics data for novel mixed subgroup identification using machine learning methods.
   PLoS One. 2023 Oct 19;18(10):e0287176. doi: 10.1371/journal.pone.0287176. eCollection 2023.
6. Machine learning based combination of multi-omics data for subgroup identification in non-small cell lung cancer.
   Sci Rep. 2023 Mar 21;13(1):4636. doi: 10.1038/s41598-023-31426-w.
7. KLFDAPC: a supervised machine learning approach for spatial genetic structure analysis.
   Brief Bioinform. 2022 Jul 18;23(4). doi: 10.1093/bib/bbac202.
8. A deep learning framework for characterization of genotype data.
   G3 (Bethesda). 2022 Mar 4;12(3). doi: 10.1093/g3journal/jkac020.
9. Nonlinear machine learning pattern recognition and bacteria-metabolite multilayer network analysis of perturbed gastric microbiome.
   Nat Commun. 2021 Mar 26;12(1):1926. doi: 10.1038/s41467-021-22135-x.
10. Performance of model-based multifactor dimensionality reduction methods for epistasis detection by controlling population structure.
   BioData Min. 2021 Feb 19;14(1):16. doi: 10.1186/s13040-021-00247-w.