Dissecting trait heterogeneity: a comparison of three clustering methods applied to genotypic data.

Authors

Thornton-Wells Tricia A, Moore Jason H, Haines Jonathan L

Affiliation

Neuroscience Graduate Program, Vanderbilt Brain Institute, Vanderbilt University Medical Center, Nashville, TN, USA.

Publication information

BMC Bioinformatics. 2006 Apr 12;7:204. doi: 10.1186/1471-2105-7-204.

DOI:10.1186/1471-2105-7-204

PMID:16611359

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1525209/

Abstract

BACKGROUND

Trait heterogeneity exists when a trait has been defined with insufficient specificity, so that it is actually two or more distinct traits. It has been implicated as a confounding factor in traditional statistical genetics of complex human disease. In the absence of detailed phenotypic data collected consistently in combination with genetic data, unsupervised computational methodologies offer the potential for discovering underlying trait heterogeneity. The performance of three such methods appropriate for categorical data (Bayesian Classification, Hypergraph-Based Clustering, and Fuzzy k-Modes Clustering) was compared. Also tested was the ability of these methods to detect trait heterogeneity in the presence of locus heterogeneity and/or gene-gene interaction, two other complicating factors in discovering genetic models of complex human disease. To determine the efficacy of applying the Bayesian Classification method to real data, the reliability of its internal clustering metrics at finding good clusterings was evaluated using permutation testing.
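To make the idea of clustering categorical genotype data concrete, the following is a minimal sketch of a generic fuzzy k-modes procedure. It is not the implementation evaluated in the study: the genotype coding (0/1/2), the fuzziness exponent `alpha`, the random initialization, and all function names are assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def hamming(a, b):
    """Simple matching dissimilarity: number of loci where genotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def memberships(row, modes, alpha):
    """Fuzzy membership of one individual in each cluster (values sum to 1)."""
    dists = [hamming(row, m) for m in modes]
    if 0 in dists:
        # Exact match with a mode: give full membership to the first match.
        hit = dists.index(0)
        return [1.0 if j == hit else 0.0 for j in range(len(modes))]
    power = 1.0 / (alpha - 1.0)
    inv = [(1.0 / d) ** power for d in dists]
    total = sum(inv)
    return [v / total for v in inv]


def update_modes(data, weights, k, alpha):
    """Per cluster and locus, pick the genotype with the largest weighted count."""
    n_loci = len(data[0])
    modes = []
    for j in range(k):
        mode = []
        for locus in range(n_loci):
            score = defaultdict(float)
            for row, w in zip(data, weights):
                score[row[locus]] += w[j] ** alpha
            # sorted() makes tie-breaking deterministic.
            mode.append(max(sorted(score), key=score.get))
        modes.append(tuple(mode))
    return modes


def fuzzy_k_modes(data, k, alpha=1.5, n_iter=20, seed=0):
    """Alternate membership and mode updates until the modes stop changing."""
    if alpha <= 1.0:
        raise ValueError("alpha must be greater than 1")
    rng = random.Random(seed)
    modes = rng.sample(data, k)  # assumed initialization: k random individuals
    for _ in range(n_iter):
        weights = [memberships(row, modes, alpha) for row in data]
        new_modes = update_modes(data, weights, k, alpha)
        if new_modes == modes:
            break
        modes = new_modes
    weights = [memberships(row, modes, alpha) for row in data]
    return modes, weights


# Toy data: six individuals, four loci, genotypes coded 0/1/2 (hypothetical).
genotypes = [
    (0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 0),
    (2, 2, 2, 2), (2, 2, 2, 1), (2, 2, 1, 2),
]
modes, weights = fuzzy_k_modes(genotypes, k=2)
for row, w in zip(genotypes, weights):
    print(row, [round(v, 2) for v in w])
```

Each individual receives a graded membership in every cluster rather than a single hard label, which is the feature that distinguishes fuzzy k-modes from ordinary k-modes. The result depends on the initial modes, so in practice several seeds would normally be tried.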

RESULTS

Bayesian Classification outperformed the other two methods, except that Fuzzy k-Modes Clustering performed best on the most complex genetic model. Bayesian Classification achieved excellent recovery for 75% of the datasets simulated under the simplest genetic model. Across all simulated models, it achieved moderate recovery for 56% of datasets with a sample size of 500 or more and for 86% of datasets with 10 or fewer nonfunctional loci. Neither Hypergraph-Based Clustering nor Fuzzy k-Modes Clustering achieved good or excellent cluster recovery for a majority of datasets, even under a restricted set of conditions. When the average log of class strength was used as the internal clustering metric, the false positive rate was controlled very well, at 3% or less for all three significance levels (0.01, 0.05, 0.10), and the false negative rate was acceptably low (18%) for the least stringent significance level of 0.10.
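The abstract does not say how cluster recovery was scored. One widely used measure for comparing a clustering against known simulated groups, assumed here purely for illustration, is the Hubert-Arabie adjusted Rand index. The sketch below computes it from two label lists; the function name is hypothetical, and the cutoffs that would separate "excellent", "good", and "moderate" recovery are not given in the abstract, so none are encoded.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    """Hubert-Arabie adjusted Rand index between two partitions."""
    n = len(true_labels)
    if n != len(pred_labels):
        raise ValueError("label lists must have the same length")
    if n < 2:
        raise ValueError("need at least two individuals")

    # Contingency counts: pairs of (true group, predicted cluster).
    cell_counts = Counter(zip(true_labels, pred_labels))
    true_counts = Counter(true_labels)
    pred_counts = Counter(pred_labels)

    # Number of individual pairs placed together in both, in truth, in prediction.
    together_both = sum(comb(c, 2) for c in cell_counts.values())
    together_true = sum(comb(c, 2) for c in true_counts.values())
    together_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = together_true * together_pred / comb(n, 2)
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        # Degenerate case, e.g. both partitions put everyone in one cluster.
        return 1.0
    return (together_both - expected) / (maximum - expected)


# Cluster labels are arbitrary, so a relabeled perfect clustering still scores 1.
perfect = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# A clustering that splits every true group scores below zero.
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(perfect, crossed)
```

The index is corrected for chance: values near 0 indicate agreement no better than random assignment, and 1 indicates identical partitions regardless of how the clusters are numbered.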

CONCLUSION

Bayesian Classification shows promise as an unsupervised computational method for dissecting trait heterogeneity in genotypic data. Its control of false positive and false negative rates lends confidence to the validity of its results. Further investigation of how different parameter settings may improve the performance of Bayesian Classification, especially under more complex genetic models, is ongoing.
