Simultaneous clustering of gene expression data with clinical chemistry and pathological evaluations reveals phenotypic prototypes.

Authors

Bushel Pierre R, Wolfinger Russell D, Gibson Greg

Affiliation

National Center for Toxicogenomics, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA.

Publication

BMC Syst Biol. 2007 Feb 23;1:15. doi: 10.1186/1752-0509-1-15.

DOI:10.1186/1752-0509-1-15
PMID:17408499
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1839893/
Abstract

BACKGROUND

Commonly employed clustering methods for analysis of gene expression data do not directly incorporate phenotypic data about the samples. Furthermore, clustering of samples with known phenotypes is typically performed in an informal fashion. The inability of clustering algorithms to incorporate biological data in the grouping process can limit proper interpretation of the data and its underlying biology.

RESULTS

We present a more formal approach, the modk-prototypes algorithm, for clustering biological samples by simultaneously considering microarray gene expression data and classes of known phenotypic variables such as clinical chemistry evaluations and histopathologic observations. To measure the dissimilarity of samples, the strategy constructs an objective function that uses the sum of squared Euclidean distances for the numeric microarray and clinical chemistry data and simple matching for the categorical histopathology values. Separate weighting terms for the microarray, clinical chemistry and histopathology measurements control the influence of each data domain on the clustering of the samples. To determine the number of clusters in a data set, the dynamic validity index for numeric data was modified with a category utility measure. A cluster's prototype, formed from the mean of the numeric features and the mode of the categorical values of all the samples in the group, is representative of the phenotype of the cluster members. The approach is shown to work well with a simulated mixed data set and two real data examples containing numeric and categorical data types: one from a heart disease study and the other from a study of acetaminophen (an analgesic) exposure that causes centrilobular necrosis in rat liver.
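
The dissimilarity measure and prototype update described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the exact form of the objective function, the initialization, or the weight values, so the function names (`dissimilarity`, `make_prototype`, `cluster`), the dictionary layout, the choice of the first k samples as starting prototypes, and the toy measurements are all assumptions made for illustration. The dynamic validity index used to choose the number of clusters is not covered.

```python
from statistics import fmean, mode  # fmean and tie-tolerant mode need Python 3.8+


def dissimilarity(sample, prototype, weights):
    """Weighted mixed-type distance between a sample and a cluster prototype."""
    # Squared Euclidean distance for the two numeric domains
    expr = sum((a - b) ** 2 for a, b in zip(sample["expr"], prototype["expr"]))
    chem = sum((a - b) ** 2 for a, b in zip(sample["chem"], prototype["chem"]))
    # Simple matching for the categorical domain: count mismatching values
    path = sum(a != b for a, b in zip(sample["path"], prototype["path"]))
    # One weight per data domain controls its influence on the clustering
    return weights["expr"] * expr + weights["chem"] * chem + weights["path"] * path


def make_prototype(members):
    """Prototype = mean of numeric features, mode of categorical features."""
    return {
        "expr": [fmean(col) for col in zip(*(m["expr"] for m in members))],
        "chem": [fmean(col) for col in zip(*(m["chem"] for m in members))],
        "path": [mode(col) for col in zip(*(m["path"] for m in members))],
    }


def cluster(samples, k, weights, n_iter=20):
    """k-prototypes-style loop: assign to nearest prototype, then update."""
    # Assumption: naive initialization from the first k samples
    prototypes = [dict(s) for s in samples[:k]]
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda j: dissimilarity(s, prototypes[j], weights))
            for s in samples
        ]
        if new_labels == labels:  # assignments stable, stop iterating
            break
        labels = new_labels
        for j in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == j]
            if members:  # keep the old prototype if a cluster is empty
                prototypes[j] = make_prototype(members)
    return labels, prototypes


# Toy, made-up samples: two expression values, one clinical chemistry
# value, and one histopathology category per sample (not data from the study).
samples = [
    {"expr": [0.1, 0.2], "chem": [40.0], "path": ["none"]},
    {"expr": [2.1, 1.9], "chem": [400.0], "path": ["moderate"]},
    {"expr": [0.0, 0.3], "chem": [45.0], "path": ["none"]},
    {"expr": [2.0, 2.2], "chem": [380.0], "path": ["moderate"]},
]
# Hypothetical weights; the small "chem" weight offsets its larger scale
weights = {"expr": 1.0, "chem": 0.001, "path": 1.0}

labels, prototypes = cluster(samples, k=2, weights=weights)
print(labels)
print(prototypes)
```

In this sketch the per-domain weights play the role the abstract assigns to the separate weighting terms: raising `weights["path"]`, for example, makes a histopathology mismatch count for more relative to the numeric distances. Each returned prototype can be read directly as a phenotype summary of its cluster, since it holds the mean numeric profile and the most common categorical value.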

CONCLUSION

The modk-prototypes algorithm partitioned the simulated data into clusters with samples in their respective class group and the heart disease samples into two groups (sick and buff denoting samples having pain type representative of angina and non-angina respectively) with an accuracy of 79%. This is on par with, or better than, the assignment accuracy of the heart disease samples by several well-known and successful clustering algorithms. Following modk-prototypes clustering of the acetaminophen-exposed samples, informative genes from the cluster prototypes were identified that are descriptive of, and phenotypically anchored to, levels of necrosis of the centrilobular region of the rat liver. The biological processes cell growth and/or maintenance, amine metabolism, and stress response were shown to discern between no and moderate levels of acetaminophen-induced centrilobular necrosis. The use of well-known and traditional measurements directly in the clustering provides some guarantee that the resulting clusters will be meaningfully interpretable.
