
CLARITY: comparing heterogeneous data using dissimilarity.

Authors

Lawson Daniel J, Solanki Vinesh, Yanovich Igor, Dellert Johannes, Ruck Damian, Endicott Phillip

Affiliations

Institute of Statistical Sciences, School of Mathematics, University of Bristol, Bristol, UK.

Integrative Epidemiology Unit, Population Health Sciences, University of Bristol, Bristol, UK.

Publication information

R Soc Open Sci. 2021 Dec 8;8(12):202182. doi: 10.1098/rsos.202182. eCollection 2021 Dec.

DOI:10.1098/rsos.202182
PMID:34909208
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8652278/

Abstract

Integrating datasets from different disciplines is hard because the data are often qualitatively different in meaning, scale and reliability. When two datasets describe the same entities, many scientific questions can be phrased around whether the (dis)similarities between entities are conserved across such different data. Our method, CLARITY, quantifies consistency across datasets, identifies where inconsistencies arise and aids in their interpretation. We illustrate this using three diverse comparisons: gene methylation versus expression, evolution of language sounds versus word use, and country-level economic metrics versus cultural beliefs. The non-parametric approach is robust to noise and differences in scaling, and makes only weak assumptions about how the data were generated. It operates by decomposing similarities into two components: a 'structural' component analogous to a clustering, and an underlying 'relationship' between those structures. This allows a 'structural comparison' between two similarity matrices using their predictability from 'structure'. Significance is assessed with the help of re-sampling appropriate for each dataset. The software, CLARITY, is available as an R package from github.com/danjlawson/CLARITY.
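
The decomposition described in the abstract can be illustrated with a small numerical sketch. This is not the CLARITY R package or the authors' algorithm: it is a simplified stand-in that assumes the 'structure' of a reference matrix can be approximated by its k leading eigenvectors, and that the 'relationship' for a second matrix is then fitted by least squares with that structure held fixed. The function names (`structure_from`, `fit_relationship`, `entity_residuals`), the toy data and the choice of k are all hypothetical.

```python
import numpy as np

def structure_from(similarity, k):
    """Return an n x k 'structure' matrix: the k leading eigenvectors
    (ranked by absolute eigenvalue) of the symmetrised input matrix."""
    sym = (similarity + similarity.T) / 2.0
    eigvals, eigvecs = np.linalg.eigh(sym)
    leading = np.argsort(np.abs(eigvals))[::-1][:k]
    return eigvecs[:, leading]

def fit_relationship(structure, similarity):
    """Least-squares k x k 'relationship' X minimising
    ||similarity - structure @ X @ structure.T||_F for a fixed structure."""
    pinv = np.linalg.pinv(structure)
    return pinv @ similarity @ pinv.T

def entity_residuals(reference, target, k):
    """Per-entity norm of the part of `target` that the structure learned
    from `reference` cannot predict."""
    A = structure_from(reference, k)
    X = fit_relationship(A, target)
    residual = target - A @ X @ A.T
    return np.linalg.norm(residual, axis=1)

# Toy data (assumed for illustration): 30 entities, 3 latent groups.
rng = np.random.default_rng(0)
n, k = 30, 3

def random_relationship():
    """Random symmetric positive-definite k x k matrix."""
    M = rng.normal(size=(k, k))
    return M @ M.T + np.eye(k)

A_true = rng.normal(size=(n, k))
B_true = rng.normal(size=(n, k))
Y1 = A_true @ random_relationship() @ A_true.T  # reference dataset
Y2 = A_true @ random_relationship() @ A_true.T  # same structure, new relationship
Y3 = B_true @ random_relationship() @ B_true.T  # unrelated structure

shared = entity_residuals(Y1, Y2, k)     # close to zero for every entity
unrelated = entity_residuals(Y1, Y3, k)  # large: Y1's structure does not predict Y3
```

In this sketch, entities with large residuals are those whose (dis)similarities in the second dataset are poorly predicted from the first dataset's structure. Assessing whether such residuals are significant would need a resampling scheme suited to the data, as the abstract notes; that step is omitted here.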

Figures (image files f01 to f08 on PMC):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eb27/8652278/c90d79379bdc/rsos202182f01.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eb27/8652278/9059d987e128/rsos202182f02.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eb27/8652278/93410e9b79e5/rsos202182f03.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eb27/8652278/718e08569df9/rsos202182f04.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eb27/8652278/3e7c4a2f6029/rsos202182f05.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eb27/8652278/8f16fbc7afc3/rsos202182f06.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eb27/8652278/b4fccdae3d31/rsos202182f07.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eb27/8652278/489b03ec21e0/rsos202182f08.jpg

