Privacy-preserving chi-squared test of independence for small samples.

Authors

Sei Yuichi, Ohsuga Akihiko

Affiliation

The University of Electro-Communications, Tokyo, Japan.

Publication details

BioData Min. 2021 Jan 22;14(1):6. doi: 10.1186/s13040-021-00238-x.

DOI: 10.1186/s13040-021-00238-x
PMID: 33482874
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7820106/
Abstract

BACKGROUND

The importance of privacy protection in analyses of personal data, such as genome-wide association studies (GWAS), has grown in recent years. GWAS aim to identify single-nucleotide polymorphisms (SNPs) associated with diseases such as cancer and diabetes, and the chi-squared (χ²) test of independence can be used for this identification. However, recent studies have shown that publishing the results of χ² tests on SNPs or other personal data can lead to privacy violations. Several studies have proposed anonymization methods for χ² testing under ε-differential privacy, the de facto privacy metric of the cryptographic community. However, existing methods either apply only to 2×2 or 2×3 contingency tables or have low accuracy when the number of samples is small. In many situations, such as analyzing COVID-19 in the early stage of its spread, it is difficult to collect a large number of highly sensitive samples.
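
To make the underlying test concrete, here is a minimal Python sketch of the ordinary (non-private) Pearson χ² statistic for an I×J contingency table. It assumes expected counts E_ij = (row i total × column j total) / N and that every row and column total is positive. The 2×3 case/control genotype counts are hypothetical, and the closed-form p-value exp(-χ²/2) is valid only because a 2×3 table has 2 degrees of freedom. This is the standard test, not RandChiDist.

```python
import math


def chi_squared_independence(table):
    """Pearson chi-squared statistic and degrees of freedom for an I x J table.

    table: list of I rows, each a list of J non-negative counts.
    Assumes every row and column total is positive (no zero expected counts).
    """
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_sums[i] * col_sums[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical 2 x 3 table: rows = cases / controls, columns = AA / Aa / aa counts.
table = [[10, 20, 10], [20, 15, 5]]
stat, dof = chi_squared_independence(table)
# For dof == 2 the chi-squared survival function is exactly exp(-x / 2);
# other dof values need a general chi-squared CDF (e.g. from SciPy).
p_value = math.exp(-stat / 2)
print(f"chi2={stat:.3f}, dof={dof}, p={p_value:.4f}")
```

Publishing `stat` or `p_value` computed this way is exactly the kind of release the background describes as a privacy risk.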

RESULTS

We propose a novel anonymization method, RandChiDist, that anonymizes χ² testing for small samples, and we prove that it satisfies differential privacy. We also evaluate it experimentally on synthetic datasets and two real genomic datasets. Among the existing and baseline methods that can control the Type I error rate, RandChiDist produced the fewest Type II errors.
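
The evaluation criteria here are Type I errors (rejecting independence when it actually holds) and Type II errors (failing to reject it when it does not). Below is a minimal Monte Carlo sketch of how such rates can be estimated for the ordinary, non-private χ² test on a 2×3 case/control genotype table. The genotype probabilities, the 20 samples per row, the 2000 trials, and α = 0.05 are hypothetical choices. Tables with an empty row or column are counted as non-rejections by assumption. This is not the paper's experimental protocol.

```python
import math
import random

ALPHA = 0.05  # hypothetical significance level


def chi2_pvalue_2x3(table):
    """Pearson chi-squared p-value for a 2 x 3 table (2 degrees of freedom).

    Returns None when a row or column total is zero, because an expected
    count would then be zero.
    """
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    if 0 in row_sums or 0 in col_sums:
        return None
    total = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            stat += (observed - expected) ** 2 / expected
    return math.exp(-stat / 2)  # exact chi-squared survival function for dof = 2


def sample_table(genotype_probs, n_per_row, rng):
    """Draw a 2 x 3 table with a fixed number of samples per row (case/control)."""
    table = []
    for probs in genotype_probs:
        draws = rng.choices(range(3), weights=probs, k=n_per_row)
        table.append([draws.count(g) for g in range(3)])
    return table


def rejection_rate(genotype_probs, n_per_row, trials, seed=0):
    """Fraction of simulated tables on which the test rejects at level ALPHA."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        p = chi2_pvalue_2x3(sample_table(genotype_probs, n_per_row, rng))
        if p is not None and p < ALPHA:
            rejections += 1
    return rejections / trials


# Hypothetical genotype (AA, Aa, aa) probabilities for cases and controls.
null_model = [[0.25, 0.5, 0.25], [0.25, 0.5, 0.25]]  # independence holds
alt_model = [[0.25, 0.5, 0.25], [0.45, 0.4, 0.15]]   # genotype depends on status

type_i_rate = rejection_rate(null_model, n_per_row=20, trials=2000)
type_ii_rate = 1 - rejection_rate(alt_model, n_per_row=20, trials=2000)
print(f"Type I rate ~ {type_i_rate:.3f}, Type II rate ~ {type_ii_rate:.3f}")
```

A method "controls" the Type I error rate when the first number stays at or below α. Among methods that do, a lower second number is better.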

CONCLUSIONS

We propose RandChiDist, a new differentially private method for anonymizing χ² values of an I×J contingency table with a small number of samples. The experimental results show that RandChiDist outperforms existing methods when the number of samples is small.
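
The abstract does not describe how RandChiDist works. For context on what anonymizing a χ² test involves, here is a sketch of a common generic baseline instead: add Laplace noise to each cell of the I×J table and compute χ² from the noisy counts. It assumes that neighboring datasets differ by adding or removing one record. Under that assumption exactly one cell changes by 1, the L1 sensitivity is 1, and Laplace noise with scale 1/ε per cell gives ε-differential privacy. Anything computed afterwards from the released counts is post-processing. The clamping floor, seed, ε, and table are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    rate = 1.0 / scale  # expovariate takes the rate, i.e. 1 / mean
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_table(table, epsilon, rng):
    """Laplace-mechanism release of an I x J contingency table.

    Assumption: neighbors differ by adding or removing one record, so the
    L1 sensitivity of the cell counts is 1 and scale 1 / epsilon suffices.
    """
    return [[c + laplace_noise(1.0 / epsilon, rng) for c in row] for row in table]


def chi_squared_stat(table, floor=1e-6):
    """Pearson statistic on (possibly noisy) counts.

    Counts below a small positive floor (a hypothetical choice) are clamped;
    this is post-processing, so it does not weaken the privacy guarantee.
    """
    t = [[max(c, floor) for c in row] for row in table]
    row_sums = [sum(row) for row in t]
    col_sums = [sum(col) for col in zip(*t)]
    total = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(t):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


rng = random.Random(2021)
counts = [[10, 20, 10], [20, 15, 5]]  # hypothetical 2 x 3 genotype table
released = noisy_table(counts, epsilon=1.0, rng=rng)
print(f"true chi2={chi_squared_stat(counts):.3f}, "
      f"noisy chi2={chi_squared_stat(released):.3f}")
```

The noise changes the distribution of the statistic, so comparing the noisy value with the usual χ² critical value is not guaranteed to keep the Type I error rate at α. With few samples, the fixed noise scale 1/ε is also large relative to the counts.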

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e8a3/7821525/99e24cf77481/13040_2021_238_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e8a3/7821525/f8ae2e9724fe/13040_2021_238_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e8a3/7821525/7a0a9b5ce53f/13040_2021_238_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e8a3/7821525/9e75f1c0835c/13040_2021_238_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e8a3/7821525/dd3c845e993d/13040_2021_238_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e8a3/7821525/14aff56c0028/13040_2021_238_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e8a3/7821525/ba2b2679d15e/13040_2021_238_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e8a3/7821525/2b18736407ab/13040_2021_238_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e8a3/7821525/3fbf0c11513b/13040_2021_238_Fig9_HTML.jpg

Similar articles

1
Privacy-preserving chi-squared test of independence for small samples.
BioData Min. 2021 Jan 22;14(1):6. doi: 10.1186/s13040-021-00238-x.
2
Privacy-preserving Chi-squared testing for genome SNP databases.
Annu Int Conf IEEE Eng Med Biol Soc. 2017 Jul;2017:3884-3889. doi: 10.1109/EMBC.2017.8037705.
3
More practical differentially private publication of key statistics in GWAS.
Bioinform Adv. 2021 May 18;1(1):vbab004. doi: 10.1093/bioadv/vbab004. eCollection 2021.
4
Scalable privacy-preserving data sharing methodology for genome-wide association studies.
J Biomed Inform. 2014 Aug;50:133-41. doi: 10.1016/j.jbi.2014.01.008. Epub 2014 Feb 6.
5
Differentially private release of medical microdata: an efficient and practical approach for preserving informative attribute values.
BMC Med Inform Decis Mak. 2020 Jul 8;20(1):155. doi: 10.1186/s12911-020-01171-5.
6
Utility-preserving anonymization for health data publishing.
BMC Med Inform Decis Mak. 2017 Jul 11;17(1):104. doi: 10.1186/s12911-017-0499-0.
7
Privacy preserving data anonymization of spontaneous ADE reporting system dataset.
BMC Med Inform Decis Mak. 2016 Jul 18;16 Suppl 1(Suppl 1):58. doi: 10.1186/s12911-016-0293-4.
8
Privacy-Preserving Anonymity for Periodical Releases of Spontaneous Adverse Drug Event Reporting Data: Algorithm Development and Validation.
JMIR Med Inform. 2021 Oct 28;9(10):e28752. doi: 10.2196/28752.
9
An anonymization-based privacy-preserving data collection protocol for digital health data.
Front Public Health. 2023 Mar 3;11:1125011. doi: 10.3389/fpubh.2023.1125011. eCollection 2023.
10
The cost of quality: Implementing generalization and suppression for anonymizing biomedical data with minimal information loss.
J Biomed Inform. 2015 Dec;58:37-48. doi: 10.1016/j.jbi.2015.09.007. Epub 2015 Sep 15.

Cited by

1
Deep representation learning for clustering longitudinal survival data from electronic health records.
Nat Commun. 2025 Mar 14;16(1):2534. doi: 10.1038/s41467-025-56625-z.
2
Measuring the Candidates' Emotions in Political Debates Based on Facial Expression Recognition Techniques.
Front Psychol. 2022 May 9;13:785453. doi: 10.3389/fpsyg.2022.785453. eCollection 2022.
3
Mining Primary Care Electronic Health Records for Automatic Disease Phenotyping: A Transparent Machine Learning Framework.
Diagnostics (Basel). 2021 Oct 15;11(10):1908. doi: 10.3390/diagnostics11101908.
