
FANCY: fast estimation of privacy risk in functional genomics data.

Affiliations

Computational Biology and Bioinformatics, New Haven, CT 06520, USA.

Molecular Biophysics and Biochemistry, New Haven, CT 06520, USA.

Publication information

Bioinformatics. 2021 Jan 29;36(21):5145-5150. doi: 10.1093/bioinformatics/btaa661.

DOI:10.1093/bioinformatics/btaa661
PMID:32726397
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7850135/
Abstract

MOTIVATION

Functional genomics data are becoming clinically actionable, raising privacy concerns. However, quantifying privacy leakage via genotyping is difficult due to the heterogeneous nature of sequencing techniques. Thus, we present FANCY, a tool that rapidly estimates the number of leaking variants from raw RNA-Seq, ATAC-Seq and ChIP-Seq reads, without explicit genotyping. FANCY employs supervised regression using overall sequencing statistics as features and provides an estimate of the overall privacy risk before data release.
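The idea of regressing a leakage estimate on overall sequencing statistics can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the two features (mean read depth and breadth of coverage), the synthetic data, and the plain linear least-squares model are stand-ins, not the features, data or regression model used by FANCY.

```python
import random

def fit_linear(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    rows = [[1.0] + list(r) for r in X]          # prepend an intercept column
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(n):
        p = max(range(c, n), key=lambda k: abs(A[k][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for k in range(n):
            if k != c:
                f = A[k][c]
                A[k] = [vk - f * vc for vk, vc in zip(A[k], A[c])]
    return [A[i][n] for i in range(n)]           # [intercept, w_depth, w_breadth]

def predict(w, x):
    """Predicted number of leaking variants for one feature vector."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in data (NOT from the paper): one row per sequencing sample.
rng = random.Random(0)
X, y = [], []
for _ in range(100):
    depth = rng.uniform(1.0, 50.0)      # hypothetical feature: mean read depth
    breadth = rng.uniform(0.01, 0.5)    # hypothetical feature: fraction of genome covered
    X.append((depth, breadth))
    # Made-up relationship plus noise, used only to have something to fit.
    y.append(200.0 * depth + 50000.0 * breadth + rng.gauss(0.0, 500.0))

# Hold out the last 20 samples as an independent test set.
X_train, y_train = X[:80], y[:80]
X_test, y_test = X[80:], y[80:]

weights = fit_linear(X_train, y_train)
r2 = r_squared(y_test, [predict(weights, x) for x in X_test])
print(f"held-out R^2 on synthetic data: {r2:.3f}")
```

The point of the sketch is only the workflow: summary statistics computed from raw reads go in, a predicted variant count comes out, and no genotype calling happens in between.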

RESULTS

FANCY can predict the cumulative number of leaking SNVs with an average R2 of 0.95 across all independent test sets. Because accurate prediction is important when the number of leaking variants is low, we also developed a specialized version of the model that makes more accurate predictions in that range.
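One way to see why the low-count range may deserve separate attention is that a high overall R2 can hide poor accuracy on small values. The sketch below uses made-up numbers and a hypothetical cutoff of 100 variants; it does not reproduce the authors' specialized model, only the evaluation issue that motivates one.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Illustrative values only: true counts spanning several orders of magnitude,
# and predictions that are each off by a constant 40 variants.
y_true = [10, 20, 50, 100, 1000, 5000, 10000, 50000]
y_pred = [t + 40 for t in y_true]

LOW_CUTOFF = 100  # hypothetical threshold for "few leaking variants"
low = [(t, p) for t, p in zip(y_true, y_pred) if t <= LOW_CUTOFF]

r2_all = r_squared(y_true, y_pred)
r2_low = r_squared([t for t, _ in low], [p for _, p in low])

print(f"R^2 over all samples:       {r2_all:.4f}")
print(f"R^2 over low-count samples: {r2_low:.4f}")
```

The same absolute error is negligible against counts in the tens of thousands but larger than the spread of the low-count samples themselves, so the low-range R2 turns negative while the overall score stays close to 1.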

AVAILABILITY AND IMPLEMENTATION

Python and MATLAB implementations of FANCY, as well as custom scripts to generate the features, can be found at https://github.com/gersteinlab/FANCY. We also provide Jupyter notebooks so that users can optimize the parameters of the regression model on their own data. An easy-to-use web server that takes inputs and displays results is available at fancy.gersteinlab.org.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a58d/7850135/68629d859f27/btaa661f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a58d/7850135/f0391bbc3078/btaa661f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a58d/7850135/192c16654bca/btaa661f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a58d/7850135/121d4429e60c/btaa661f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a58d/7850135/a68085a75697/btaa661f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a58d/7850135/406f8d062e6c/btaa661f6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a58d/7850135/edf00deaf7bc/btaa661f7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a58d/7850135/67a9c4851000/btaa661f8.jpg

Similar articles

1. FANCY: fast estimation of privacy risk in functional genomics data.
   Bioinformatics. 2021 Jan 29;36(21):5145-5150. doi: 10.1093/bioinformatics/btaa661.
2. snakePipes: facilitating flexible, scalable and integrative epigenomic analysis.
   Bioinformatics. 2019 Nov 1;35(22):4757-4759. doi: 10.1093/bioinformatics/btz436.
3. pyBedGraph: a python package for fast operations on 1D genomic signal tracks.
   Bioinformatics. 2020 May 1;36(10):3234-3235. doi: 10.1093/bioinformatics/btaa061.
4. flexiMAP: a regression-based method for discovering differential alternative polyadenylation events in standard RNA-seq data.
   Bioinformatics. 2021 Jun 16;37(10):1461-1464. doi: 10.1093/bioinformatics/btaa854.
5. ScanNeo: identifying indel-derived neoantigens using RNA-Seq data.
   Bioinformatics. 2019 Oct 15;35(20):4159-4161. doi: 10.1093/bioinformatics/btz193.
6. scGate: marker-based purification of cell types from heterogeneous single-cell RNA-seq datasets.
   Bioinformatics. 2022 Apr 28;38(9):2642-2644. doi: 10.1093/bioinformatics/btac141.
7. ArtiFuse-computational validation of fusion gene detection tools without relying on simulated reads.
   Bioinformatics. 2020 Jan 15;36(2):373-379. doi: 10.1093/bioinformatics/btz613.
8. Data Sanitization to Reduce Private Information Leakage from Functional Genomics.
   Cell. 2020 Nov 12;183(4):905-917.e16. doi: 10.1016/j.cell.2020.09.036.
9. Hierarchical analysis of RNA-seq reads improves the accuracy of allele-specific expression.
   Bioinformatics. 2018 Jul 1;34(13):2177-2184. doi: 10.1093/bioinformatics/bty078.
10. 2DImpute: imputation in single-cell RNA-seq data from correlations in two dimensions.
    Bioinformatics. 2020 Jun 1;36(11):3588-3589. doi: 10.1093/bioinformatics/btaa148.

Cited by

1. PPML-Omics: A privacy-preserving federated machine learning method protects patients' privacy in omic data.
   Sci Adv. 2024 Feb 2;10(5):eadh8601. doi: 10.1126/sciadv.adh8601. Epub 2024 Jan 31.
2. Responsible, practical genomic data sharing that accelerates research.
   Nat Rev Genet. 2020 Oct;21(10):615-629. doi: 10.1038/s41576-020-0257-5. Epub 2020 Jul 21.