FANCY: fast estimation of privacy risk in functional genomics data.

Author affiliations

Computational Biology and Bioinformatics, New Haven, CT 06520, USA.

Molecular Biophysics and Biochemistry, New Haven, CT 06520, USA.

Publication information

Bioinformatics. 2021 Jan 29;36(21):5145-5150. doi: 10.1093/bioinformatics/btaa661.

Abstract

MOTIVATION

Functional genomics data are becoming clinically actionable, raising privacy concerns. However, quantifying privacy leakage via genotyping is difficult due to the heterogeneous nature of sequencing techniques. Thus, we present FANCY, a tool that rapidly estimates the number of leaking variants from raw RNA-Seq, ATAC-Seq and ChIP-Seq reads, without explicit genotyping. FANCY employs supervised regression using overall sequencing statistics as features and provides an estimate of the overall privacy risk before data release.
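As a rough illustration of the idea (not FANCY's actual model or feature set), the sketch below fits an ordinary least-squares regression from two hypothetical per-sample sequencing statistics, mean read depth and fraction of the genome covered, to a synthetic count of leaking variants. The feature names, the synthetic data and the choice of a plain linear model are all assumptions made for this example; the abstract only states that supervised regression on overall sequencing statistics is used.

```python
import numpy as np

def fit_linear(features, targets):
    """Fit ordinary least squares with an intercept term; return the weights."""
    design = np.column_stack([np.ones(len(features)), features])
    weights, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return weights

def predict(weights, features):
    """Apply fitted weights (intercept first) to a feature matrix."""
    design = np.column_stack([np.ones(len(features)), features])
    return design @ weights

# Synthetic data: every number below is made up for illustration only.
rng = np.random.default_rng(0)
n_samples = 200
depth = rng.uniform(1.0, 50.0, n_samples)    # hypothetical mean read depth
breadth = rng.uniform(0.01, 0.5, n_samples)  # hypothetical fraction of genome covered
features = np.column_stack([depth, breadth])

# Assumed linear relationship plus noise, standing in for measured variant counts.
true_weights = np.array([500.0, 120.0, 40000.0])
leaking = true_weights[0] + features @ true_weights[1:] + rng.normal(0.0, 200.0, n_samples)

# Hold out the last 50 samples as an independent test set.
train, test = slice(0, 150), slice(150, None)
weights = fit_linear(features[train], leaking[train])
predicted = predict(weights, features[test])
```

The point of the sketch is that the estimate comes from cheap summary statistics of the reads, so no variant calling is needed at prediction time.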

RESULTS

FANCY predicts the cumulative number of leaking SNVs with an average R² of 0.95 across all independent test sets. Because accurate prediction is especially important when few variants leak, we also developed a special version of the model that makes more accurate predictions when the number of leaking variants is low.
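For readers unfamiliar with the metric, the following sketch shows how an average R² over several test sets could be computed. It assumes the standard coefficient of determination (one minus the ratio of residual to total sum of squares); the abstract does not state which exact definition was used, and the function names here are hypothetical.

```python
import numpy as np

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def mean_r_squared(test_sets):
    """Average R² over an iterable of (observed, predicted) pairs."""
    return float(np.mean([r_squared(obs, pred) for obs, pred in test_sets]))
```

Under this definition a perfect prediction scores 1, and always predicting the mean of the observed values scores 0.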

AVAILABILITY AND IMPLEMENTATION

Python and MATLAB implementations of FANCY, along with custom scripts to generate the features, are available at https://github.com/gersteinlab/FANCY. We also provide Jupyter notebooks so that users can optimize the parameters of the regression model on their own data. An easy-to-use web server that takes inputs and displays results is available at fancy.gersteinlab.org.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
