Somalier: rapid relatedness estimation for cancer and germline studies using efficient genome sketches.

Affiliations

Department of Human Genetics, University of Utah, 15 S 2030 E, Salt Lake City, UT, 84112, USA.

Base2 Genomics, LLC, Salt Lake City, UT, 84105, USA.

Publication information

Genome Med. 2020 Jul 14;12(1):62. doi: 10.1186/s13073-020-00761-2.

DOI:10.1186/s13073-020-00761-2
PMID:32664994
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7362544/
Abstract

BACKGROUND

When interpreting sequencing data from multiple spatial or longitudinal biopsies, detecting sample mix-ups is essential, yet more difficult than in studies of germline variation. In most genomic studies of tumors, genetic variation is detected through pairwise comparisons of the tumor and a matched normal tissue from the sample donor. In many cases, only somatic variants are reported, which hinders the use of existing tools that detect sample swaps solely based on genotypes of inherited variants. To address this problem, we have developed Somalier, a tool that operates directly on alignments and does not require jointly called germline variants. Instead, Somalier extracts a small sketch of informative genetic variation for each sample. Sketches from hundreds of germline or somatic samples can then be compared in under a second, making Somalier a useful tool for measuring relatedness in large cohorts. Somalier produces both text output and an interactive visual report that facilitates the detection and correction of sample swaps using multiple relatedness metrics.

RESULTS

We introduce the tool and demonstrate its utility on a cohort of five glioma samples each with a normal, tumor, and cell-free DNA sample. Applying Somalier to high-coverage sequence data from the 1000 Genomes Project also identifies several related samples. We also demonstrate that it can distinguish pairs of whole-genome and RNA-seq samples from the same individuals in the Genotype-Tissue Expression (GTEx) project.

CONCLUSIONS

Somalier is a tool that can rapidly evaluate relatedness from sequencing data. It can be applied to diverse sequencing data types and genome builds and is available under an MIT license at github.com/brentp/somalier.
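
The abstract describes comparing small per-sample sketches of genotypes to measure relatedness and catch sample swaps. The following is a minimal Python sketch of that idea, not Somalier's actual implementation. The genotype encoding, the relatedness formula (shared heterozygous sites minus twice the IBS0 sites, divided by the smaller heterozygote count), the helper names, and the 0.9 threshold are all assumptions made for illustration; none of them is stated in the abstract.

```python
from itertools import combinations

# Assumed encoding: each sketch is a list of alternate-allele counts at a
# fixed set of sites (0 = hom-ref, 1 = het, 2 = hom-alt, None = unknown).

def relatedness(a, b):
    """Illustrative IBS-based relatedness between two genotype sketches."""
    hets_a = hets_b = shared_hets = ibs0 = 0
    for ga, gb in zip(a, b):
        if ga is None or gb is None:
            continue  # skip sites that are unknown in either sample
        hets_a += ga == 1
        hets_b += gb == 1
        shared_hets += ga == 1 and gb == 1
        ibs0 += {ga, gb} == {0, 2}  # opposite homozygotes share no allele
    denom = min(hets_a, hets_b)
    if denom == 0:
        return float("nan")
    return (shared_hets - 2 * ibs0) / denom

def flag_mismatches(sketches, donor_of, same_donor_min=0.9):
    """Return pairs whose observed relatedness contradicts the sample sheet.

    same_donor_min is a hypothetical cutoff chosen for this example.
    """
    flagged = []
    for s1, s2 in combinations(sorted(sketches), 2):
        r = relatedness(sketches[s1], sketches[s2])
        expected_same = donor_of[s1] == donor_of[s2]
        observed_same = r >= same_donor_min
        if expected_same != observed_same:
            flagged.append((s1, s2, r))
    return flagged

# Toy data: P2_tumor is labeled as donor P1 but does not match P1's samples.
sketches = {
    "P1_normal": [0, 1, 2, 1, 0, 1, 2, 1],
    "P1_tumor": [0, 1, 2, 1, 0, 1, 2, 1],
    "P2_tumor": [2, 0, 0, 1, 2, 1, 0, 0],
}
donor_of = {"P1_normal": "P1", "P1_tumor": "P1", "P2_tumor": "P1"}

for s1, s2, r in flag_mismatches(sketches, donor_of):
    print(f"possible swap: {s1} vs {s2} (relatedness {r:.2f})")
```

In this toy example the two P1 samples have identical sketches, so only the pairs involving the mislabeled P2_tumor sample are flagged. A real analysis would use many more sites and would account for coverage and allele balance when assigning genotypes.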

Figures

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3dcb/7362544/71b0505e3307/13073_2020_761_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3dcb/7362544/d0e0c505c2fb/13073_2020_761_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3dcb/7362544/03f9366b80fe/13073_2020_761_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3dcb/7362544/b993fff9a005/13073_2020_761_Fig4_HTML.jpg
