A community effort to identify and correct mislabeled samples in proteogenomic studies.

Authors

Yoo Seungyeul, Shi Zhiao, Wen Bo, Kho SoonJye, Pan Renke, Feng Hanying, Chen Hong, Carlsson Anders, Edén Patrik, Ma Weiping, Raymer Michael, Maier Ezekiel J, Tezak Zivana, Johanson Elaine, Hinton Denise, Rodriguez Henry, Zhu Jun, Boja Emily, Wang Pei, Zhang Bing

Affiliations

Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.

Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.

Publication information

Patterns (N Y). 2021 May 7;2(5):100245. doi: 10.1016/j.patter.2021.100245. eCollection 2021 May 14.

DOI:10.1016/j.patter.2021.100245
PMID:34036290
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8134945/
Abstract

Sample mislabeling or misannotation has been a long-standing problem in scientific research, particularly prevalent in large-scale, multi-omic studies due to the complexity of multi-omic workflows. There exists an urgent need for implementing quality controls to automatically screen for and correct sample mislabels or misannotations in multi-omic studies. Here, we describe a crowdsourced precisionFDA NCI-CPTAC Multi-omics Enabled Sample Mislabeling Correction Challenge, which provides a framework for systematic benchmarking and evaluation of mislabel identification and correction methods for integrative proteogenomic studies. The challenge received a large number of submissions from domestic and international data scientists, with highly variable performance observed across the submitted methods. Post-challenge collaboration between the top-performing teams and the challenge organizers has created an open-source software, COSMO, with demonstrated high accuracy and robustness in mislabeling identification and correction in simulated and real multi-omic datasets.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/597a/8134945/dfa114a9d4f4/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/597a/8134945/441b9ccb9d12/gr2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/597a/8134945/e0eb05ec6bec/gr3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/597a/8134945/2ff34b876886/gr4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/597a/8134945/73ece628f9b2/gr5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/597a/8134945/60cb64270ef4/gr6.jpg