High-grading bias: subtle problems with assessing power of selected subsets of loci for population assignment.

Affiliation

NOAA Fisheries, Northwest Fisheries Science Center, 2725 Montlake Blvd. East, Seattle, WA 98112, USA.

Publication information

Mol Ecol. 2010 Jul;19(13):2599-601. doi: 10.1111/j.1365-294X.2010.04675.x.

DOI: 10.1111/j.1365-294X.2010.04675.x
PMID: 20636893

Abstract

Recognition of the importance of cross-validation ('any technique or instance of assessing how the results of a statistical analysis will generalize to an independent dataset'; Wiktionary, en.wiktionary.org) is one reason that the U.S. Securities and Exchange Commission requires all investment products to carry some variation of the disclaimer, 'Past performance is no guarantee of future results.' Even a cursory examination of financial behaviour, however, demonstrates that this warning is regularly ignored, even by those who understand what an independent dataset is. In the natural sciences, an analogue to predicting future returns for an investment strategy is predicting power of a particular algorithm to perform with new data. Once again, the key to developing an unbiased assessment of future performance is through testing with independent data--that is, data that were in no way involved in developing the method in the first place. A 'gold-standard' approach to cross-validation is to divide the data into two parts, one used to develop the algorithm, the other used to test its performance. Because this approach substantially reduces the sample size that can be used in constructing the algorithm, researchers often try other variations of cross-validation to accomplish the same ends. As illustrated by Anderson in this issue of Molecular Ecology Resources, however, not all attempts at cross-validation produce the desired result. Anderson used simulated data to evaluate performance of several software programs designed to identify subsets of loci that can be effective for assigning individuals to population of origin based on multilocus genetic data. Such programs are likely to become increasingly popular as researchers seek ways to streamline routine analyses by focusing on small sets of loci that contain most of the desired signal. 
Anderson found that although some of the programs made an attempt at cross-validation, all failed to meet the 'gold standard' of using truly independent data and therefore produced overly optimistic assessments of power of the selected set of loci--a phenomenon known as 'high grading bias.'
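
The bias described in the abstract can be illustrated with a toy simulation. The sketch below is not Anderson's analysis and does not reproduce any of the programs he evaluated; every function name and parameter (`simulate`, `select_loci`, `high_grading_demo`, 500 loci, 40 individuals per population, 20 selected loci) is an assumption chosen for illustration. It simulates two populations with identical allele frequencies, so the true assignment power of any locus subset is 50%, and then compares an assessment that reuses the individuals that drove locus selection with a hold-out assessment on individuals that played no part in selection.

```python
import math
import random


def simulate(n_per_pop, n_loci, rng):
    """Two populations drawn from IDENTICAL allele frequencies (no real signal)."""
    freqs = [rng.uniform(0.2, 0.8) for _ in range(n_loci)]

    def individual():
        # Diploid genotype stored as the allele count (0, 1 or 2) at each locus.
        return [(rng.random() < p) + (rng.random() < p) for p in freqs]

    return [[individual() for _ in range(n_per_pop)] for _ in range(2)]


def allele_freqs(pop, loci):
    """Estimated allele frequency per locus; a pseudo-count avoids log(0)."""
    n = len(pop)
    return {l: (sum(ind[l] for ind in pop) + 0.5) / (2 * n + 1) for l in loci}


def select_loci(pops, k):
    """Pick the k loci with the largest estimated frequency difference."""
    all_loci = range(len(pops[0][0]))
    f0 = allele_freqs(pops[0], all_loci)
    f1 = allele_freqs(pops[1], all_loci)
    return sorted(all_loci, key=lambda l: abs(f0[l] - f1[l]), reverse=True)[:k]


def log_lik(ind, freqs):
    """Binomial log-likelihood of a genotype (constant term omitted)."""
    return sum(ind[l] * math.log(p) + (2 - ind[l]) * math.log(1 - p)
               for l, p in freqs.items())


def accuracy(train, test, loci):
    """Fraction of test individuals assigned to their true population."""
    f = [allele_freqs(train[0], loci), allele_freqs(train[1], loci)]
    correct = total = 0
    for true_pop in (0, 1):
        for ind in test[true_pop]:
            guess = 0 if log_lik(ind, f[0]) >= log_lik(ind, f[1]) else 1
            correct += guess == true_pop
            total += 1
    return correct / total


def high_grading_demo(seed=1, reps=5, n_per_pop=40, n_loci=500, k=20):
    """Return mean (naive, hold-out) accuracy over several replicates."""
    rng = random.Random(seed)
    naive_sum = held_sum = 0.0
    for _ in range(reps):
        pops = simulate(n_per_pop, n_loci, rng)

        # Naive: the same individuals select the loci AND assess their power.
        naive_sum += accuracy(pops, pops, select_loci(pops, k))

        # Hold-out: the test half plays no part in selection or training.
        half = n_per_pop // 2
        train = [p[:half] for p in pops]
        test = [p[half:] for p in pops]
        held_sum += accuracy(train, test, select_loci(train, k))
    return naive_sum / reps, held_sum / reps


naive, held = high_grading_demo()
print(f"naive (same data): {naive:.2f}   hold-out: {held:.2f}   truth: 0.50")
```

Because the simulated populations do not differ, any apparent power in the naive figure comes from selecting loci on sampling noise and then scoring on that same noise. The hold-out figure should stay near chance, at the cost of using only half of the individuals for selection, which is the trade-off the abstract notes.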


Similar articles

1. High-grading bias: subtle problems with assessing power of selected subsets of loci for population assignment.
   Mol Ecol. 2010 Jul;19(13):2599-601. doi: 10.1111/j.1365-294X.2010.04675.x.
2. Bias in error estimation when using cross-validation for model selection.
   BMC Bioinformatics. 2006 Feb 23;7:91. doi: 10.1186/1471-2105-7-91.
3. [Future prospects of molecular epidemiology in tuberculosis].
   Kekkaku. 2009 Dec;84(12):783-4.
4. The NCI All Ireland Cancer Conference.
   Oncologist. 1999;4(4):275-277.
5. "Just Another Statistic".
   Oncologist. 1998;3(3):III-IV.
6. Prophylactic Oophorectomy: Reducing the U.S. Death Rate from Epithelial Ovarian Cancer. A Continuing Debate.
   Oncologist. 1996;1(5):326-330.
7. [The analysis of physicians' work: announcing the end of attempts at in vitro fertilization].
   Encephale. 2003 Jul-Aug;29(4 Pt 1):293-305.
8. Lifetime earnings patterns, the distribution of future Social Security benefits, and the impact of pension reform.
   Soc Secur Bull. 2000;63(4):74-98.
9. Validating module network learning algorithms using simulated data.
   BMC Bioinformatics. 2007 May 3;8 Suppl 2(Suppl 2):S5. doi: 10.1186/1471-2105-8-S2-S5.
10. What is a population? An empirical evaluation of some genetic methods for identifying the number of gene pools and their degree of connectivity.
   Mol Ecol. 2006 May;15(6):1419-39. doi: 10.1111/j.1365-294X.2006.02890.x.

Cited by

1. 'Highly-Informative' Genetic Markers Can Bias Conclusions: Examples and General Solutions.
   Mol Ecol Resour. 2025 Oct;25(7):e70011. doi: 10.1111/1755-0998.70011. Epub 2025 Jul 11.
2. A Seascape Genomics Perspective on Restrictive Genetic Connectivity Overcoming Signals of Local Adaptations in the Green Abalone () of the California Current System.
   Ecol Evol. 2025 Feb 4;15(2):e70913. doi: 10.1002/ece3.70913. eCollection 2025 Feb.
3. A SNP assay for assessing diversity in immune genes in the honey bee (Apis mellifera L.).
   Sci Rep. 2021 Jul 28;11(1):15317. doi: 10.1038/s41598-021-94833-x.
4. Leveraging genomics to understand threats to migratory birds.
   Evol Appl. 2021 Apr 10;14(6):1646-1658. doi: 10.1111/eva.13231. eCollection 2021 Jun.
5. Mixed-stock analysis using Rapture genotyping to evaluate stock-specific exploitation of a walleye population despite weak genetic structure.
   Evol Appl. 2021 Mar 30;14(5):1403-1420. doi: 10.1111/eva.13209. eCollection 2021 May.
6. The ghosts of propagation past: haplotype information clarifies the relative influence of stocking history and phylogeographic processes on contemporary population structure of walleye ().
   Evol Appl. 2021 Jan 29;14(4):1124-1144. doi: 10.1111/eva.13186. eCollection 2021 Apr.
7. Range-wide population genomics of the Mexican fruit fly: Toward development of pathway analysis tools.
   Evol Appl. 2019 Jun 13;12(8):1641-1660. doi: 10.1111/eva.12824. eCollection 2019 Sep.
8. Seascape genomics of eastern oyster () along the Atlantic coast of Canada.
   Evol Appl. 2018 Dec 26;12(3):587-609. doi: 10.1111/eva.12741. eCollection 2019 Mar.
9. Is the Red Wolf a Listable Unit Under the US Endangered Species Act?
   J Hered. 2018 Jun 27;109(5):585-597. doi: 10.1093/jhered/esy020.
10. One species or four? Yes!...and, no. Or, arbitrary assignment of lineages to species obscures the diversification processes of Neotropical fishes.
   PLoS One. 2017 Feb 24;12(2):e0172349. doi: 10.1371/journal.pone.0172349. eCollection 2017.