How Accurate are the Extremely Small P-values Used in Genomic Research: An Evaluation of Numerical Libraries.

Authors

Bangalore Sai Santosh, Wang Jelai, Allison David B

Affiliation

The University of Alabama at Birmingham, Section on Statistical Genetics, Department of Biostatistics, RPHB 327, 1665 University Boulevard, Birmingham, AL-35294-0022, USA.

Publication

Comput Stat Data Anal. 2009 May 15;53(7):2446-2452. doi: 10.1016/j.csda.2008.11.028.

DOI:10.1016/j.csda.2008.11.028
PMID:20161126
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2742983/
Abstract

In the fields of genomics and high-dimensional biology (HDB), massive multiple testing prompts the use of extremely small significance levels. Hypothesis testing requires the tail areas of statistical distributions, so these areas must be accurate for scientific judgments to be made with confidence. Previous work on accuracy focused primarily on two things. It evaluated professionally written statistical software, such as SAS, against the Statistical Reference Datasets (StRD) provided by the National Institute of Standards and Technology (NIST). It also examined the accuracy of tail areas in statistical distributions. The goal of this paper is to guide investigators who are developing custom scientific software on top of numerical libraries written by others. Specifically, we evaluate the accuracy of small tail areas from the cumulative distribution functions (CDFs) of the chi-square and t-distributions. We do so by comparing several open-source, free, or commercially licensed numerical libraries in Java, C, and R against widely accepted standards of comparison such as ELV and DCDFLIB. In our evaluation, the C libraries and R functions are consistently accurate to six significant digits. Among the evaluated Java libraries, Colt is the most accurate. These languages and libraries are popular choices among programmers developing scientific software, so the results can help programmers choose libraries for CDF accuracy.
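
Small tail areas are fragile, and this can be shown with the two distributions named above in the special cases where the tail has a closed form. These cases are the chi-square distribution with an even number of degrees of freedom and Student's t with one degree of freedom (the standard Cauchy distribution). This minimal Python sketch is not how ELV, DCDFLIB, or the evaluated libraries work; general-purpose routines rely on incomplete gamma and beta functions. It illustrates only one failure mode: computing a p-value as 1 - CDF. That route loses relative accuracy as the tail shrinks, and it returns exactly 0 once the true tail drops below roughly 1e-16, the resolution of double precision near 1. All function names here are hypothetical.

```python
import math


def chi2_upper_tail_even_df(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with an EVEN number of degrees of
    freedom, via the closed form
        Q(x; k) = exp(-x/2) * sum_{i=0}^{k/2 - 1} (x/2)^i / i!
    The upper tail is computed directly, never as 1 - CDF."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive, even df")
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i          # builds (x/2)^i / i! incrementally
        total += term
    # Apply exp(-x/2) in log space to delay underflow of the prefactor.
    return math.exp(-half + math.log(total))


def chi2_upper_tail_via_cdf(x: float, df: int) -> float:
    """Naive route: form the lower-tail CDF first, then subtract it from 1.
    (The CDF is derived from the function above, standing in for a
    library CDF routine.)"""
    cdf = 1.0 - chi2_upper_tail_even_df(x, df)
    return 1.0 - cdf


def t1_upper_tail(t: float) -> float:
    """P(T > t) for Student's t with 1 df (the standard Cauchy distribution).
    For t > 0, 1/2 - atan(t)/pi equals atan(1/t)/pi; the latter avoids
    subtracting two nearly equal numbers when t is large."""
    if t > 0.0:
        return math.atan(1.0 / t) / math.pi
    return 0.5 - math.atan(t) / math.pi


def t1_upper_tail_naive(t: float) -> float:
    """Textbook formula evaluated directly; loses digits for large t."""
    return 0.5 - math.atan(t) / math.pi


if __name__ == "__main__":
    # Chi-square, df = 2, x = 200: the true tail is exp(-100), about 3.7e-44.
    print(chi2_upper_tail_even_df(200.0, 2))
    print(chi2_upper_tail_via_cdf(200.0, 2))
    # t with 1 df, t = 1e20: the true tail is about 1 / (pi * 1e20).
    print(t1_upper_tail(1e20))
    print(t1_upper_tail_naive(1e20))
```

The direct chi-square tail at x = 200 comes out near 3.7e-44, while the 1 - CDF route gives exactly 0.0. The t example fails in the same way: at t = 1e20 the naive formula has no correct digits left.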

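
The abstract states accuracy as a number of significant digits. Software-accuracy studies built on NIST StRD often turn a comparison against a reference value into such a digit count with the log relative error (LRE): -log10(|computed - reference| / |reference|). The abstract does not say which metric the authors used, so treat this as an assumption. The function name, the cap of 15 digits, and the sample numbers in the usage block are illustrative choices, not values from the paper.

```python
import math


def log_relative_error(computed: float, reference: float, cap: float = 15.0) -> float:
    """Approximate number of leading significant digits on which `computed`
    agrees with `reference` (log relative error, LRE), clipped to [0, cap]."""
    if computed == reference:
        return cap  # exact agreement: limited only by the precision we trust
    if reference == 0.0:
        err = abs(computed)  # fall back to absolute error for a zero reference
    else:
        err = abs(computed - reference) / abs(reference)
    # NaN or infinite errors end up clipped to 0 digits of agreement.
    return min(cap, max(0.0, -math.log10(err)))


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only (not results from the paper).
    reference = 1.23456789e-20   # tail area from a trusted reference routine
    library = 1.234568e-20       # tail area returned by a library under test
    lre = log_relative_error(library, reference)
    print(f"LRE = {lre:.2f}; agrees to six digits: {lre >= 6.0}")
```

Under this metric, "accurate to six significant digits" would mean an LRE of at least 6 against the reference value at every tested point.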

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/65bb/2742983/a7c37a99a2c1/nihms82180f1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/65bb/2742983/85d1fbe15ae2/nihms82180f2.jpg
