Sustained software development, not number of citations or journal choice, is indicative of accurate bioinformatic software.

Affiliations

Department of Biochemistry, University of Otago, Dunedin, New Zealand.

Biomolecular Interaction Centre, University of Canterbury, Christchurch, New Zealand.

Publication information

Genome Biol. 2022 Feb 16;23(1):56. doi: 10.1186/s13059-022-02625-x.

DOI:10.1186/s13059-022-02625-x
PMID:35172880
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8851831/
Abstract

BACKGROUND

Computational biology provides software tools for testing and making inferences about biological data. In the face of increasing volumes of data, heuristic methods that trade software speed for accuracy may be employed. We have studied these trade-offs using the results of a large number of independent software benchmarks, and evaluated whether external factors, including speed, author reputation, journal impact, recency and developer efforts, are indicative of accurate software.

RESULTS

We find that software speed, author reputation, journal impact, number of citations and age are unreliable predictors of software accuracy. This is unfortunate because these are frequently cited reasons for selecting software tools. However, GitHub-derived statistics and high version numbers show that accurate bioinformatic software tools are generally the product of many improvements over time. We also find an excess of slow and inaccurate bioinformatic software tools, and this is consistent across many sub-disciplines. There are few tools that are middle-of-the-road in terms of accuracy and speed trade-offs.
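
One way to picture the kind of comparison described above is a rank correlation between a tool's accuracy and a candidate predictor. The sketch below is only an illustration, assuming Spearman's rank correlation is a suitable measure; the tool values (`accuracy_rank`, `commits`, `citations`) are hypothetical toy data, not results from the study, and the helper functions are written here for the example.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy data for five imaginary tools (NOT from the paper).
accuracy_rank = [0.9, 0.7, 0.5, 0.3, 0.1]  # normalised, higher = more accurate
commits = [1200, 300, 450, 40, 10]         # stand-in for a GitHub-derived statistic
citations = [50, 900, 20, 400, 100]        # stand-in for a citation count

print("commits vs accuracy:  ", round(spearman(commits, accuracy_rank), 3))
print("citations vs accuracy:", round(spearman(citations, accuracy_rank), 3))
```

In this invented data set the commit counts track accuracy closely while the citation counts do not, which mirrors the direction of the finding but says nothing about its actual size.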

CONCLUSIONS

Our findings indicate that accurate bioinformatic software is primarily the product of long-term commitments to software development. In addition, we hypothesise that bioinformatics software suffers from publication bias. Software that is intermediate in terms of both speed and accuracy may be difficult to publish, possibly due to author, editor and reviewer practices. This leaves an unfortunate hole in the literature, as ideal tools may fall into this gap. High accuracy tools are not always useful if they are slow, while high speed is not useful if the results are also inaccurate.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6c53/8851831/8af24ef10e16/13059_2022_2625_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6c53/8851831/6d68b50d72f4/13059_2022_2625_Fig2_HTML.jpg
