Analysis & Experimentation, Microsoft, One Microsoft Way, Redmond, WA, 98052, USA.
Airbnb, 888 Brannan St, San Francisco, CA, 94103, USA.
Trials. 2020 Feb 7;21(1):150. doi: 10.1186/s13063-020-4084-y.
Many technology companies, including Airbnb, Amazon, Booking.com, eBay, Facebook, Google, LinkedIn, Lyft, Microsoft, Netflix, Twitter, Uber, and Yahoo!/Oath, run online randomized controlled experiments, commonly referred to as A/B tests, at scale: hundreds of concurrent controlled experiments, each on millions of users. Although both derive from the same statistical roots, randomized controlled trials (RCTs) in medicine are now criticized for being expensive and difficult, whereas in technology the marginal cost of such experiments is approaching zero and their value for data-driven decision-making is broadly recognized.
This is an overview of key scaling lessons learned in the technology field. They include (1) a focus on metrics: an overall evaluation criterion plus thousands of metrics for insights and debugging, computed automatically for every experiment; (2) quick release cycles with automated ramp-up and shut-down that afford agile and safe experimentation, leading to consistent incremental progress over time; and (3) a culture of 'test everything', because most ideas fail and tiny changes sometimes show surprising outcomes worth millions of dollars annually. Technological advances, online interactions, and the availability of large-scale data have allowed technology companies to take the science of RCTs and apply it as online randomized controlled experiments at large scale, with hundreds of concurrent experiments running on any given day across a wide range of software products, be they web sites, mobile applications, or desktop applications. Rather than hindering innovation, these experiments have accelerated it, with clear improvements to key metrics, including user experience and revenue. As healthcare increasingly interacts with patients through these modern channels of web sites and digital health applications, many of the lessons apply. The most innovative technological field has recognized that a systematic series of randomized trials, in which many of the most promising ideas fail, leads to sustainable improvement.
While there are many differences between technology and medicine, it is worth considering whether and how similar designs can be applied via simple RCTs that focus on healthcare decision-making or service delivery. Changes, small and large, should undergo continuous and repeated evaluation in randomized trials; learning from their results will enable accelerated healthcare improvements.
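As an illustration only, the mechanics behind lessons (1) and (2) can be sketched in a few lines of Python. The sketch assumes a hash-based assignment of users to variants, a single conversion-rate metric compared with a pooled two-proportion z-test, and a simple guardrail rule for automated shut-down. The function names, the 50/50 split, and the 0.001 threshold are hypothetical choices made for this example, not details taken from the article or from any company's platform.

```python
import hashlib
import math
from typing import Tuple


def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user to a variant (hypothetical scheme).

    Hashing the experiment name together with the user id keeps a user's
    assignment stable within one experiment and independent across experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "treatment" if bucket < treatment_share else "control"


def two_proportion_z(success_c: int, n_c: int, success_t: int, n_t: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for treatment minus control conversion rate."""
    p_c = success_c / n_c
    p_t = success_t / n_t
    pooled = (success_c + success_t) / (n_c + n_t)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_t))
    z = (p_t - p_c) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


def should_shut_down(z: float, p_value: float, alpha: float = 0.001) -> bool:
    """Hypothetical guardrail: stop when treatment is significantly worse."""
    return z < 0 and p_value < alpha
```

In this sketch a ramp-up would correspond to gradually raising `treatment_share` while `should_shut_down` stays false. A production system would also need to handle many metrics at once, repeated looks at the data, and interactions between concurrent experiments, none of which is modeled here.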