Department of Mathematics, University of California, Irvine, Irvine, CA, USA.
Department of Anatomy and Neurobiology, School of Medicine, University of California, Irvine, Irvine, CA, USA.
Nat Methods. 2024 Sep;21(9):1597-1602. doi: 10.1038/s41592-024-02390-8. Epub 2024 Aug 22.
Over the last decade, biology has begun utilizing 'big data' approaches, resulting in large, comprehensive atlases in modalities ranging from transcriptomics to neural connectomics. However, these approaches must be complemented and integrated with 'small data' approaches to efficiently utilize data from individual labs. Integration of smaller datasets with major reference atlases is critical to provide context to individual experiments, and approaches toward integration of large and small data have been a major focus in many fields in recent years. Here we discuss progress in integration of small data with consortium-sized atlases across multiple modalities, and its potential applications. We then examine promising future directions for utilizing the power of small data to maximize the information garnered from small-scale experiments. We envision that, in the near future, international consortia comprising many laboratories will work together to collaboratively build reference atlases and foundation models using small data methods.