Burns Randal, Roncal William Gray, Kleissas Dean, Lillaney Kunal, Manavalan Priya, Perlman Eric, Berger Daniel R, Bock Davi D, Chung Kwanghun, Grosenick Logan, Kasthuri Narayanan, Weiler Nicholas C, Deisseroth Karl, Kazhdan Michael, Lichtman Jeff, Reid R Clay, Smith Stephen J, Szalay Alexander S, Vogelstein Joshua T, Vogelstein R Jacob
Department of Computer Science and the Institute for Data Intensive Engineering and Science, Johns Hopkins University.
Johns Hopkins University Applied Physics Laboratory.
Sci Stat Database Manag. 2013. doi: 10.1145/2484838.2484870.
We describe a scalable database cluster for the spatial analysis and annotation of high-throughput brain imaging data, initially for 3-d electron microscopy image stacks, but for time-series and multi-channel data as well. The system was designed primarily for workloads that build neural connectivity maps of the brain using the parallel execution of computer vision algorithms on high-performance compute clusters. These services and open-science data sets are publicly available at openconnecto.me. The system design inherits much from NoSQL scale-out and data-intensive computing architectures. We distribute data to cluster nodes by partitioning a spatial index. We direct I/O to different systems (reads to parallel disk arrays and writes to solid-state storage) to avoid I/O interference and maximize throughput. All programming interfaces are RESTful Web services, which are simple and stateless, improving scalability and usability. We include a performance evaluation of the production system, highlighting the effectiveness of spatial data organization.
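As an illustration of distributing data by partitioning a spatial index, the following is a minimal sketch that assumes a Morton (Z-order) curve over fixed-size 3-d blocks ("cuboids") and equal contiguous key ranges per node. The abstract does not specify the index or the partitioning rule, so the curve choice and the names `morton3d` and `node_for_cuboid` are hypothetical and do not describe the production system.

```python
def morton3d(x: int, y: int, z: int, bits: int = 21) -> int:
    """Interleave the low `bits` bits of x, y, z into one Z-order key.

    Assumption: a Morton curve stands in for the spatial index; the
    abstract does not name the index actually used.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)        # x takes bit positions 0, 3, 6, ...
        key |= ((y >> i) & 1) << (3 * i + 1)    # y takes bit positions 1, 4, 7, ...
        key |= ((z >> i) & 1) << (3 * i + 2)    # z takes bit positions 2, 5, 8, ...
    return key


def node_for_cuboid(x: int, y: int, z: int, num_nodes: int, bits_per_axis: int) -> int:
    """Assign a cuboid to a node by splitting the key space into
    `num_nodes` equal, contiguous ranges (a hypothetical policy)."""
    key_space = 1 << (3 * bits_per_axis)        # total number of distinct keys
    return morton3d(x, y, z, bits_per_axis) * num_nodes // key_space


if __name__ == "__main__":
    # A 4 x 4 x 4 grid of cuboids (2 bits per axis) spread over 4 nodes.
    for coords in [(0, 0, 0), (2, 0, 0), (0, 0, 2), (3, 3, 3)]:
        print(coords, "-> node", node_for_cuboid(*coords, num_nodes=4, bits_per_axis=2))
```

Because nearby cuboids tend to receive nearby keys, contiguous key ranges keep most spatially local reads on a single node, which is the usual reason for partitioning along a space-filling curve.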