Li Zhongyu, Metaxas Dimitris N, Lu Aidong, Zhang Shaoting
Department of Computer Science, University of North Carolina at Charlotte, USA.
Department of Computer Science, Rutgers University, USA.
Methods. 2017 Feb 15;115:100-109. doi: 10.1016/j.ymeth.2017.02.005. Epub 2017 Feb 17.
This paper proposes a novel framework to help biologists explore and analyze neurons by retrieving data from neuron morphological databases. In recent years, continuously expanding neuron databases have provided a rich source of information for associating neuronal morphologies with their functional properties. We design a coarse-to-fine framework for efficient and effective retrieval from large-scale neuron databases. At the coarse level, to remain efficient at large scale, we employ a binary coding method that compresses morphological features into binary codes of tens of bits. Such short binary codes allow real-time similarity search in Hamming space. Because neuron databases are continuously expanding, it is inefficient to re-train the binary coding model from scratch whenever new neurons are added. To solve this problem, we extend binary coding with online updating schemes, which consider only the newly added neurons and update the model on the fly, without accessing the whole neuron database. At the fine-grained level, we bring domain experts/users into the framework, who can give relevance feedback on the binary-coding-based retrieval results. This interactive strategy can improve retrieval performance by re-ranking the coarse results, for which we design a new similarity measure that takes the feedback into account. Our framework is validated on more than 17,000 neuron cells, showing promising retrieval accuracy and efficiency. Moreover, we demonstrate its use in assisting biologists to identify and explore unknown neurons.
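The abstract does not specify the coding model, the online update rule, or the feedback-aware similarity measure, so the following is only a rough illustration of the coarse-to-fine idea under stated assumptions. For the coarse level it swaps in random-hyperplane hashing (a data-independent locality-sensitive hashing scheme, not the learned binary coding model the paper describes). For the fine level it uses classic Rocchio relevance feedback (again a stand-in, not the paper's similarity measure). The feature vectors are random placeholders for morphological descriptors, and the names `encode`, `hamming_search`, and `rocchio_rerank`, as well as every parameter value, are hypothetical.

```python
import numpy as np


def encode(X, W, mu):
    """Binary codes: one bit per random hyperplane (sign of the centered projection)."""
    return (X - mu) @ W > 0


def hamming_search(query_code, db_codes, k):
    """Coarse level: indices of the k codes nearest to the query in Hamming distance."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists, kind="stable")[:k]


def rocchio_rerank(query, feats, candidates, relevant, irrelevant,
                   alpha=1.0, beta=0.75, gamma=0.25):
    """Fine level (Rocchio stand-in): move the query toward items marked relevant
    and away from items marked irrelevant, then re-rank the coarse candidates by
    Euclidean distance in the original feature space."""
    q = alpha * query
    if relevant:
        q = q + beta * feats[relevant].mean(axis=0)
    if irrelevant:
        q = q - gamma * feats[irrelevant].mean(axis=0)
    d = np.linalg.norm(feats[candidates] - q, axis=1)
    return [int(candidates[i]) for i in np.argsort(d, kind="stable")]


# Usage with synthetic stand-ins for morphological feature vectors.
rng = np.random.default_rng(0)
dim, n_bits = 20, 32
feats = rng.standard_normal((1000, dim))
W = rng.standard_normal((dim, n_bits))   # hypothetical: random, not learned
mu = feats.mean(axis=0)
db_codes = encode(feats, W, mu)          # one 32-bit code per neuron

query = feats[7] + 0.01 * rng.standard_normal(dim)   # near-duplicate of neuron 7
query_code = encode(query[None, :], W, mu)[0]
coarse = hamming_search(query_code, db_codes, k=50)

# Simulated expert feedback: neuron 7 is relevant, one other candidate is not.
irrelevant = [int(c) for c in coarse if c != 7][:1]
reranked = rocchio_rerank(query, feats, coarse, [7], irrelevant)
```

Because `W` and `mu` are held fixed in this sketch, newly added neurons could simply be encoded and appended to `db_codes`; it is a learned, data-dependent coding model like the paper's that makes an online updating scheme necessary.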