Sariyanidi Evangelos, Zampella Casey J, Bartley Keith G, Herrington John D, Satterthwaite Theodore D, Schultz Robert T, Tunc Birkan
Center for Autism Research, Children's Hospital of Philadelphia.
University of Pennsylvania.
Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit. 2020 Jun;2020:9490-9499. doi: 10.1109/cvpr42600.2020.00951. Epub 2020 Aug 5.
Finding the largest subset of sequences (i.e., time series) that are correlated above a certain threshold, within large datasets, is of significant interest for computer vision and pattern recognition problems across domains, including behavior analysis, computational biology, neuroscience, and finance. Maximal clique algorithms can be used to solve this problem, but they are not scalable. We present an approximate, but highly efficient and scalable, method that represents the search space as a union of sets called ϵ-expanded clusters, one of which is theoretically guaranteed to contain the largest subset of synchronized sequences. The method finds synchronized sets by fitting a Euclidean ball on ϵ-expanded clusters, using Jung's theorem. We validate the method on data from the three distinct domains of facial behavior analysis, finance, and neuroscience, where we respectively discover the synchrony among pixels of face videos, stock market item prices, and dynamic brain connectivity data. Experiments show that our method produces results comparable to, but up to 300 times faster than, maximal clique algorithms, with speed gains increasing exponentially with the number of input sequences.
在大型数据集中找到相关性高于特定阈值的最大序列子集(即时间序列),对于包括行为分析、计算生物学、神经科学和金融等多个领域的计算机视觉和模式识别问题具有重要意义。最大团算法可用于解决此问题,但它们不可扩展。我们提出了一种近似但高效且可扩展的方法,该方法将搜索空间表示为称为ϵ扩展簇的集合的并集,理论上保证其中一个簇包含同步序列的最大子集。该方法使用容格定理,通过在ϵ扩展簇上拟合欧几里得球来找到同步集。我们在面部行为分析、金融和神经科学这三个不同领域的数据上验证了该方法,在这些领域中我们分别发现了面部视频像素、股票市场项目价格和动态脑连接数据之间的同步性。实验表明,我们的方法产生的结果与最大团算法相当,但速度快达300倍,且速度提升随着输入序列数量呈指数增长。