Chen Lexin, Smith Micah, Roe Daniel R, Miranda-Quintana Ramón Alain
Department of Chemistry, University of Florida, Gainesville, Florida 32611, USA.
Quantum Theory Project, University of Florida, Gainesville, Florida 32611, USA.
bioRxiv. 2024 Dec 5:2024.12.05.627001. doi: 10.1101/2024.12.05.627001.
We transform the Radial Threshold Clustering (RTC) algorithm into Extended Quality Clustering, a new algorithm with several novel features. Daura et al.'s RTC algorithm is a partitioning clustering algorithm that groups frames together based on their similarity to a seed configuration. RTC currently has two issues: it scales as $O(N^2)$ with the number of frames N, which makes it inefficient at high frame counts, and its clustering results depend on the order of the input frames. To address the first issue, we speed up seed selection by using k-means++ to select the seeds from the available frames. To address the second issue and make the results invariant with respect to frame ordering, whenever there is a tie for the most populated cluster, the densest and most compact cluster is chosen using the extended similarity indices. The new algorithm clusters in linear time and produces more compact and better-separated clusters.
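The two changes described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`kmeanspp_seeds`, `msd`, `threshold_cluster`) are hypothetical, frames are assumed to be rows of a NumPy array compared by Euclidean distance, and the mean squared deviation from the cluster centroid is used as a simple stand-in for the extended similarity indices, which are not reproduced here.

```python
import numpy as np


def kmeanspp_seeds(X, n_seeds, rng):
    """Pick up to n_seeds row indices of X by k-means++ (D^2-weighted) sampling."""
    n = X.shape[0]
    seeds = [int(rng.integers(n))]
    # Squared distance from every frame to its nearest seed so far.
    d2 = np.sum((X - X[seeds[0]]) ** 2, axis=1)
    while len(seeds) < n_seeds:
        total = d2.sum()
        if total == 0.0:  # every frame coincides with an existing seed
            break
        nxt = int(rng.choice(n, p=d2 / total))
        seeds.append(nxt)
        d2 = np.minimum(d2, np.sum((X - X[nxt]) ** 2, axis=1))
    return seeds


def msd(points):
    """Mean squared deviation from the centroid; lower means more compact.

    Assumption: used here in place of the extended similarity indices.
    """
    return float(np.mean(np.sum((points - points.mean(axis=0)) ** 2, axis=1)))


def threshold_cluster(X, seeds, threshold):
    """Greedy radial-threshold clustering restricted to the given seeds.

    Each round keeps the seed whose neighborhood (unassigned frames within
    `threshold`) is most populated; population ties go to the more compact
    neighborhood instead of the first one encountered. Frames that no seed
    reaches keep the label -1.
    """
    unassigned = np.ones(X.shape[0], dtype=bool)
    labels = np.full(X.shape[0], -1)
    candidates = list(seeds)
    cluster_id = 0
    while candidates:
        best = None  # (key, seed, members)
        for s in candidates:
            if not unassigned[s]:
                continue  # seed was absorbed by an earlier cluster
            dist = np.linalg.norm(X - X[s], axis=1)
            members = np.flatnonzero(unassigned & (dist <= threshold))
            # Larger population wins; on a tie, smaller MSD wins.
            key = (members.size, -msd(X[members]))
            if best is None or key > best[0]:
                best = (key, s, members)
        if best is None:
            break
        _, s, members = best
        labels[members] = cluster_id
        unassigned[members] = False
        candidates.remove(s)
        cluster_id += 1
    return labels
```

Because only the selected seeds are ever compared against the frames, the cost of this sketch grows linearly with the number of frames for a fixed number of seeds, rather than quadratically as when every frame is a candidate seed. An exact tie in both population and compactness would still fall back to candidate order in this simplified version.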