Marigold: Efficient k-means Clustering in High Dimensions
Kasper Overgaard Mortensen, Fatemeh Zardbani, Mohammad Ahsanul Haque, Steinn Ymir Agustsson, Davide Mottin, Philip Hofmann, Panagiotis Karras
VLDB 2023

On high-dimensional data, such as physics experiments, MARIGOLD is the scalable answer to compute k-means.
TL;DR
Marigold delivers the exact result of Lloyd’s algorithm for k-means in at least one order of magnitude less time on high-dimensional data.
In this paper:
- We propose MARIGOLD, an enhanced Lloyd’s algorithm for high-dimensional data.
- We enhange Lloyds by introducing triangle inequality pruning and a novel stepwise method that approximates the centroid’s distances through compression.
- We outperform existing Lloyd’s variants by up to an order of magnitude time on high-dimensional data.
- We showcase our algorithm on an interesting physics application on Angle-Resolved Photoemission Spectroscopy (ARPES) in which the input are high-dimensional spectra of a solid’s electronic structure.
Share this post
Reddit
LinkedIn
Email