Kohonen Learning Procedure K-Means vs Lloyd's K-means

K-means maybe the most common data quantization method, used widely for many different domain of problems. Even it relies on very simple idea, it proposes satisfying results in a computationally efficient environment.

Underneath of the formula of K-means optimization, the objective is to minimize the distance between data points to its closest centroid (cluster center). Here we can write the objective as;

$argmin sum_{i=1}^{k}sum_{x_j in S_i} ||x_j - mu_i||^2$

$mu_i$ is the closest centroid to instance $x_j$.

One of the footnote for Lloyd's above K-means formula, it implicitly enforces similar size clusters, although it is ill-suited assumption for many of the data-sets. In addition, since we compute new centroids over the whole data for each iteration, there is almost nothing randomized so objective is intended to stay onto same optimal point of the objective with multiple successive runs..

In order to deface the levelled problems of K-means, one alternative approach is to use Kohonen's Learning Procedure (KLP) instead of mean quantization. It brings more direct optimization steps (delta rule), controlled by a learning rate alpha. It also conforms to randomized batch or a online (stochastic) learning better in relation to original K-means, proposing possible  improvements to same optimal point and same size cluster problems.  In that manner, It is prone to better mapping of the data-set over the clusters. In addition, from my observations, it is faster to reach a optimal point. It is very essential feature, especially for large scale problems.

KLP stochastic learning optimization is formulated as ;

$w_j^{t+1} = w_j^t + alpha (x_i - w_j^t)$

superscript $t$ is the iteration count, $w_j$ is the closest cluster centroid to given instance vector $x_i$.  The idea is very similar to Lloyd's, in essence. For a given instance, its closest centroid is found and weight vector of that centroid is updated toward the instance vector. For batch version, the formula is same but the update step is defined by the mean of the distances between the centroid and its matching instances. Basically, the difference here is to use the distance between instances and the present centroids, contrary to mean heuristic of Llyod.

KLP-Kmeans including three alternative implementations as Scipy, Theano, Theano-GPU, is at my Github Repository with simple demonstrations. Even Scipy version is faster for small scale problems as the scale of matrix operations is getting larger Theno contrives callous improvements. Furthermore, with increasing batch size GPU based Theano implementation should be the choice.