I recently used the K-means algorithm to cluster the colors I had in a picture to create a, what I think is, pretty cool profile picture. In this post, I will show you how I managed to do that.
What you will need is a not too large picture (otherwise the algorithm will take much too long) and Octave installed which is available on pretty much any Linux distributions.
The code is available on GitHub with comments and line by line explanations.
I will split the process of running K-means in two different functions:
initCentroidswill initialize random centroids from pixels in the image
runKMeanswill actually run the K-means algorithm
If you’re familiar with the K-means algorithm you know that the algorithm is
based on two steps and so
runKMeans will be calling two different
assignCentroidswhich will assign the closest centroid to each example
computeCentroidswhich will compute new centroids from the assignments done in
As previously mentioned, initCentroids will pick K (here representing the resulting number of colors we want) random examples from our dataset and choose them as our initial centroids.
It is defined as follows:
assignCentroids loops over every example in the dataset and associates the closest centroid to each and everyone of them.
recomputes the centroids as the mean of the points which were assigned to each
centroid previously thanks to
computeCentroids successively a fixed number
of times (thanks to the
iter parameter) and will use the initial centroids
initCentroids as a starting point.
In order to tie everything together I wrote a simple script which runs the algorithm and saves the compressed image. It is defined as follows:
You can also find this script on GitHub.
As you may have noticed, this script takes the following command line arguments:
As an example, this is what I originally had as a picture:
I used the script like so:
To get back the following image:
Be careful though because, as you might have caught on, the complexity of the
whole algorithm is
O(i * np * K) where:
So if you have a very large image, or, alternatively if you want a large number of colors or want to run the algorithm for many iterations, the script may run for quite a while.
To give you an idea, I timed how much time my example took (I had a 535 * 535 image and I wanted 8 colors and ran the algorithm for 10 iterations):
gives me back approximately 9 minutes and 45 seconds.
I hope this post was a pretty good and fun introduction to the K-means algorithm. Do not hesitate to reach me if you encounter difficulties running or understanding the code.