I recently used the K-means algorithm to cluster the colors I had in a picture to create a, what I think is, pretty cool profile picture. In this post, I will show you how I managed to do that.
What you will need is a not too large picture (otherwise the algorithm will take much too long) and Octave installed which is available on pretty much any Linux distributions.
The code is available on GitHub with comments and line by line explanations.
I will split the process of running K-means in two different functions:
initCentroids
will initialize random centroids from pixels in the imagerunKMeans
will actually run the K-means algorithmIf you’re familiar with the K-means algorithm you know that the algorithm is
based on two steps and so runKMeans
will be calling two different
functions:
assignCentroids
which will assign the closest centroid to each
examplecomputeCentroids
which will compute new centroids from the assignments
done in assignCentroids
As previously mentioned, initCentroids will pick K (here representing the resulting number of colors we want) random examples from our dataset and choose them as our initial centroids.
It is defined as follows:
assignCentroids loops over every example in the dataset and associates the closest centroid to each and everyone of them.
computeCentroids
recomputes the centroids as the mean of the points which were assigned to each
centroid previously thanks to assignCentroids
.
runKMeans
will call assignCentroids
and computeCentroids
successively a fixed number
of times (thanks to the iter
parameter) and will use the initial centroids
computed by initCentroids
as a starting point.
In order to tie everything together I wrote a simple script which runs the algorithm and saves the compressed image. It is defined as follows:
You can also find this script on GitHub.
As you may have noticed, this script takes the following command line arguments:
As an example, this is what I originally had as a picture:
I used the script like so:
To get back the following image:
Be careful though because, as you might have caught on, the complexity of the
whole algorithm is O(i * np * K)
where:
So if you have a very large image, or, alternatively if you want a large number of colors or want to run the algorithm for many iterations, the script may run for quite a while.
To give you an idea, I timed how much time my example took (I had a 535 * 535 image and I wanted 8 colors and ran the algorithm for 10 iterations):
gives me back approximately 9 minutes and 45 seconds.
I hope this post was a pretty good and fun introduction to the K-means algorithm. Do not hesitate to reach me if you encounter difficulties running or understanding the code.