I recently used the K-means algorithm to cluster the colors I had in a picture to create a, what I think is, pretty cool profile picture. In this post, I will show you how I managed to do that.

What you will need is a not too large picture (otherwise the algorithm will take much too long) and Octave installed which is available on pretty much any Linux distributions.

The code is available on GitHub with comments and line by line explanations.

I will split the process of running K-means in two different functions:

`initCentroids`

will initialize random centroids from pixels in the image`runKMeans`

will actually run the K-means algorithm

If youâ€™re familiar with the K-means algorithm you know that the algorithm is
based on two steps and so `runKMeans`

will be calling two different
functions:

`assignCentroids`

which will assign the closest centroid to each example`computeCentroids`

which will compute new centroids from the assignments done in`assignCentroids`

As previously mentioned, initCentroids will pick K (here representing the resulting number of colors we want) random examples from our dataset and choose them as our initial centroids.

It is defined as follows:

assignCentroids loops over every example in the dataset and associates the closest centroid to each and everyone of them.

computeCentroids
recomputes the centroids as the mean of the points which were assigned to each
centroid previously thanks to `assignCentroids`

.

runKMeans
will call `assignCentroids`

and `computeCentroids`

successively a fixed number
of times (thanks to the `iter`

parameter) and will use the initial centroids
computed by `initCentroids`

as a starting point.

In order to tie everything together I wrote a simple script which runs the algorithm and saves the compressed image. It is defined as follows:

You can also find this script on GitHub.

As you may have noticed, this script takes the following command line arguments:

- the path to the original image
- the path to the compressed image to write to
- the number of classes/colors wanted
- the number of iterations to perform

As an example, this is what I originally had as a picture:

I used the script like so:

To get back the following image:

Be careful though because, as you might have caught on, the complexity of the
whole algorithm is `O(i * np * K)`

where:

- i is the number of iterations to perform
- np is the number of pixels (so the length * width of your picture)
- K the number of classes/colors wanted

So if you have a very large image, or, alternatively if you want a large number of colors or want to run the algorithm for many iterations, the script may run for quite a while.

To give you an idea, I timed how much time my example took (I had a 535 * 535 image and I wanted 8 colors and ran the algorithm for 10 iterations):

gives me back approximately 9 minutes and 45 seconds.

I hope this post was a pretty good and fun introduction to the K-means algorithm. Do not hesitate to reach me if you encounter difficulties running or understanding the code.