Here I share an enhanced version of one of my Quora answers to a similar question ...
There is no single answer to this question, since there is a diverse set of methods for extracting features from an image.
First, what is a feature? "A distinctive attribute or aspect of something." So the idea is to have a set of values for a particular instance that distinguishes that instance from its counterparts. For images, features might be the raw pixels in simple problems such as digit recognition on the well-known MNIST dataset. For natural images, however, raw pixels are not descriptive enough. Instead, there are two main streams to follow. One is to use hand-engineered feature extraction methods (e.g. SIFT, VLAD, HOG, GIST, LBP), and the other is to learn features that are discriminative in the given context (e.g. Sparse Coding, Auto Encoders, Restricted Boltzmann Machines, PCA, ICA, K-means). Note that the second alternative, ...
Continue reading How does Feature Extraction work on Images?
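To make the difference between raw pixels and a hand-engineered feature concrete, here is a minimal sketch in plain Python. The function names and the tiny test images are my own illustration, and `orientation_histogram` is only a heavily simplified, HOG-like descriptor (one cell, no block normalization), not the actual HOG algorithm.

```python
import math

def raw_pixel_features(image):
    """Flatten a 2-D grayscale image (a list of rows) into one feature vector."""
    return [pixel for row in image for pixel in row]

def orientation_histogram(image, n_bins=4):
    """Histogram of gradient orientations, weighted by gradient magnitude.

    Hypothetical, simplified HOG-like feature: central differences on the
    interior pixels, unsigned orientations in [0, pi), one histogram for the
    whole image.
    """
    height, width = len(image), len(image[0])
    hist = [0.0] * n_bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue  # flat region, no orientation to vote for
            angle = math.atan2(gy, gx) % math.pi
            bin_index = min(int(angle / math.pi * n_bins), n_bins - 1)
            hist[bin_index] += magnitude
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# Two images of the same vertical edge, shifted by one pixel.
edge = [[0, 0, 1, 1, 1] for _ in range(5)]
shifted_edge = [[0, 0, 0, 1, 1] for _ in range(5)]

print(raw_pixel_features(edge) == raw_pixel_features(shifted_edge))
print(orientation_histogram(edge), orientation_histogram(shifted_edge))
```

The raw pixel vectors of the two images differ, while the orientation histograms agree, which is the kind of robustness that hand-engineered descriptors are designed to provide.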
K-means may be the most common data quantization method, used widely across many different problem domains. Even though it relies on a very simple idea, it gives satisfying results at low computational cost.
Behind the K-means formulation, the objective is to minimize the (squared Euclidean) distance between each data point and its closest centroid (cluster center). We can write the objective as

$$J = \sum_{i=1}^{N} \lVert x_i - c_{x_i} \rVert^2$$

where $c_{x_i}$ is the closest centroid to instance $x_i$.
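As a rough illustration of this objective, the sketch below computes the sum of squared distances from each point to its closest centroid and minimizes it with Lloyd's batch iterations (assign every point to its nearest centroid, then move each centroid to the mean of its points). The function names, the toy data, and the choice of squared Euclidean distance are my assumptions, written in plain Python without any library.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def closest_centroid(x, centroids):
    """Index of the centroid nearest to point x."""
    return min(range(len(centroids)),
               key=lambda k: squared_distance(x, centroids[k]))

def kmeans_objective(points, centroids):
    """Sum over all points of the squared distance to the closest centroid."""
    return sum(squared_distance(x, centroids[closest_centroid(x, centroids)])
               for x in points)

def lloyd_kmeans(points, k, n_iters=20, seed=0):
    """Lloyd's batch K-means with centroids initialized from random points."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iters):
        # Assignment step: group points by their closest centroid.
        clusters = [[] for _ in range(k)]
        for x in points:
            clusters[closest_centroid(x, centroids)].append(x)
        # Update step: move each centroid to the mean of its members.
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if the cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

# Toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids = lloyd_kmeans(points, k=2)
print(centroids)
print(kmeans_objective(points, centroids))
```

Neither step can increase the objective, which is why the iterations settle down, although in general only to a local minimum that depends on the initialization.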
Continue reading Kohonen Learning Procedure K-Means vs Lloyd's K-means
With the advent of so many different and intricate Machine Learning algorithms, it is very hard to write your own code for every problem. Choosing a library is therefore an essential decision to make before you start a project. However, there are many libraries with different quirks, written in different languages and sometimes available in several languages at once, so the choice is not as straightforward as it seems.
Before you start, I strongly recommend experimenting with the library you are interested in, so that you do not end up saying "Ohh Buda!" at the end. As a simple guide, I will point out some possible libraries and mark some of them as my choices, with the reasons behind them.
I came across a new but very bright idea today: sharing academic data sets and papers via torrent. Especially if you work with large-scale data sets like ImageNet, such a distributed approach is a delight (although it does not presently include ImageNet). With ordinary downloads, your download speed often decays to very small values over time, and it gets even worse as more people download at once. In a torrent-based system it is the other way around. If you are familiar with BitTorrent, you know that as the data spreads to more machines, download speed goes up, owing to the nature of the torrent system.