# Our ECCV2014 work "ConceptMap: Mining noisy web data for concept learning"

---- I am living the joy of seeing my paper title on the list of accepted ECCV14 papers :). Seeing the outcome of your work makes all your day-and-night efforts worthwhile, REALLY!!! Before starting, I would like to thank my supervisor Pinar Duygulu for her great guidance. ----

In this post, I would like to summarize the work in the title, since I believe a friendly blog post can sometimes be more expressive than a formal scientific article.

"ConceptMap: Mining noisy web data for concept learning" proposes a pipeline to learn a wide range of visual concepts simply by issuing a query to an image search engine. The idea is to query the service for a concept and download a large set of images, cluster the images while removing the irrelevant instances, and learn a model from each of the clusters. In the end, each concept is represented by the ensemble of these classifiers.
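To make the overall shape of such a pipeline concrete, here is a toy sketch under loudly stated assumptions; it is not the method from the paper. Plain k-means stands in for the paper's clustering, a hypothetical "drop clusters smaller than 10% of the images" rule stands in for its outlier removal, logistic regression stands in for its per-cluster classifiers, and synthetic 2-D points stand in for features of images downloaded from a search engine. Scoring the ensemble by the maximum per-cluster response is also an assumption.

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means (a stand-in for the paper's clustering step)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for i in range(k):
            if np.any(labels == i):  # leave empty clusters where they are
                centroids[i] = X[labels == i].mean(axis=0)
    return centroids, labels

def train_logistic(X_pos, X_neg, lr=0.1, n_iter=300):
    """Linear classifier (logistic regression, batch gradient descent): cluster vs negatives."""
    X = np.vstack([X_pos, X_neg])
    y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_neg))])
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - y
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def learn_concept(X_web, X_neg, k=3, min_cluster_frac=0.1):
    """Cluster the noisy web features, drop small clusters, train one model per cluster."""
    _, labels = kmeans(X_web, k)
    models = []
    for i in range(k):
        members = X_web[labels == i]
        # Hypothetical heuristic: very small clusters are treated as irrelevant images.
        if len(members) < min_cluster_frac * len(X_web):
            continue
        models.append(train_logistic(members, X_neg))
    return models

def concept_score(models, X):
    """Ensemble score: the maximum response over the per-cluster classifiers."""
    return np.stack([X @ w + b for w, b in models]).max(axis=0)

# Synthetic stand-ins: two "views" of the concept plus a few irrelevant images.
rng = np.random.default_rng(1)
view_a = rng.normal(loc=[3, 3], scale=0.3, size=(40, 2))
view_b = rng.normal(loc=[3, -3], scale=0.3, size=(40, 2))
outliers = rng.normal(loc=[-6, 6], scale=0.3, size=(5, 2))
X_web = np.vstack([view_a, view_b, outliers])
X_neg = rng.normal(loc=[-3, 0], scale=0.5, size=(80, 2))

models = learn_concept(X_web, X_neg, k=3)
scores = concept_score(models, np.array([[3.0, 3.0], [-3.0, 0.0]]))
print(len(models), scores)
```

Training one model per cluster, rather than one model for all downloaded images, lets each classifier specialize on a single visual mode of the concept (here, the two "views").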

# How does Feature Extraction work on Images?

Here I share an enhanced version of one of my Quora answers to a similar question...

There is no single answer to this question, since there is a diverse set of methods for extracting features from an image.

First, what do we call a feature? "A distinctive attribute or aspect of something." So the point is to have a set of values for a particular instance that distinguishes that instance from its counterparts. In the field of images, features might be raw pixels for simple problems such as digit recognition on the well-known MNIST dataset. However, for natural images, raw pixels are not descriptive enough. Instead, there are two main streams to follow. One is to use hand-engineered feature extraction methods (e.g. SIFT, VLAD, HOG, GIST, LBP), and the other is to learn features that are discriminative in the given context (e.g. Sparse Coding, Auto Encoders, Restricted Boltzmann Machines, PCA, ICA, K-means). Note that the second alternative, …
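As a rough illustration of the three options above, here is a small NumPy sketch. The raw-pixel feature is just the flattened image. The hand-engineered descriptor is a toy histogram of gradient orientations in the spirit of HOG; it is deliberately not a full HOG (no cells, blocks or block normalization). The learned features come from PCA, one of the learning methods listed. All function names and parameters are illustrative assumptions.

```python
import numpy as np

def raw_pixel_features(img):
    """Raw pixels: flatten the image (fine for small, aligned digits like MNIST)."""
    return img.astype(np.float64).ravel()

def gradient_orientation_histogram(img, n_bins=8):
    """Toy hand-engineered descriptor inspired by HOG (not a full HOG):
    histogram of unsigned gradient orientations weighted by gradient magnitude."""
    gy, gx = np.gradient(img.astype(np.float64))  # derivatives along rows (y) and columns (x)
    magnitude = np.hypot(gx, gy)
    orientation = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned angle in [0, pi)
    bins = np.minimum((orientation / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=n_bins)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

def pca_features(X, n_components):
    """Learned features: project centered data onto its top principal directions."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt: principal directions
    basis = Vt[:n_components]
    return Xc @ basis.T, basis, mean

# A vertical edge: left half dark, right half bright.
edge = np.zeros((16, 16))
edge[:, 8:] = 1.0
raw = raw_pixel_features(edge)
hist = gradient_orientation_histogram(edge)

# "Learn" a 10-D representation from 100 random 8x8 patches (synthetic data).
rng = np.random.default_rng(0)
patches = rng.normal(size=(100, 64))
Z, basis, mean = pca_features(patches, n_components=10)
print(raw.shape, hist.round(3), Z.shape)
```

For the vertical edge, all gradient energy is horizontal, so the whole histogram mass falls in the first orientation bin. The PCA basis, unlike the hand-designed histogram, is derived from the data it is applied to.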

# Kohonen Learning Procedure K-Means vs Lloyd's K-means

K-means is perhaps the most common data quantization method, widely used across many different problem domains. Even though it relies on a very simple idea, it gives satisfying results at a low computational cost.

Underneath the K-means optimization, the objective is to minimize the squared distance between each data point and its closest centroid (cluster center). We can write the objective as:

$\arg\min_{S} \sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - \mu_i \rVert^2$

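where $S_i$ is the set of instances assigned to cluster $i$ and $\mu_i$ is its centroid, so that for every $x_j \in S_i$, $\mu_i$ is the closest centroid to $x_j$.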
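The sketch below contrasts the two procedures named in the title on the objective above. It assumes that "Kohonen learning procedure" refers to online, winner-take-all competitive learning (a Kohonen-style update with no neighborhood), with a linearly decaying learning rate; those choices, the data, and all names are illustrative assumptions, not taken from the original post.

```python
import numpy as np

def kmeans_objective(X, centroids):
    """Sum over all points of the squared distance to the closest centroid."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def assign(X, centroids):
    """Index of the closest centroid for every point."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def lloyd_kmeans(X, init, n_iter=50):
    """Batch (Lloyd) updates: assign all points, then move each centroid to its cluster mean."""
    centroids = init.astype(np.float64).copy()
    for _ in range(n_iter):
        labels = assign(X, centroids)
        new = centroids.copy()
        for i in range(len(centroids)):
            members = X[labels == i]
            if len(members) > 0:  # keep empty clusters where they are
                new[i] = members.mean(axis=0)
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids

def online_kmeans(X, init, n_epochs=20, lr0=0.5, seed=0):
    """Online winner-take-all updates: for each sample, move only the closest
    centroid a step towards it (assumed Kohonen-style rule, no neighborhood)."""
    rng = np.random.default_rng(seed)
    centroids = init.astype(np.float64).copy()
    t, n_total = 0, n_epochs * len(X)
    for _ in range(n_epochs):
        for j in rng.permutation(len(X)):
            lr = lr0 * (1.0 - t / n_total)  # linearly decaying learning rate
            winner = np.argmin(((centroids - X[j]) ** 2).sum(axis=1))
            centroids[winner] += lr * (X[j] - centroids[winner])
            t += 1
    return centroids

# Three synthetic Gaussian blobs, centroids initialized at random data points.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(c, 0.5, size=(100, 2)) for c in ([0, 0], [5, 5], [0, 5])])
init = X[rng.choice(len(X), size=3, replace=False)]
lloyd_C = lloyd_kmeans(X, init)
online_C = online_kmeans(X, init)
print("init  ", kmeans_objective(X, init))
print("Lloyd ", kmeans_objective(X, lloyd_C))
print("online", kmeans_objective(X, online_C))
```

Lloyd's version looks at every point before moving any centroid and never increases the objective, while the online version moves one centroid per sample, which makes it applicable to streaming data; with a decaying learning rate it typically settles near a similar solution.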

# Some Useful Machine Learning Libraries

With the advent of many different and intricate Machine Learning algorithms, it is very hard to write your own code for every problem. Therefore, choosing a library is an essential step before you start a project. However, there are many different libraries with different quirks, written in different languages or even spanning multiple languages, so the choice is not as straightforward as it seems.

Before you start, I strongly recommend experimenting with the library you are interested in, so that you do not end up saying "Ohh Buda!". As a simple guide, I will point out some possible libraries and mark some of them as my choices, with the reasons behind them.

# Share Research Data Sets via Torrent

I came across a new but very bright idea today: sharing academic data sets and papers via torrent. Especially if you are working on large-scale data sets like ImageNet, such a distributed approach is just delightful (although it does not currently include ImageNet). With many conventional downloads, your download speed drops to very small values over time, and it gets even worse as more people download at the same time. In a torrent-based system, it is the other way around. If you are familiar with BitTorrent, you know that as the data is distributed to more machines, you get faster download speeds, owing to the peer-to-peer nature of the system.