Category Archives: Tips & Tricks

How to use Python Decorators

Decorators are handy sugars for Python programmers to shorten things and provides more concise programming.

For instance you can use decorators for user authentication for your REST API servers. Assume that, you need to auth. the user for before each REST calls. Instead of appending the same procedure to each call function, it is better to define decorator and tagging it onto your call functions.

Let's see the small example below. I hope it is self-descriptive.


"""
How to use Decorators:

Decorators are functions called by annotations
Annotations are the tags prefixed by @
"""

### Decorator functions ###
def helloSpace(target_func):
def new_func():
print "Hello Space!"
target_func()
return new_func

def helloCosmos(target_func):
def  new_func():
print "Hello Cosmos!"
target_func()
return new_func


@helloCosmos # annotation
@helloSpace # annotation
def hello():
print "Hello World!"

### Above code is equivalent to these lines
# hello = helloSpace(hello)
# hello = helloCosmos(hello)

### Let's Try
hello()

 

Fighting against class imbalance in a supervised ML problem.

ML on imbalanced data

given a imbalanced learning problem with a large class and a small class with number of instances N and M respectively;

  • cluster the larger class into M clusters and use cluster centers for training the model.
  • If it is a neural network or some compatible model. Cluster the the large class into K clusters and use these clusters as pseudo classes to train your model. This method is also useful for training your network with small number of classes case. It pushes your net to learn fine-detailed representations.
  • Divide large class into subsets with M instances then train multiple classifiers and use the ensemble.
  • Hard-mining is a solution which is unfortunately akin to over-fitting but yields good results in some particular cases such as object detection. The idea is to select the most confusing instances from the large set per iteration. Thus, select M most confusing instances from the large class and use for that iteration and repeat for the next iteration.
  • For specially batch learning, frequency based batch sampling might be useful. For each batch you can sample instances from the small class by the probability M/(M+N) and N/(M+N) for tha large class so taht you prioritize the small class instances for being the next batch. As you do data augmentation techniques like in CNN models, mostly repeating instances of small class is not a big problem.

Note for metrics, normal accuracy rate is not a good measure for suh problems since you see very high accuracy if your model just predicts the larger class for all the instances. Instead prefer ROC curve or keep watching Precision and Recall.

Please keep me updated if you know something more. Even, this is a very common issue in practice,  still hard to find a working solution.

devil in the implementation details

I was hassling with interesting problem lately. I trained a custom deep neural network model with ImageNet and ended up very good results at least on training logs.  I used Caffe for all these. Then, I ported my model to python interface and give some objects to it. Boommm!not working and even raised random prob values like it is not even trained for 4 days. It was really frustrating. After a dozens of hours I discovered that "Devil is in the details" .

I was using one of the Batch Normalization ("what is it ? "little intro here ) PR that is not merged to master branch but seems fine.  Then I found that interesting problem. The code in the branch computes each batch's mean by only looking at that batch. When we give only one example at test time, then the mean values are exactly the values of this particular image. This disables everything and the net starts to behave strangely. After a small search I found the solution which uses moving average instead of exact batch average. Now, I am at the stage of implementation. The puchcard is, do not use any PR which is not merged to master branch, that simple 🙂

Some Useful Machine Learning Libraries.

Especially, with the advent of many different and intricate Machine Learning algorithms, it is very hard to come up with your code to any problem. Therefore, the use of a library and its choice is imperative provision before you start the project. However, there are many different libraries having different quirks and rigs in different languages, even in multiple languages so that choice is not very straight forward as it seems.

Before you start, I strongly recommend you to experiment the library of your interest so as not to say " Ohh Buda!" at the end. For being a simple guide, I will point some possible libraries and signify some of them as my choices with the reason behind.

Continue reading Some Useful Machine Learning Libraries.

Sorting strings and Overriding std::sort comparison

At that post, I try to illustrate one of the use case of comparison overriding for std::sort on top of a simple problem. Our problem is as follows:

Write a method to sort an array of strings so that all the anagrams are next to each other.

Continue reading Sorting strings and Overriding std::sort comparison

Project Euler - Problem 14

Here is one again a very intricate problem from Project Euler. It has no solution sheet as oppose to the other problems at the site. Therefore there is no consensus on the best solution.

Below is the problem: (I really suggest you to observe some of the example sequences. It has really interesting behaviours. 🙂 )

The following iterative sequence is defined for the set of positive integers:

n → n/2 (n is even)
n → 3n + 1 (n is odd)

Using the rule above and starting with 13, we generate the following sequence: Continue reading Project Euler - Problem 14

Project Euler - Problem 13

Here we have another qualified problem from Project Euler. You might want to work out the problem before see my solution.

The basic idea of my solution is to not use all the digits of the given numbers, instead extract the part of the each number that is necessary to sum up to conclude the first 10 digits of the result. I try to explain my approach at the top of the source code with my lacking MATH english. If you have any problem for that part please leave me a comment. Continue reading Project Euler - Problem 13

Project Euler - Problem 12

As a 2 years researcher, I feel a bit rusty to code.  I search a good set of execises to hone my abilities again and I stumbled upon Project Euler. This site hosts increasing number of very well formed algorithmic problems and discussions. It ranges very basic problems to very high level ones, requiring profound knowledge and practice.

After that intro. I want to introduce one of the example question from Project Euler. NOTE THAT, IF YOU ALREADY KNOW THE SITE AND YOU TRY TO SOLVE THAT PROBLEM, DO NOT CHEAT YOURSELF.

Here is the problem statement we try to solve. Continue reading Project Euler - Problem 12