Tag Archives: tutorial

Simple Parallel Processing in Python

Here is a very concise overview of Python's multiprocessing module and its benefits. It is certainly an important module for large-scale data mining and machine learning projects and for Kaggle-like challenges. So take a brief look at the slides to discover how to speed up your project cycle.
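As a rough illustration of what the module offers, here is a minimal sketch that spreads a function over a pool of worker processes. The names square and parallel_squares are made up for this example, and the squaring is only a stand-in for real CPU-bound work.

```python
from multiprocessing import Pool


def square(x):
    """CPU-bound stand-in for real work (hypothetical example task)."""
    return x * x


def parallel_squares(values, workers=2):
    """Apply square() to every value using a pool of worker processes."""
    with Pool(processes=workers) as pool:
        # map() splits the input across the workers and keeps the input order
        return pool.map(square, values)


if __name__ == "__main__":
    # The guard matters: worker processes may re-import this module
    print(parallel_squares(range(10)))
```

For very small tasks like this one, the cost of starting processes usually outweighs the gain; the pool pays off when each call does a noticeable amount of work.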

For more info, refer to:

Getting started with Thrust through source code...

Here I am sharing the code that I wrote while learning the basics of Thrust. It is self-explanatory thanks to its detailed comments.

You can download the code here from my Dropbox.

I wrote this code while learning the Thrust library, which gives you great benefits with little exposure to CUDA's complexity. You may prefer to download the code, since I am too lazy to keep it nicely formatted below and it is pretty long. :(

Anomaly detection and a simple algorithm with probabilistic approach.

What is anomaly detection? It is the task of detecting an outlier data point among other points that follow some kind of regular distribution. The outlier is also called an anomalous point (Figure 1).

Figure 1

What are the applications?

  • Fraudulent user activity detection - a way of detecting hacker activity in web applications or network connections by considering various attributes of the current state. For example, an application can keep track of a user's inputs to the website and the workload they place on the system. Based on these attribute values, the detection system decides whether a particular action is fraudulent and, if so, kicks the user out.
  • Data center monitoring - You might be managing a data center with a vast number of computers, so it is really hard to check each computer regularly for faults. An anomaly detection system can consider the network connection parameters, CPU load and memory load of each computer to detect a problem on any of them.
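The excerpt stops before the algorithm itself, so the following is only a sketch of one common probabilistic approach, not necessarily the one from the full post: fit a Gaussian to measurements taken during normal operation and flag any new value whose density falls below a threshold. The CPU load numbers, the function names and the threshold epsilon are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def fit_gaussian(samples):
    """Estimate mean and standard deviation from normal (non-anomalous) samples."""
    return mean(samples), pstdev(samples)


def density(x, mu, sigma):
    """Gaussian probability density of x under N(mu, sigma^2)."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def is_anomaly(x, mu, sigma, epsilon=0.01):
    """Flag x when its density falls below the hypothetical threshold epsilon."""
    return density(x, mu, sigma) < epsilon


if __name__ == "__main__":
    cpu_loads = [41.0, 39.5, 40.2, 42.1, 38.7, 40.9, 39.8]  # made-up data
    mu, sigma = fit_gaussian(cpu_loads)
    for load in (40.5, 95.0):
        print(load, is_anomaly(load, mu, sigma))
```

In practice the threshold would be tuned on labeled examples, and several attributes (network, CPU, memory) would each get their own density or a joint one.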

Some Basic Machine Learning Terms #1

If you are working on a project related to machine learning (ML), or you are a new researcher, knowing these terms and definitions might be useful.

Machine Learning: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E. -- Tom M. Mitchell

What is the "long long" type in C++?

long long is not the same type as long (although they can have the same size, e.g. on most 64-bit POSIX systems). The standard only guarantees that a long long is at least as large as a long and at least 64 bits wide. On most platforms, a long long is a 64-bit signed integer type.

You could use long long to store an 8-byte value safely on most conventional platforms, but it's better to use int64_t/int_least64_t from <stdint.h>/<cstdint> to make clear that you want an integer type of ≥64 bits.
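To see how these types come out on your own machine without compiling anything, one option is Python's ctypes module, which exposes the C types of the platform it runs on. This is only a quick inspection sketch; the SIZES dictionary is a name chosen for this example.

```python
import ctypes

# ctypes mirrors the C types of the platform's compiler ABI
SIZES = {
    "long": ctypes.sizeof(ctypes.c_long),
    "long long": ctypes.sizeof(ctypes.c_longlong),
    "int64_t": ctypes.sizeof(ctypes.c_int64),
}

if __name__ == "__main__":
    for name, size in SIZES.items():
        print(f"{name}: {size} bytes")
```

The size of long is the one that tends to differ between platforms (for example, it is typically 4 bytes on 64-bit Windows), which is why the fixed-width names are the clearer choice.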

GDB for debugging... (Multi-threaded program)

I have recently been working on a project for a school assignment, and I discovered how helpful the GDB debugger is. (A little late, but whatever... (in 3rd grade)) :).

I used it to track down problems in my multi-threaded code, which also uses semaphores. However, here I'll talk about the GDB debugger itself, not semaphores or threads. Time to begin...

First of all, on Linux you need to install the GDB package. You can easily do that from the software center.


Compile your source code as:
erogol@erogol-G50V:~$ gcc -g [source_name] -o [execution_file_name]
(Keep the -g option as it is: it adds the debug information GDB needs, and leaving it out can cause some silly errors.)

Start GDB:

erogol@erogol-G50V:~$ gdb
GNU gdb (GDB) 7.2-ubuntu
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
For bug reporting instructions, please see:

(gdb) [all the commands are typed at this prompt]

NOTE: If you start gdb in the folder that contains your executable, you do not need to change directory again inside gdb; otherwise, you can move to that folder with "cd", just like in a normal terminal. If you are already in the program folder, you can also start gdb directly with the command "gdb [program_name]".

Open the program with gdb:

(gdb) file [program_name]

It will load the program.

Then use the following commands as needed:

run - run the program.

break ['line_no' or 'function_name'] - put a breakpoint in the program. A "breakpoint" is a place where your code will stop when you run it.

step - execute the program line by line once it has stopped at a breakpoint; pressing "ENTER" repeats the previous step.

help all - list all the commands with their explanations.

(When the program creates a new thread, gdb will notify you.)

info threads - show the thread numbers and some additional info.

thread [thread_no] - switch to the thread you want to inspect.

info locals - show all the local variables and their values.

thread apply all info locals - show the local variables of all the threads.

print [variable_name] - show the value of the variable

finish - run until the current function returns.

continue - leave step mode and resume execution.

Also, when you press CTRL+C while the program is running, it will be paused, not cancelled.

These are the basic commands I used to debug my project. For more info, here is a good tutorial: