Parallelizing AdaBoost on Multi-Core Machines Using OpenMP in C++
AdaBoost, short for Adaptive Boosting, is a boosting algorithm that combines several weak classifiers into one strong classifier. AdaBoost's fundamental nature does not allow the weak classifiers to be found in parallel, since each one depends on the example weights produced by the previous round. We present an approach that achieves a speedup of nearly 22.14x over a serial implementation. In this project, we develop a parallel AdaBoost algorithm that exploits the multiple cores of a CPU via lightweight threads. We propose different algorithms for different types of datasets and machines.
Python3: To generate the data set for experimentation
C++ with OpenMP
To learn more about OpenMP and multithreading in C++, see this guide: https://bisqwit.iki.fi/story/howto/openmp/
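For readers new to OpenMP, the following is a minimal sketch of the kind of loop-level parallelism it provides, using a weighted classification error as the example. The function name `weighted_error` and the use of +1/-1 integer labels are assumptions made for illustration and are not taken from this repository's headers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from adaboost.h): the sum of the weights w[i]
// of all examples whose prediction differs from the true label.
double weighted_error(const std::vector<int>& pred,
                      const std::vector<int>& y,
                      const std::vector<double>& w) {
    assert(pred.size() == y.size() && y.size() == w.size());
    const long n = static_cast<long>(w.size());
    double err = 0.0;

    // Each thread accumulates a private partial sum; OpenMP adds the
    // partial sums together when the loop ends.
    #pragma omp parallel for reduction(+ : err)
    for (long i = 0; i < n; ++i) {
        if (pred[i] != y[i]) err += w[i];
    }
    return err;
}
```

With GCC or Clang the pragma only takes effect when compiling with `-fopenmp`; without that flag the loop still compiles and runs serially.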
Run c++/create_data.sh to create the data set.
Include the implementation you would like to use.
There are 2 header files (details in report) which you can use:
Parallelized search for the best feature threshold: adaboost.h
Parallelization everywhere: adaboost_best.h
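As a rough illustration of what parallelizing the best feature threshold search can look like, here is a minimal sketch. It assumes the weak learners are decision stumps over +1/-1 labels and that the data is stored feature-major (`Xt[j][i]` is feature `j` of example `i`). The names `Stump` and `best_stump` are hypothetical, and the quadratic scan over candidate thresholds is deliberately naive; the actual code in adaboost.h may differ in all of these respects.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stump description; not the types used in adaboost.h.
struct Stump {
    std::size_t feature = 0;
    double threshold = 0.0;
    int polarity = 1;  // +1: predict +1 when x >= threshold; -1: the reverse
    double error = std::numeric_limits<double>::infinity();
};

// Xt is assumed feature-major: Xt[j][i] is feature j of example i.
Stump best_stump(const std::vector<std::vector<double>>& Xt,
                 const std::vector<int>& y,
                 const std::vector<double>& w) {
    assert(y.size() == w.size());
    double total = 0.0;
    for (double wi : w) total += wi;

    const long m = static_cast<long>(Xt.size());
    std::vector<Stump> per_feature(Xt.size());

    // Features are independent, so each thread scans whole features
    // and writes its result into its own slot of per_feature.
    #pragma omp parallel for schedule(dynamic)
    for (long j = 0; j < m; ++j) {
        const std::vector<double>& col = Xt[j];
        Stump local;
        local.feature = static_cast<std::size_t>(j);
        for (double thr : col) {  // candidate thresholds: the observed values
            double err = 0.0;     // weighted error with polarity +1
            for (std::size_t i = 0; i < col.size(); ++i) {
                const int pred = (col[i] >= thr) ? 1 : -1;
                if (pred != y[i]) err += w[i];
            }
            if (err < local.error) {
                local.threshold = thr; local.polarity = 1; local.error = err;
            }
            // Flipping the polarity flips every prediction.
            if (total - err < local.error) {
                local.threshold = thr; local.polarity = -1; local.error = total - err;
            }
        }
        per_feature[j] = local;
    }

    // Serial reduction keeps the result deterministic (lowest feature index wins ties).
    Stump best;
    for (const Stump& s : per_feature)
        if (s.error < best.error) best = s;
    return best;
}
```

Storing one result per feature and reducing serially avoids a critical section inside the hot loop, at the cost of a small temporary vector.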
To use an implementation, simply add: #include "adaboost_best.h"
Fit function: clf.fit(X, labels, t);
Predict function: vector<int> predictions = clf.predict(X);
Here X is a vector of vectors of dimension n*m, where n is the number of examples and m is the number of dimensions.
We also time different transpose implementations in c++/time_transpose.cpp
We also have a Python implementation in final-adaboost.ipynb
Benchmark of Implementations:
Project Report: parallelizing-adaboost.pdf
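A comparison of transpose implementations could be set up along the following lines. This is only a sketch: `Matrix`, `transpose_serial`, `transpose_parallel`, and `time_ms` are hypothetical names, the element type `double` is an assumption, and the contents of c++/time_transpose.cpp may be quite different.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Baseline: row-by-row serial transpose of an n*m matrix into m*n.
Matrix transpose_serial(const Matrix& X) {
    const std::size_t n = X.size();
    const std::size_t m = n ? X[0].size() : 0;
    Matrix Xt(m, std::vector<double>(n));
    for (std::size_t i = 0; i < n; ++i) {
        assert(X[i].size() == m);  // all rows must have the same length
        for (std::size_t j = 0; j < m; ++j) Xt[j][i] = X[i][j];
    }
    return Xt;
}

// Parallel variant: each thread fills whole output rows, so no two
// threads ever write to the same element.
Matrix transpose_parallel(const Matrix& X) {
    const long n = static_cast<long>(X.size());
    const long m = n ? static_cast<long>(X[0].size()) : 0;
    Matrix Xt(m, std::vector<double>(n));
    #pragma omp parallel for
    for (long j = 0; j < m; ++j)
        for (long i = 0; i < n; ++i) Xt[j][i] = X[i][j];
    return Xt;
}

// Wall-clock time of a callable, in milliseconds.
template <typename F>
double time_ms(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For example, `time_ms([&] { Matrix t = transpose_parallel(X); })` returns the elapsed time of one parallel transpose. Timings on small matrices are dominated by thread start-up, so meaningful comparisons need large inputs and repeated runs.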
Scope for improvement:
Replace the naive formula used for the error rate with the optimized one (with weight rescaling) mentioned in the MIT video.
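One common form of weight rescaling, which may be the one meant here, avoids exponentials and a separate normalization pass: after each round, the misclassified examples are rescaled so their weights sum to 1/2, and the correctly classified examples likewise. A minimal sketch follows, assuming the weights sum to 1 before the update and that `err` is the weighted error of the current weak classifier. The name `rescale_weights` is hypothetical and does not come from this repository.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from adaboost.h).
// wrong[i] is true when example i was misclassified this round;
// err is the weighted error, i.e. the sum of w[i] over misclassified examples.
void rescale_weights(std::vector<double>& w,
                     const std::vector<bool>& wrong,
                     double err) {
    assert(w.size() == wrong.size());
    assert(err > 0.0 && err < 1.0);  // otherwise a division below is undefined
    const double scale_wrong = 1.0 / (2.0 * err);
    const double scale_right = 1.0 / (2.0 * (1.0 - err));
    const long n = static_cast<long>(w.size());

    // Each iteration touches only w[i], so the loop parallelizes trivially.
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        w[i] *= wrong[i] ? scale_wrong : scale_right;
    }
}
```

This gives the same weights as the usual exponential update followed by normalization: with alpha = 0.5 * ln((1 - err) / err), the normalizer is 2 * sqrt(err * (1 - err)), and the two factors simplify to 1 / (2 * err) and 1 / (2 * (1 - err)).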