
Parallelizing AdaBoost on Multi-Core Machines Using OpenMP in C++

AdaBoost, short for Adaptive Boosting, is a boosting algorithm that combines several weak classifiers into one strong classifier. AdaBoost's fundamental nature does not allow the search for weak classifiers to be parallelized across boosting rounds, because each round depends on the example weights produced by the previous one. We present an approach that achieves a speedup of nearly 22.14x over a serial implementation. In this project, we develop a parallel AdaBoost algorithm that exploits the multiple cores of a CPU via lightweight threads. We propose different algorithms for different types of datasets and machines.
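Although the boosting rounds themselves run one after another, the work inside a single round can be spread across cores. The following is a minimal sketch of one common way to do that: a brute-force decision-stump search split over features with OpenMP. It is not taken from this project's headers. The `Stump` struct and the `weighted_error` and `best_stump` functions are hypothetical names chosen for illustration, and labels are assumed to be -1 or +1.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <limits>
#include <vector>

// Hypothetical weak classifier for illustration (not the project's type).
struct Stump {
    std::size_t feature = 0;
    double threshold = 0.0;
    int polarity = 1;  // +1: predict +1 when x > threshold; -1: the reverse
    double error = std::numeric_limits<double>::infinity();
};

// Weighted error of one candidate stump over all examples.
static double weighted_error(const std::vector<std::vector<double>>& X,
                             const std::vector<int>& y,
                             const std::vector<double>& w,
                             std::size_t f, double thr, int pol) {
    double err = 0.0;
    for (std::size_t i = 0; i < X.size(); ++i) {
        const int pred = (X[i][f] > thr) ? pol : -pol;
        if (pred != y[i]) err += w[i];
    }
    return err;
}

// Brute-force search: every observed value of every feature is a candidate
// threshold. Features are divided among threads; each thread keeps a private
// best stump, and the per-thread results are merged in a critical section.
Stump best_stump(const std::vector<std::vector<double>>& X,
                 const std::vector<int>& y,
                 const std::vector<double>& w) {
    assert(!X.empty() && X.size() == y.size() && X.size() == w.size());
    const long m = static_cast<long>(X[0].size());
    Stump best;
    #pragma omp parallel
    {
        Stump local;
        #pragma omp for nowait
        for (long f = 0; f < m; ++f) {
            const std::size_t fi = static_cast<std::size_t>(f);
            for (std::size_t i = 0; i < X.size(); ++i) {
                for (int pol : {1, -1}) {
                    const double e = weighted_error(X, y, w, fi, X[i][fi], pol);
                    if (e < local.error) {
                        local.feature = fi;
                        local.threshold = X[i][fi];
                        local.polarity = pol;
                        local.error = e;
                    }
                }
            }
        }
        #pragma omp critical
        {
            if (local.error < best.error) best = local;
        }
    }
    return best;
}
```

Compile with an OpenMP flag such as `g++ -fopenmp`; without it the pragmas are ignored and the same search runs serially. Keeping a private `local` stump per thread avoids locking inside the hot loop, so the only synchronization is one merge per thread.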


Python3: To generate the data set for experimentation

C++ with OpenMP

Refer to this to learn more about OpenMP and multithreading with C++.


  1. Run c++/ to create the data set.

  2. Import the implementation you would like to use.

There are two header files (details in the report) that you can use:

To import simply type:

#include "adaboost_best.h"

Fit function:,labels,t);

Predict function: 
vector<int> predictions = clf.predict(X); 

X here is a vector of vectors of dimension n*m,
where n is the number of examples and m is the number of dimensions.

  1. We also time different transpose implementations in c++/time_transpose.cpp

  2. We also have a Python implementation in final-adaboost.ipynb
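To illustrate what timing transpose implementations can look like, here is a minimal sketch that transposes an n*m vector of vectors into m*n, once serially and once with OpenMP, plus a small timing helper. It is not the content of c++/time_transpose.cpp. The `Matrix` alias and the `transpose_serial`, `transpose_parallel`, and `time_ms` names are hypothetical. A plausible motivation, stated here as an assumption, is that a feature-major layout lets a per-feature scan read contiguous memory.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical alias

// Serial transpose: n*m (examples by features) -> m*n (features by examples).
Matrix transpose_serial(const Matrix& X) {
    assert(!X.empty());
    const std::size_t n = X.size(), m = X[0].size();
    Matrix T(m, std::vector<double>(n));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < m; ++j)
            T[j][i] = X[i][j];
    return T;
}

// OpenMP variant: each thread writes a disjoint set of output rows,
// so no locking is needed.
Matrix transpose_parallel(const Matrix& X) {
    assert(!X.empty());
    const long n = static_cast<long>(X.size());
    const long m = static_cast<long>(X[0].size());
    Matrix T(static_cast<std::size_t>(m),
             std::vector<double>(static_cast<std::size_t>(n)));
    #pragma omp parallel for
    for (long j = 0; j < m; ++j)
        for (long i = 0; i < n; ++i)
            T[j][i] = X[i][j];
    return T;
}

// Wall-clock time of a callable, in milliseconds.
template <typename F>
double time_ms(F&& f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

A call such as `time_ms([&] { transpose_parallel(X); })` gives one measurement; repeating the call and averaging reduces noise from thread start-up and caching.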

Benchmark of Implementations:

Project Report: parallelizing-adaboost.pdf

Scope for improvement:

Replace the naive formula used for the error rate with the optimized one (with weight rescaling) mentioned in the MIT video.
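One reading of this item, assumed here rather than confirmed by the project, is the standard AdaBoost identity that the exponential weight update followed by normalization is equivalent to a direct rescaling: correctly classified examples are multiplied by 1/(2(1-eps)) and misclassified ones by 1/(2*eps), where eps is the weighted error rate. The sketch below compares both forms. The function names are hypothetical, labels and predictions are assumed to be -1 or +1, and the incoming weights are assumed to sum to 1.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Weighted error rate: total weight of the misclassified examples.
double weighted_error_rate(const std::vector<double>& w,
                           const std::vector<int>& y,
                           const std::vector<int>& pred) {
    double eps = 0.0;
    for (std::size_t i = 0; i < w.size(); ++i)
        if (pred[i] != y[i]) eps += w[i];
    return eps;
}

// Naive update: scale by exp(-alpha * y * h(x)), then renormalize to sum 1.
std::vector<double> update_naive(std::vector<double> w,
                                 const std::vector<int>& y,
                                 const std::vector<int>& pred) {
    const double eps = weighted_error_rate(w, y, pred);
    assert(eps > 0.0 && eps < 1.0);
    const double alpha = 0.5 * std::log((1.0 - eps) / eps);
    double z = 0.0;
    for (std::size_t i = 0; i < w.size(); ++i) {
        w[i] *= std::exp(-alpha * y[i] * pred[i]);
        z += w[i];
    }
    for (double& wi : w) wi /= z;
    return w;
}

// Rescaled update: no exp, no log, no separate normalization pass.
// Afterwards the correct and the misclassified examples each carry weight 1/2.
std::vector<double> update_rescaled(std::vector<double> w,
                                    const std::vector<int>& y,
                                    const std::vector<int>& pred) {
    const double eps = weighted_error_rate(w, y, pred);
    assert(eps > 0.0 && eps < 1.0);
    const double scale_right = 1.0 / (2.0 * (1.0 - eps));
    const double scale_wrong = 1.0 / (2.0 * eps);
    for (std::size_t i = 0; i < w.size(); ++i)
        w[i] *= (pred[i] == y[i]) ? scale_right : scale_wrong;
    return w;
}
```

Both functions produce the same weights up to floating-point rounding; the rescaled form simply skips the transcendental functions and the extra pass over the weights.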