The Race to the Élysée Palace - Analysis of the 2017 French Presidential elections

In this project, we study the socio-demographic factors that influenced voters in the 2017 French presidential election. The race showcased a sharp political and ideological polarization of public opinion between two diverging views: Marine Le Pen, a far-right, anti-immigration, anti-European Union candidate, and Emmanuel Macron, an upstart former banker who had never held elected office. This study examines the primary vote for Emmanuel Macron and Marine Le Pen in that election.


In this study, we collected, cleaned and aggregated demographic and economic data to predict the winning candidate of the French presidential election in each town of continental France.

In France, the major political parties are (1) the National Front (Front national, FN), led by Le Pen; (2) the Socialist Party (Parti socialiste, PS), led by Hamon / Hollande; (3) The Republicans (Les Républicains, LR), led by Fillon / Sarkozy; and (4) the Left Front (Front de gauche, FDG), led by Mélenchon. In the last presidential election, the race came down to two candidates: Marine Le Pen of the far-right National Front and the centrist former economy minister Emmanuel Macron. Ms. Le Pen planned to stop immigration and leave the EU, and we found that she drew her support from regions with high unemployment and low incomes. Mr. Macron won in big cities and in diverse, economically stable regions, where most immigrants and educated people supported his plans.

To understand how France voted and why, we used demographic and economic data such as population, unemployment, average age in each area, ratio of retired people, ratio of students, gender ratios, people's professions in each region, ratio of immigrants and foreigners, ratio of educated people and their academic level, etc.

We use logistic regression with L2 regularization and evaluate the model with 5-fold cross-validation to guard against over-fitting. The best-performing model has an overall precision of 0.72, recall of 0.75, F1 score of 0.70 and accuracy of 0.75. To understand the importance of the predictors, we use bagged decision-tree ensembles such as Random Forest and Extra Trees to estimate feature importance. The table below lists the importance scores of the top 10 attributes.


| Predictor                                       | Importance score |
|-------------------------------------------------|------------------|
| Ratio (%) of people with higher education       | 0.0429           |
| Ratio (%) of immigrants                         | 0.0312           |
| Population                                      | 0.0303           |
| Ratio (%) of foreign-born women                 | 0.0294           |
| Ratio (%) of foreign-born men                   | 0.0282           |
| Ratio (%) of middle-class workers               | 0.0265           |
| Ratio (%) of foreign-born over 55               | 0.0262           |
| Ratio (%) of foreign-born women under 15        | 0.0261           |
| Ratio (%) of foreign-born over 55               | 0.0254           |
| Ratio (%) of unemployed women between 15 & 64   | 0.0246           |
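
The modelling approach described above (L2-regularized logistic regression scored with 5-fold cross-validation, plus tree-ensemble feature importances) can be sketched with scikit-learn roughly as follows. This is a minimal illustration and not the project's actual code: the data is synthetic, the feature names are hypothetical placeholders, and the hyperparameters, the weighted metric averaging and the averaging of the two ensembles' importances are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the town-level feature matrix (assumption: the real
# features come from the datasets listed in config/fr_open_data_sources.yml).
X, y = make_classification(
    n_samples=500, n_features=12, n_informative=5, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

# Logistic regression; scikit-learn applies an L2 penalty by default.
# C=1.0 is an assumed regularization strength, not a value from the project.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(C=1.0, max_iter=1000),
)

# 5-fold cross-validation; weighted averaging of the metrics is an assumption.
scoring = ["accuracy", "precision_weighted", "recall_weighted", "f1_weighted"]
cv_results = cross_validate(model, X, y, cv=5, scoring=scoring)
mean_scores = {name: cv_results[f"test_{name}"].mean() for name in scoring}

# Tree ensembles expose impurity-based importance scores that sum to 1.
forests = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "extra_trees": ExtraTreesClassifier(n_estimators=200, random_state=0),
}
importances = {
    name: forest.fit(X, y).feature_importances_
    for name, forest in forests.items()
}

# Rank features by their average importance across both ensembles.
avg_importance = np.mean(list(importances.values()), axis=0)
top_features = sorted(
    zip(feature_names, avg_importance), key=lambda pair: pair[1], reverse=True
)[:10]

for name, score in mean_scores.items():
    print(f"{name}: {score:.2f}")
for name, score in top_features:
    print(f"{name}: {score:.4f}")
```

Because impurity-based importances are normalized to sum to 1 over all features, scores spread thinly across many predictors are expected to be small in absolute terms; the ranking matters more than the magnitudes.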

Our findings indicate that socio-demographic factors and economic rationality influenced people's voting behaviour during the last French presidential election.


Data Sources

All the data sources, along with a description of each, are listed in this file: config/fr_open_data_sources.yml

How to run the code

  • Set up a virtual environment
virtualenv env
source env/bin/activate
  • Install the required modules listed in requirements.txt.
pip install -r requirements.txt
  • Download socio-demographic data to perform the analysis
  • Clean, filter and join the different datasets
  • Classification task
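
As a rough illustration of what the clean, filter and join step might look like, here is a minimal pandas sketch. It is not the project's own script: the function `build_dataset`, the column names (`town_code`, `population`, `unemployed`, `macron_votes`, `le_pen_votes`) and all values are hypothetical and invented for the example.

```python
import pandas as pd

# Hypothetical town-level tables; codes and numbers are made up.
demographics = pd.DataFrame(
    {
        "town_code": [" a001", "A002", "A003", "A004"],
        "population": [1000, 250, None, 40000],
        "unemployed": [100, 20, 30, 2000],
    }
)
results = pd.DataFrame(
    {
        "town_code": ["A001", "A002", "A003", "A005"],
        "macron_votes": [300, 50, 10, 999],
        "le_pen_votes": [200, 80, 5, 1],
    }
)


def build_dataset(demographics: pd.DataFrame, results: pd.DataFrame) -> pd.DataFrame:
    """Clean both tables, derive a ratio feature, and join them per town."""
    # Clean: drop towns with missing values and normalize the join key.
    demo = demographics.dropna().copy()
    demo["town_code"] = demo["town_code"].str.strip().str.upper()

    # Derive a ratio feature from the raw counts.
    demo["unemployment_ratio"] = demo["unemployed"] / demo["population"]

    # Label each town with the candidate who received more votes.
    res = results.copy()
    res["town_code"] = res["town_code"].str.strip().str.upper()
    res["winner"] = (
        res["macron_votes"].gt(res["le_pen_votes"]).map({True: "Macron", False: "Le Pen"})
    )

    # Join: keep only towns present in both tables, one row per town.
    return demo.merge(
        res[["town_code", "winner"]],
        on="town_code",
        how="inner",
        validate="one_to_one",
    )


dataset = build_dataset(demographics, results)
print(dataset)
```

The `validate="one_to_one"` argument makes the merge fail loudly if a town code is duplicated in either table, which is a cheap safeguard when joining many open-data files on a shared key.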



If you are having issues, please let us know or submit a pull request.


The project is licensed under the MIT License.