# CTRmodel

CTR (click-through rate) prediction models implemented purely with Spark MLlib, with no third-party libraries.

# Implemented Models

- Naive Bayes
- Logistic Regression
- Factorization Machine
- Random Forest
- Gradient Boosted Decision Tree
- GBDT + LR
- Neural Network
- Inner Product Neural Network (IPNN)
- Outer Product Neural Network (OPNN)

# Dataset

A small sample of a public ads dataset is included for testing and initial debugging.
It lets you compare the models directly on metrics such as the area under the ROC curve (AUC) and the area under the precision-recall (P-R) curve.
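
To make the ROC AUC metric concrete, here is a minimal pure-Scala sketch that computes it by pairwise comparison. The object name `AucSketch` and the function `rocAuc` are hypothetical and are not part of this project. Spark MLlib ships its own evaluator (`BinaryClassificationMetrics`, with `areaUnderROC` and `areaUnderPR`), which is likely a better fit for real data; this sketch only illustrates what the number means.

```scala
object AucSketch {
  /** ROC AUC as the fraction of (positive, negative) pairs in which the
    * positive sample gets the higher score; ties count as half a win.
    * Quadratic in the number of samples, so only suitable for small inputs. */
  def rocAuc(scoreAndLabels: Seq[(Double, Int)]): Double = {
    val positives = scoreAndLabels.collect { case (score, 1) => score }
    val negatives = scoreAndLabels.collect { case (score, 0) => score }
    require(positives.nonEmpty && negatives.nonEmpty, "need both classes")
    val wins = for (p <- positives; n <- negatives) yield {
      if (p > n) 1.0 else if (p == n) 0.5 else 0.0
    }
    wins.sum / (positives.size.toLong * negatives.size)
  }

  def main(args: Array[String]): Unit = {
    // (predicted score, label) pairs; 3 of the 4 positive/negative pairs are ordered correctly
    val samples = Seq((0.9, 1), (0.8, 0), (0.7, 1), (0.2, 0))
    println(rocAuc(samples)) // prints 0.75
  }
}
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of positives above negatives.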

**Data Format**

```
root
|-- user_id: integer (user id)
|-- item_id: integer (item id)
|-- category_id: integer (item category id)
|-- content_type: string (item content type)
|-- timestamp: string (timestamp)
|-- user_item_click: long (number of times the user clicked the item)
|-- user_item_imp: double (number of times the user viewed the item)
|-- item_ctr: double (historical CTR of the item)
|-- is_new_user: integer (whether the user is a new user)
|-- user_embedding: array (embedding of the user)
| |-- element: double
|-- item_embedding: array (embedding of the item)
| |-- element: double
|-- label: integer (sample label: 0 = negative, 1 = positive)
```
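
As an illustration, the schema above could be mirrored by a Scala case class like the one below. The names `CtrSample` and `CtrSampleExample` and all example values are hypothetical; the project may represent rows differently.

```scala
// Hypothetical case class with one field per column of the schema above.
case class CtrSample(
  user_id: Int,
  item_id: Int,
  category_id: Int,
  content_type: String,
  timestamp: String,
  user_item_click: Long,
  user_item_imp: Double,
  item_ctr: Double,
  is_new_user: Int,
  user_embedding: Seq[Double],
  item_embedding: Seq[Double],
  label: Int
)

object CtrSampleExample {
  // Made-up values, only to show the shape of a single row.
  val example: CtrSample = CtrSample(
    user_id = 1,
    item_id = 42,
    category_id = 7,
    content_type = "video",
    timestamp = "1546300800",
    user_item_click = 2L,
    user_item_imp = 5.0,
    item_ctr = 0.031,
    is_new_user = 0,
    user_embedding = Seq(0.1, -0.2, 0.3),
    item_embedding = Seq(0.05, 0.4, -0.1),
    label = 1
  )

  def main(args: Array[String]): Unit = {
    println(s"user=${example.user_id} item=${example.item_id} label=${example.label}")
  }
}
```

With a `SparkSession` named `spark` in scope, a DataFrame with this schema could be turned into a typed Dataset via `import spark.implicits._` followed by `df.as[CtrSample]`, assuming none of the primitive columns contain nulls.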

# Usage

This is a Maven project built against Spark 2.3.0 and Scala 2.11.

Once Maven has imported the dependencies, you can simply run the example (**com.ggstar.example.ModelSelection**) to train all the CTR models and compare their metrics.

# Related Papers on CTR Prediction

- [GBDT+LR] Practical Lessons from Predicting Clicks on Ads at Facebook
- [FNN] Deep Learning over Multi-field Categorical Data
- [Multi-Task] An Overview of Multi-Task Learning in Deep Neural Networks
- [PNN] Product-based Neural Networks for User Response Prediction
- [Wide & Deep] Wide & Deep Learning for Recommender Systems
- [DeepFM] DeepFM: A Factorization-Machine based Neural Network for CTR Prediction
- Deep Crossing: Web-Scale Modeling without Manually Crafted Combinatorial Features
- Learning Piece-wise Linear Models from Large Scale Data for Ad Click Prediction
- Entire Space Multi-Task Model: An Effective Approach for Estimating Post-Click Conversion Rate
- Deep Interest Network for Click-Through Rate Prediction
- Bid-aware Gradient Descent for Unbiased Learning with Censored Data in Display Advertising
- Ad Click Prediction: a View from the Trenches
- Image Matters: Visually Modeling User Behaviors Using Advanced Model Server
- Logistic Regression in Rare Events Data
- Deep & Cross Network for Ad Click Predictions
- Learning Deep Structured Semantic Models for Web Search using Clickthrough Data
- Adaptive Targeting for Online Advertisement