# rsparse

`rsparse` is an R package for statistical learning on **sparse data**. Notably, it implements many algorithms for sparse **matrix factorizations** with a focus on applications for **recommender systems**.

**All of the algorithms benefit from OpenMP and most of them use BLAS**. The package scales nicely to datasets with millions of rows and millions of columns.

To get maximum performance, it is recommended to:

- Use a high-performance BLAS (such as OpenBLAS, MKL, or Apple Accelerate).
- Add proper compiler optimizations to your `~/.R/Makevars`. For example, on recent processors (with AVX support) and a compiler with OpenMP support, the following lines could be a good option:

  `CXX11FLAGS += -O3 -march=native -mavx -fopenmp -ffast-math`

  `CXXFLAGS += -O3 -march=native -mavx -fopenmp -ffast-math`

If you are on **Mac**, follow the instructions here. After installing `clang4`, additionally add the line `PKG_CXXFLAGS += -DARMA_USE_OPENMP` to your `~/.R/Makevars`. After that, install `rsparse` in the usual way.

## Misc utils/methods

- multithreaded `%*%` and `tcrossprod()` for `<dgRMatrix, matrix>`
- multithreaded `%*%` and `crossprod()` for `<matrix, dgCMatrix>`
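Why a row-oriented CSR matrix (`dgRMatrix`) multiplies well across threads can be seen in a minimal Python/NumPy sketch: row `r` of `A %*% B` depends only on row `r` of `A`, so output rows can be split into disjoint blocks and computed without any synchronization. This illustrates the work decomposition only and is not rsparse's C++/OpenMP code; `csr_matmul` is a hypothetical name, and Python threads are used just to show the partitioning (they will not deliver OpenMP-like speedups).

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def csr_matmul(indptr, indices, data, dense, n_threads=4):
    """Return A @ dense, where A is given in CSR form (indptr, indices, data)."""
    indptr, indices, data = np.asarray(indptr), np.asarray(indices), np.asarray(data)
    dense = np.asarray(dense, dtype=float)
    n_rows = len(indptr) - 1
    out = np.zeros((n_rows, dense.shape[1]))

    def work(rows):
        # Each worker owns a disjoint block of output rows, so no locking is needed.
        for r in rows:
            start, end = indptr[r], indptr[r + 1]
            # Row r of the result only reads row r of A (its nonzeros and their column ids).
            out[r] = data[start:end] @ dense[indices[start:end]]

    chunks = np.array_split(np.arange(n_rows), n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(work, chunks))  # consume the iterator so worker exceptions surface
    return out
```

Since R defines `tcrossprod(x, y)` as `x %*% t(y)`, the same row-parallel routine covers it by passing the transposed dense matrix (`dense.T` in the sketch).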

## Algorithms

### Classification/Regression

- Follow the proximally-regularized leader (FTRL), which allows solving **very large linear/logistic regression** problems with an elastic-net penalty. The solver uses stochastic gradient descent with adaptive learning rates, so it can be used for online learning (it is not necessary to load all data into RAM). See Ad Click Prediction: a View from the Trenches for more examples.
  - Only logistic regression is implemented at the moment.
  - The native format for matrices is CSR, `Matrix::RsparseMatrix`. However, the common R `Matrix::CsparseMatrix` (`dgCMatrix`) will be converted automatically.

- Factorization Machines, a supervised learning algorithm which learns second-order polynomial interactions in a factorized way. We provide a highly optimized SIMD-accelerated implementation.
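The per-coordinate update behind FTRL-Proximal can be sketched in plain Python, following the algorithm from Ad Click Prediction: a View from the Trenches. This is an illustration, not rsparse's implementation: the class name is hypothetical, and the hyperparameters `alpha`, `beta`, `l1`, `l2` follow the paper's notation rather than rsparse's argument names. Each example is a sparse row stored as `{feature_index: value}`, so only features present in the row are touched, which is what makes streaming over data that does not fit in RAM possible.

```python
import math


class FTRLProximal:
    """Sketch of per-coordinate FTRL-Proximal for logistic regression."""

    def __init__(self, n_features, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = [0.0] * n_features  # accumulated adjusted gradients
        self.n = [0.0] * n_features  # accumulated squared gradients

    def weight(self, i):
        z = self.z[i]
        if abs(z) <= self.l1:
            return 0.0  # the L1 part of the elastic net keeps this weight exactly zero
        denom = (self.beta + math.sqrt(self.n[i])) / self.alpha + self.l2
        return -(z - math.copysign(self.l1, z)) / denom

    @staticmethod
    def _sigmoid(margin):
        margin = max(min(margin, 35.0), -35.0)  # avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-margin))

    def predict(self, x):
        return self._sigmoid(sum(self.weight(i) * v for i, v in x.items()))

    def update(self, x, y):
        """One online step on a sparse row x with label y in {0, 1}."""
        w = {i: self.weight(i) for i in x}
        p = self._sigmoid(sum(w[i] * v for i, v in x.items()))
        for i, v in x.items():
            g = (p - y) * v  # gradient of the log-loss w.r.t. w_i
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * w[i]
            self.n[i] += g * g  # per-coordinate adaptive learning rate
        return p
```

Weights are never stored; they are recomputed lazily from `z` and `n`, so features whose accumulated gradient stays within `l1` remain exactly zero and the model stays sparse.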
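A Factorization Machine predicts `y(x) = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j`, where each feature `i` has a factor vector `v_i` of length `k`. The pairwise term does not need an explicit double loop: it equals `0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2]`, which costs O(k n) instead of O(k n^2). A minimal NumPy sketch of prediction only (no training), using a dense `x` for clarity; with sparse input only the nonzero features contribute. `fm_predict` is a hypothetical name, not rsparse's SIMD code.

```python
import numpy as np


def fm_predict(x, w0, w, V):
    """Second-order FM prediction.

    x: (n,) feature values, w0: bias, w: (n,) linear weights, V: (n, k) factors.
    """
    linear = w0 + w @ x
    xv = x @ V  # (k,): sum_i v_if * x_i for every factor f
    # Pairwise interactions via the O(k n) identity instead of summing over all i < j.
    pairwise = 0.5 * np.sum(xv ** 2 - (x ** 2) @ (V ** 2))
    return linear + pairwise
```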

### Matrix Factorizations

- Vanilla **Maximum Margin Matrix Factorization**: a classic approach for "rating" prediction. See the `WRMF` class and the constructor option `feedback = "explicit"`. The original paper which introduced MMMF can be found here.
- **Weighted Regularized Matrix Factorization (WRMF)** from Collaborative Filtering for Implicit Feedback Datasets. See the `WRMF` class and the constructor option `feedback = "implicit"`. We provide 2 solvers:
  - Exact, based on Cholesky factorization
  - Approximate, based on a fixed number of steps of **Conjugate Gradient**. See details in Applications of the Conjugate Gradient Method for Implicit Feedback Collaborative Filtering and Faster Implicit Matrix Factorization.
- **Linear-Flow** from Practical Linear Models for Large-Scale One-Class Collaborative Filtering. The algorithm looks for a factorized low-rank item-item similarity matrix (in some sense it is similar to SLIM).
- Fast **Truncated SVD** and **Truncated Soft-SVD** via Alternating Least Squares, as described in Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares. Works nicely for both sparse and dense matrices, and is usually even faster than the irlba package.
- **Soft-Impute** via fast Alternating Least Squares, as described in Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares.

## Efficiency

Here is an example of `rsparse::WRMF` on the lastfm360k dataset in comparison with other good implementations:

# Materials

**Note that the syntax may be out of date, since the package is under active development.**

- Slides from DataFest Tbilisi (2017-11-16)
- Introduction to matrix factorization with the Weighted-ALS algorithm: collaborative filtering for implicit feedback datasets.
- Music recommendations using the LastFM-360K dataset
  - evaluation metrics for ranking
  - setting up proper cross-validation
  - possible issues with nested parallelism and thread contention
  - making recommendations for new users
  - complementary item-to-item recommendations
- Benchmark against other good implementations

# API

We follow mlapi conventions.