`rsparse` is an R package for statistical learning primarily on **sparse matrices**: **matrix factorizations, factorization machines, out-of-core regression**. Many of the implemented algorithms are particularly useful for **recommender systems** and **NLP**.

On top of that, we provide some optimized routines for working with sparse matrices: multithreaded <dense, sparse> matrix multiplications and improved support for sparse matrices in CSR format (`Matrix::RsparseMatrix`).

We've paid some attention to the implementation details: we try to avoid data copies, utilize multiple threads via OpenMP and use SIMD where appropriate. The package **can handle datasets with millions of rows and millions of columns**.

### Support

Please contact us if you need **commercial support**: hello@rexy.ai.

# Features

### Classification/Regression

- Follow the proximally-regularized leader (FTRL), which allows solving **very large linear/logistic regression** problems with an elastic-net penalty. The solver uses stochastic gradient descent with adaptive learning rates, so it can be used for online learning (it is not necessary to load all the data into RAM). See Ad Click Prediction: a View from the Trenches for more examples.
    - Only logistic regression is implemented at the moment
    - The native format for matrices is CSR (`Matrix::RsparseMatrix`). However, the common R `Matrix::CsparseMatrix` (`dgCMatrix`) will be converted automatically.
- Factorization Machines, a supervised learning algorithm which learns second-order polynomial interactions in a factorized way. We provide a highly optimized, SIMD-accelerated implementation.
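The online-learning use of FTRL described above can be sketched as follows. This is a minimal illustration on synthetic data, not a tuned setup: the data, chunk size and hyperparameter values are made up, and the argument names follow the package documentation as we recall it, so check `?FTRL` for your installed version.

```r
library(Matrix)
library(rsparse)

set.seed(42)
n = 2000
p = 50

# Synthetic sparse features; Matrix::rsparsematrix() returns a dgCMatrix (CSC).
x_csc = rsparsematrix(n, p, density = 0.1)
# Binary target driven by the first feature, so there is some signal to learn.
signal = as.numeric(x_csc[, 1])
y = as.numeric(signal + rnorm(n, sd = 0.1) > 0)
# Convert to CSR, the native format of the package.
x = as(x_csc, "RsparseMatrix")

# Hyperparameter values below are arbitrary (assumption, not a recommendation).
model = FTRL$new(learning_rate = 0.1, learning_rate_decay = 0.5,
                 lambda = 0.01, l1_ratio = 1, dropout = 0, family = "binomial")

# Feed the data in chunks of 500 rows, as if they arrived from disk or a stream.
chunks = split(seq_len(n), ceiling(seq_len(n) / 500))
for (idx in chunks) {
  model$partial_fit(x[idx, ], y[idx])
}

p_hat = model$predict(x)  # predicted probabilities, one per row
w = model$coef()          # learned weights; many can be exactly zero with L1
```

Because the model is updated with `partial_fit()`, only one chunk has to be in memory at a time; in a real pipeline each chunk would be read from disk inside the loop.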

### Matrix Factorizations

- Vanilla **Maximum Margin Matrix Factorization**: the classic approach for "rating" prediction. See the `WRMF` class and constructor option `feedback = "explicit"`. The original paper which introduced MMMF can be found here.
- **Weighted Regularized Matrix Factorization (WRMF)** from Collaborative Filtering for Implicit Feedback Datasets. See the `WRMF` class and constructor option `feedback = "implicit"`. We provide 2 solvers:
    - Exact, based on Cholesky factorization
    - Approximate, based on a fixed number of steps of **Conjugate Gradient**. See details in Applications of the Conjugate Gradient Method for Implicit Feedback Collaborative Filtering and Faster Implicit Matrix Factorization.
- **Linear-Flow** from Practical Linear Models for Large-Scale One-Class Collaborative Filtering. The algorithm looks for a factorized low-rank item-item similarity matrix (in some sense it is similar to SLIM).
- Fast **Truncated SVD** and **Truncated Soft-SVD** via Alternating Least Squares as described in Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares. Works for both sparse and dense matrices. Works on float matrices as well! For certain problems it may be even faster than the irlba package.
- **Soft-Impute** via fast Alternating Least Squares as described in Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares.
- **GloVe** as described in GloVe: Global Vectors for Word Representation.
    - This is usually used to train word embeddings, but it is also very useful for recommender systems.
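As an illustration of the `WRMF` class mentioned above, here is a minimal sketch on the `movielens100k` matrix that ships with the package. The rank, regularization and iteration count are arbitrary, and treating ratings as implicit feedback is done here purely for demonstration; argument names are taken from the package documentation as we recall it, so please verify them with `?WRMF`.

```r
library(Matrix)
library(rsparse)

data("movielens100k")  # sparse user x item matrix bundled with rsparse

# Hold out the last users to imitate "new" users unseen during training.
train = movielens100k[1:900, ]
new_users = movielens100k[901:nrow(movielens100k), ]

# Implicit-feedback model with the approximate conjugate gradient solver.
# Use solver = "cholesky" for the exact solver.
model = WRMF$new(rank = 8, lambda = 0.1, feedback = "implicit",
                 solver = "conjugate_gradient")

user_emb = model$fit_transform(train, n_iter = 5)  # one row per training user
item_emb = model$components                        # item factors

# Top-10 items for each held-out user, skipping items they already interacted with.
top10 = model$predict(new_users, k = 10, not_recommend = new_users)
```

Switching to `feedback = "explicit"` in the constructor gives the rating-prediction variant described in the first bullet.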

### Optimized matrix operations

- multithreaded `%*%` and `tcrossprod()` for `<dgRMatrix, matrix>`
- multithreaded `%*%` and `crossprod()` for `<matrix, dgCMatrix>`
- natively slice `CSR` matrices (`Matrix::RsparseMatrix`) without converting them to triplet / CSC
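The operations listed above need no special API: once `rsparse` is loaded, the usual operators are dispatched to its methods. A small sketch on made-up random data (sizes and density are arbitrary) that also checks the CSR product against the CSC product from `Matrix`:

```r
library(Matrix)
library(rsparse)

set.seed(1)
x_csc = rsparsematrix(1000, 200, density = 0.01)  # dgCMatrix (CSC)
x_csr = as(x_csc, "RsparseMatrix")                # same data in CSR
d = matrix(rnorm(200 * 8), nrow = 200)            # dense right-hand side

# <dgRMatrix, matrix> product; with rsparse loaded this uses its multithreaded method.
prod_csr = x_csr %*% d
# Reference result computed from the CSC representation.
prod_csc = x_csc %*% d
max_diff = max(abs(as.matrix(prod_csr) - as.matrix(prod_csc)))

# Row slicing directly on the CSR matrix.
first_rows = x_csr[1:10, ]
```

The two products should agree up to floating-point rounding, so `max_diff` is expected to be tiny.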

# Installation

Most of the algorithms benefit from OpenMP, and many of them can use a high-performance BLAS implementation. If you want to get the most out of the package, please read the section below carefully.

It is recommended to:

- Use high-performance BLAS (such as OpenBLAS, MKL, Apple Accelerate).
- Add proper compiler optimizations in your `~/.R/Makevars`. For example, on recent processors (with AVX support) and a compiler with OpenMP support, the following two lines could be a good option: `CXX11FLAGS += -O3 -march=native -mavx -fopenmp -ffast-math` and `CXXFLAGS += -O3 -march=native -mavx -fopenmp -ffast-math`.

If you are on **Mac**, follow the instructions here. After installing `clang4`, additionally put the line `PKG_CXXFLAGS += -DARMA_USE_OPENMP` in your `~/.R/Makevars`. After that, install `rsparse` in the usual way.
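To see whether the BLAS recommendation above is in effect, it can help to check which libraries the current R session is linked against. The snippet below is a rough sketch using base R only; the matrix size is arbitrary, and the `"BLAS"` entry of `extSoftVersion()` is only reported by reasonably recent R versions (otherwise it shows as `NA`).

```r
# Paths of the BLAS / LAPACK libraries used by this R session.
blas_lib = extSoftVersion()["BLAS"]
lapack_lib = La_library()
cat("BLAS:  ", blas_lib, "\n")
cat("LAPACK:", lapack_lib, "\n")

# Rough timing of a dense matrix product as a sanity check.
# An optimized BLAS is typically much faster here than the reference BLAS.
n = 500
a = matrix(rnorm(n * n), n, n)
elapsed = system.time(b <- a %*% a)[["elapsed"]]
cat("500 x 500 matrix product took", elapsed, "seconds\n")
```

If the reported path points to the reference BLAS bundled with R, switching to OpenBLAS, MKL or Apple Accelerate is likely worth the effort.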

# Materials

**Note that the syntax in these posts/slides is not up to date, since the package was under active development**

- Slides from DataFest Tbilisi (2017-11-16)
- Introduction to matrix factorization with Weighted-ALS algorithm: collaborative filtering for implicit feedback datasets.
- Music recommendations using LastFM-360K dataset
    - evaluation metrics for ranking
    - setting up proper cross-validation
    - possible issues with nested parallelism and thread contention
    - making recommendations for new users
    - complementary item-to-item recommendations
- Benchmark against other good implementations

Here is an example of `rsparse::WRMF` on the lastfm360k dataset in comparison with other good implementations:

# API

We follow mlapi conventions.

# Configure

Generate configure:

`autoconf configure.ac > configure && chmod +x configure`