rsparse is an R package for statistical learning on sparse data. Notably, it implements many algorithms for sparse matrix factorization with a focus on applications in recommender systems.
All of the algorithms benefit from OpenMP and most of them use BLAS. The package scales nicely to datasets with millions of rows and millions of columns.
To get maximum performance it is recommended to:
- Use high-performance BLAS (such as OpenBLAS, MKL, Apple Accelerate).
- Add proper compiler optimizations in your `~/.R/Makevars`. For example, on recent processors (with AVX support) and a compiler with OpenMP support, the following lines could be a good option:

```
CXX11FLAGS += -O3 -march=native -mavx -fopenmp -ffast-math
CXXFLAGS += -O3 -march=native -mavx -fopenmp -ffast-math
```

If you are on a Mac, follow the instructions here. After installation, additionally add the `PKG_CXXFLAGS += -DARMA_USE_OPENMP` line to your `~/.R/Makevars`. After that, install `rsparse` in the usual way.
- "Follow the proximally-regularized leader" (FTRL), which allows solving very large linear/logistic regression problems with an elastic-net penalty. The solver uses stochastic gradient descent with adaptive learning rates, so it can be used for online learning - it is not necessary to load all data into RAM. See Ad Click Prediction: a View from the Trenches for more examples.
- Only logistic regression is implemented at the moment
- The native format for matrices is CSR (`Matrix::RsparseMatrix`). However, common R sparse matrices (`Matrix::dgCMatrix`) will be converted automatically.
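A minimal sketch of online training with the `FTRL` class. The constructor arguments shown here are assumptions and may differ between package versions; consult `?FTRL` for the actual signature:

```r
library(rsparse)
library(Matrix)

# toy sparse design matrix and binary labels;
# argument names below (learning_rate, lambda, l1_ratio) are assumptions
set.seed(42)
x = rsparsematrix(1000, 50, density = 0.1)
x = as(x, "RsparseMatrix")  # CSR, the package's native format
y = sample(c(0, 1), 1000, replace = TRUE)

model = FTRL$new(learning_rate = 0.1, lambda = 1, l1_ratio = 1)
# partial_fit() can be called repeatedly on successive chunks,
# so the full dataset never needs to be in RAM at once
model$partial_fit(x, y)
preds = model$predict(x)  # predicted probabilities
```

Because fitting is incremental, streaming chunks of rows through `partial_fit()` gives out-of-core learning on data that does not fit in memory.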
- Factorization Machines - a supervised learning algorithm which learns second-order polynomial interactions in a factorized way. We provide a highly optimized, SIMD-accelerated implementation.
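A sketch of fitting the factorization machine. The class name `FactorizationMachine` and its constructor arguments follow the package's R6 style but are assumptions here; check `?FactorizationMachine` for the real signature:

```r
library(rsparse)
library(Matrix)

# toy binary classification task on sparse features
set.seed(1)
x = as(rsparsematrix(200, 30, density = 0.2), "RsparseMatrix")
y = sample(c(0, 1), 200, replace = TRUE)

# rank controls the dimensionality of the factorized
# second-order interaction terms (argument names are assumptions)
fm = FactorizationMachine$new(learning_rate_w = 0.2, rank = 4,
                              lambda_w = 0.01, lambda_v = 0.01,
                              family = "binomial")
fm$partial_fit(x, y)
p = fm$predict(x)  # predicted probabilities
```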
- Vanilla Maximum Margin Matrix Factorization - a classic approach for "rating" prediction. See the `WRMF` class and constructor option `feedback = "explicit"`. The original paper which introduced MMMF can be found here.
- Weighted Regularized Matrix Factorization (WRMF) from Collaborative Filtering for Implicit Feedback Datasets. See the `WRMF` class and constructor option `feedback = "implicit"`. We provide 2 solvers:
- Exact, based on Cholesky factorization
- Approximate, based on a fixed number of steps of the Conjugate Gradient method. See details in Applications of the Conjugate Gradient Method for Implicit Feedback Collaborative Filtering and Faster Implicit Matrix Factorization.
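A minimal sketch of fitting WRMF on an implicit-feedback matrix. The argument names (`rank`, `feedback`, `solver`, `n_iter`, `k`) are assumptions and may have changed between versions; see `?WRMF`:

```r
library(rsparse)
library(Matrix)

# user-item interaction matrix, e.g. play counts (users in rows)
set.seed(1)
x = rsparsematrix(500, 100, density = 0.05)
x@x = as.numeric(sample(1:5, length(x@x), replace = TRUE))  # positive "confidence" values

model = WRMF$new(rank = 8, feedback = "implicit",
                 solver = "conjugate_gradient")
user_factors = model$fit_transform(x, n_iter = 5)
# item factors live in the fitted model; top-k recommendations per user:
top_items = model$predict(x, k = 10)
```

Switching `solver = "cholesky"` selects the exact solver described above; the conjugate-gradient variant trades a small amount of accuracy per iteration for much cheaper updates.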
- Linear-Flow from Practical Linear Models for Large-Scale One-Class Collaborative Filtering. The algorithm looks for a factorized low-rank item-item similarity matrix (in some sense similar to SLIM).
- Fast Truncated SVD and Truncated Soft-SVD via Alternating Least Squares, as described in Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares. Works nicely for both sparse and dense matrices. Usually it is even faster than the irlba package.
- Soft-Impute via fast Alternating Least Squares as described in Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares.
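A sketch with the `soft_impute()` function; the exact signature (`rank`, `lambda`) is an assumption here, so consult `?soft_impute`:

```r
library(rsparse)
library(Matrix)

# partially observed matrix; non-stored entries are treated as missing
set.seed(1)
x = rsparsematrix(200, 100, density = 0.1)

# rank-restricted, nuclear-norm-regularized matrix completion;
# the result holds SVD-like factors of the completed low-rank matrix
res = soft_impute(x, rank = 10, lambda = 1)
str(res, max.level = 1)
```

Larger `lambda` shrinks singular values harder (more regularization); `rank` caps the dimensionality of the solution.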
Here is an example of `rsparse::WRMF` on the lastfm360k dataset in comparison with other good implementations:
Note that the syntax may be out of date since the package is under active development.
- Slides from DataFest Tbilisi (2017-11-16)
- Introduction to matrix factorization with the Weighted-ALS algorithm - collaborative filtering for implicit feedback datasets.
- Music recommendations using LastFM-360K dataset
- evaluation metrics for ranking
- setting up proper cross-validation
- possible issues with nested parallelism and thread contention
- making recommendations for new users
- complementary item-to-item recommendations
- Benchmark against other good implementations
We follow mlapi conventions.