Last commit: Dec. 11, 2017
Created: May 10, 2017

reco

reco is an R package that implements many algorithms for sparse matrix factorization. The focus is on applications for recommender systems.

Algorithms

  1. Vanilla Maximum Margin Matrix Factorization (MMMF) - the classic approach for "rating" prediction. See the WRMF class and the constructor option feedback = "explicit". The original paper that introduced MMMF can be found here.
  2. Weighted Regularized Matrix Factorization (WRMF) from Collaborative Filtering for Implicit Feedback Datasets. See WRMF class and constructor option feedback = "implicit". We provide 2 solvers:
    1. Exact, based on Cholesky factorization.
    2. Approximate, based on a fixed number of Conjugate Gradient steps. See details in Applications of the Conjugate Gradient Method for Implicit Feedback Collaborative Filtering and Faster Implicit Matrix Factorization.
  3. Linear-Flow from Practical Linear Models for Large-Scale One-Class Collaborative Filtering. The algorithm looks for a factorized low-rank item-item similarity matrix (in some sense it is similar to SLIM).
  4. Soft-SVD via fast Alternating Least Squares as described in Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares.
  5. Soft-Impute via fast Alternating Least Squares as described in Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares.
    • with a solution in SVD form
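
As a rough illustration of the two WRMF solvers in item 2, the sketch below builds the regularized normal equations for a single user and solves them both ways. It is plain Python for readability, not reco's R/C++ code. The helper names (user_system, cholesky_solve, cg_solve), the toy item factors, and the values alpha = 40 and lam = 0.1 are assumptions made for this example. The confidence weighting c = 1 + alpha * r follows Collaborative Filtering for Implicit Feedback Datasets.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product on nested lists."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def user_system(Y, interactions, alpha=40.0, lam=0.1):
    """Build A = Y^T C_u Y + lam * I and b = Y^T C_u p_u for one user.

    Y: item factors (n_items x rank); interactions: {item_index: raw count}.
    Confidence c = 1 + alpha * r; preference p = 1 if r > 0 else 0.
    Looping over every item is for clarity only; practical solvers
    precompute Y^T Y once and add just the observed items per user.
    """
    rank = len(Y[0])
    A = [[lam if i == j else 0.0 for j in range(rank)] for i in range(rank)]
    b = [0.0] * rank
    for item, y in enumerate(Y):
        r = interactions.get(item, 0.0)
        c = 1.0 + alpha * r
        for i in range(rank):
            for j in range(rank):
                A[i][j] += c * y[i] * y[j]
            if r > 0:
                b[i] += c * y[i]
    return A, b


def cholesky_solve(A, b):
    """Exact solve of A x = b via A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    z = [0.0] * n  # forward substitution: L z = b
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n  # back substitution: L^T x = z
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def cg_solve(A, b, x0, steps=3):
    """Approximate solve: run at most `steps` conjugate gradient iterations from x0."""
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
    p = list(r)
    rs = dot(r, r)
    for _ in range(steps):
        if rs < 1e-20:  # already converged
            break
        Ap = matvec(A, p)
        step = rs / dot(p, Ap)
        x = [xi + step * pi for xi, pi in zip(x, p)]
        r = [ri - step * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


# Toy data: 4 items, rank 3; the user interacted with items 0 and 2.
Y = [[0.1, 0.2, 0.3], [0.4, 0.1, 0.0], [0.2, 0.5, 0.1], [0.3, 0.3, 0.6]]
A, b = user_system(Y, {0: 2.0, 2: 1.0})
x_exact = cholesky_solve(A, b)
x_cg = cg_solve(A, b, x0=[0.0, 0.0, 0.0], steps=3)
```

On a rank-3 system, three conjugate gradient steps already agree with the exact solution up to rounding. The approximate solver pays off at larger ranks, where a few steps started from the previous iteration's factors are much cheaper than a full factorization.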

Efficiency

The package is reasonably fast and scales nicely to datasets with millions of rows and millions of columns:

  • built on top of RcppArmadillo
  • extensively uses BLAS and parallelized with OpenMP

A comparison of reco::WRMF with other good implementations on the lastfm360k dataset is linked under Materials (see the benchmark entry).

Materials

Note that the syntax in these materials may be out of date, since the package is under active development.

  1. Slides from DataFest Tbilisi (2017-11-16)
  2. Introduction to matrix factorization with Weighted-ALS algorithm - collaborative filtering for implicit feedback datasets.
  3. Music recommendations using LastFM-360K dataset
    • evaluation metrics for ranking
    • setting up proper cross-validation
    • possible issues with nested parallelism and thread contention
    • making recommendations for new users
    • complementary item-to-item recommendations
  4. Benchmark against other good implementations

API

We follow mlapi conventions.

Notes on multithreading and BLAS

If you use a multithreaded BLAS such as OpenBLAS, Intel MKL, or Apple Accelerate (you generally should), it is recommended to disable its internal multithreading, since thread contention can easily slow computation down by 10x or more. Matrix factorization is already parallelized in the package with OpenMP.

At the moment reco tries to mitigate this issue automatically with the help of RhpcBLASctl. If you encounter any issues, please report them to our issue tracker.
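
If the automatic mitigation does not apply in your setup, one possible workaround is to cap BLAS threads through environment variables before the R session starts. The sketch below is Python and only prepares such an environment. The helper name single_threaded_blas_env is hypothetical, and the variable names are the ones commonly honored by OpenBLAS, Intel MKL, and Apple Accelerate; check them against your BLAS build. OMP_NUM_THREADS is deliberately left untouched so the package's own OpenMP parallelism keeps its threads.

```python
import os

# Environment variables commonly read by multithreaded BLAS libraries
# (assumed names; verify against the BLAS your R installation links to).
BLAS_THREAD_VARS = (
    "OPENBLAS_NUM_THREADS",    # OpenBLAS
    "MKL_NUM_THREADS",         # Intel MKL
    "VECLIB_MAXIMUM_THREADS",  # Apple Accelerate
)


def single_threaded_blas_env(base_env=None):
    """Return a copy of the environment with BLAS limited to one thread.

    OMP_NUM_THREADS is not modified, so OpenMP code keeps its thread count.
    """
    env = dict(os.environ if base_env is None else base_env)
    for name in BLAS_THREAD_VARS:
        env[name] = "1"
    return env


# Example: start from an environment that allows 8 OpenMP threads.
env = single_threaded_blas_env({"OMP_NUM_THREADS": "8"})
```

The resulting dictionary can be passed as the env argument of subprocess.run when launching an R script. Inside a running R session, RhpcBLASctl::blas_set_num_threads(1) serves the same purpose.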