Understanding matrix factorization for recommendation (part 4) - algorithm implementation

Foreword: this is the last part of a 4-part series. Here are parts 1, 2 and 3. This series is an extended version of a talk I gave at PyParis 17.

Algorithm implementation in Python

In the previous part, we described how to find an approximate solution to the SVD problem using Stochastic Gradient Descent (SGD). Here are the main steps of the procedure:

  1. Randomly initialize all vectors $p_u$ and $q_i$, each of size 10.
  2. for a given number of times (i.e. the number of epochs), repeat:
    • for all known ratings $r_{ui}$, repeat:
      • compute $\frac{\partial f_{ui}}{\partial p_u}$ and $\frac{\partial f_{ui}}{\partial q_i}$ (we just did)
      • update $p_u$ and $q_i$ with the following rule: \(p_u \leftarrow p_u + \alpha \cdot q_i (r_{ui} - p_u \cdot q_i)\), and \(q_i \leftarrow q_i + \alpha \cdot p_u (r_{ui} - p_u \cdot q_i)\). We avoided the multiplicative constant $2$ and merged it into the learning rate $\alpha$.

Without further ado, here is a Python translation of this algorithm:

import numpy as np

def SGD(data):
    '''Learn the vectors p_u and q_i with SGD.
       data is a dataset containing all ratings + some useful info (e.g. number
       of items/users).
    '''

    n_factors = 10  # number of factors
    alpha = .01  # learning rate
    n_epochs = 10  # number of iterations of the SGD procedure

    # Randomly initialize the user and item factors.
    p = np.random.normal(0, .1, (data.n_users, n_factors))
    q = np.random.normal(0, .1, (data.n_items, n_factors))

    # Optimization procedure
    for _ in range(n_epochs):
        for u, i, r_ui in data.all_ratings():
            err = r_ui - np.dot(p[u], q[i])
            # Update vectors p_u and q_i
            p_u = p[u].copy()  # keep the old value of p_u for the q_i update
            p[u] += alpha * err * q[i]
            q[i] += alpha * err * p_u

    return p, q

The core of this fits in about ten lines of code. Pretty neat, right?

Once we have run the SGD procedure, all vectors $p_u$ and $q_i$ will be estimated. We can then predict all the ratings we want by simply using the dot product between the vectors $p_u$ and $q_i$:

def estimate(u, i):
    '''Estimate the rating of user u for item i.
       p and q are the factor arrays learned by the SGD procedure above.
    '''
    return np.dot(p[u], q[i])
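As a quick sanity check, here is a sketch of how the two functions above could be used together. The ToyData class is a hypothetical stand-in for the data object expected by SGD: any object exposing n_users, n_items and an all_ratings() iterator of (u, i, r_ui) triples would do.

import numpy as np

class ToyData:
    '''Hypothetical minimal dataset, just enough for SGD().'''
    def __init__(self, ratings, n_users, n_items):
        self.ratings = ratings  # list of (user, item, rating) triples
        self.n_users = n_users
        self.n_items = n_items

    def all_ratings(self):
        return iter(self.ratings)

data = ToyData([(0, 0, 4), (0, 1, 1), (1, 0, 5), (1, 1, 2)], n_users=2, n_items=2)
p, q = SGD(data)       # learn the factors
print(estimate(0, 1))  # predicted rating of user 0 for item 1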

And that’s it, we’re done :). Want to try it out yourself? I made a small notebook where we can actually run this algorithm on a real dataset. We will use surprise, a great library for quickly implementing rating prediction algorithms (but I might be slightly biased, as I’m the main dev of surprise!). To use it, you’ll simply need to install it first using pip:

pip install scikit-surprise
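The notebook wraps the SGD procedure into a surprise algorithm. A minimal sketch of how one might do that (not the exact notebook code) could look like the following, assuming the built-in MovieLens-100k dataset; the fallback to the global mean for unseen users/items is an assumption of this sketch:

from surprise import AlgoBase, Dataset
from surprise.model_selection import cross_validate
import numpy as np

class MatrixFacto(AlgoBase):
    '''Our plain SGD matrix factorization, wrapped as a surprise algorithm.'''

    def __init__(self, n_factors=10, n_epochs=10, alpha=.01):
        AlgoBase.__init__(self)
        self.n_factors = n_factors
        self.n_epochs = n_epochs
        self.alpha = alpha

    def fit(self, trainset):
        AlgoBase.fit(self, trainset)
        # Same SGD procedure as above, using the trainset provided by surprise.
        self.p = np.random.normal(0, .1, (trainset.n_users, self.n_factors))
        self.q = np.random.normal(0, .1, (trainset.n_items, self.n_factors))
        for _ in range(self.n_epochs):
            for u, i, r_ui in trainset.all_ratings():
                err = r_ui - np.dot(self.p[u], self.q[i])
                p_u = self.p[u].copy()  # old value, for the q_i update
                self.p[u] += self.alpha * err * self.q[i]
                self.q[i] += self.alpha * err * p_u
        return self

    def estimate(self, u, i):
        # Fall back to the global mean for users/items unseen during training.
        if not (self.trainset.knows_user(u) and self.trainset.knows_item(i)):
            return self.trainset.global_mean
        return np.dot(self.p[u], self.q[i])

data = Dataset.load_builtin('ml-100k')  # downloads MovieLens-100k on first use
cross_validate(MatrixFacto(), data, measures=['RMSE'], cv=3, verbose=True)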

How good is our algorithm?

As you can see in the notebook, we obtain an average RMSE of about 0.98. RMSE stands for Root Mean Squared Error, and is computed as follows:

\[\text{RMSE} = \sqrt{\frac{1}{N}\sum_{u,i} (\hat{r}_{ui} - r_{ui})^2},\]

where the sum runs over the $N$ ratings we are trying to predict.

You can think of it as some sort of average error, where big errors are heavily penalized (because they are squared). An RMSE of 0 means that all our predictions are perfect. An RMSE of 0.5 means that, on average, we are roughly 0.5 off on each prediction.
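In code, this boils down to a one-liner. A small sketch, assuming the predictions and the true ratings are stored in two array-like objects of the same length:

import numpy as np

def rmse(predictions, true_ratings):
    '''Root Mean Squared Error between predicted and true ratings.'''
    return np.sqrt(np.mean((np.asarray(predictions) - np.asarray(true_ratings)) ** 2))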

So is 0.98 a good RMSE? It’s actually really not that bad! As shown in the notebook, a neighborhood algorithm only achieves an RMSE of about 1. A more sophisticated MF algorithm (the same as ours, except that user and item biases are also modeled, and that we use regularization) can achieve an RMSE of about 0.96. This algorithm is called ‘SVD’ in the literature, but you know now that it can’t be a real SVD, as there are missing ratings ;). It’s only (heavily) inspired by SVD.
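For the curious, here is a sketch of what that biased, regularized version could look like, following the usual ‘SVD’ formulation from the literature. The bias terms b_u and b_i, the global mean mu, and the regularization strength reg are not part of the code above; the value of reg is just a typical default:

import numpy as np

def SGD_biased(data, n_factors=10, alpha=.01, n_epochs=10, reg=.02):
    '''Sketch of the biased, regularized MF ('SVD' in the literature).'''
    mu = np.mean([r for _, _, r in data.all_ratings()])  # global mean rating
    b_u = np.zeros(data.n_users)  # user biases
    b_i = np.zeros(data.n_items)  # item biases
    p = np.random.normal(0, .1, (data.n_users, n_factors))
    q = np.random.normal(0, .1, (data.n_items, n_factors))

    for _ in range(n_epochs):
        for u, i, r_ui in data.all_ratings():
            err = r_ui - (mu + b_u[u] + b_i[i] + np.dot(p[u], q[i]))
            b_u[u] += alpha * (err - reg * b_u[u])
            b_i[i] += alpha * (err - reg * b_i[i])
            p_u = p[u].copy()  # old value, for the q_i update
            p[u] += alpha * (err * q[i] - reg * p[u])
            q[i] += alpha * (err * p_u - reg * q[i])

    return mu, b_u, b_i, p, q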

Wrapping it up

All right, that’s it! I hope you now understand how beautiful PCA and SVD are, and how we can adapt SVD to a recommendation problem. If you want to play around with surprise, there is plenty of cool stuff in the docs.

If you found this series useful (or not), please let me know in the comments. Any mistakes are my own, and any kind of criticism would be greatly appreciated!

Thanks to Pierre Poulain for the valuable feedback!

Resources, going further

I tried to avoid theoretical considerations in this series, and rather give visual, concrete examples to illustrate the use of PCA and SVD. But the theoretical side should not be overlooked, and it is actually very interesting. Here are a few resources that you may want to read if you want to dive further into the various topics covered so far:
