# Collaborative Filtering

## Recommender System

• Automated systems that make recommendations based on users' preferences

• Examples:

• Amazon and other online stores, which recommend products to buy
• Netflix and Spotify, which recommend films and music

### Collaborative Filtering

• Collaborative filtering is the technique behind most recommender systems: it filters information by soliciting judgements from other users, which helps overcome the information overload problem.

### Collaborative Filtering Techniques

• Collaborative filtering techniques predict how well a user will like an item they have not yet rated, given a set of historical preference judgements from a community of users.

#### Nearest Neighbor

• Nearest-neighbor methods require two choices: a similarity measure and a neighborhood size
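
As an illustrative sketch of the neighborhood-size choice, the following picks the `k` most similar users from one row of a similarity matrix. The helper name `top_k_neighbors` and the value `k=2` are assumptions made for this example; the similarity values are user 0's row from the RBF example later in these notes.

```python
import numpy as np

def top_k_neighbors(sim_row, user, k):
    """Return the indices of the k users most similar to `user`, excluding `user`."""
    order = np.argsort(sim_row)[::-1]              # most similar first
    return [int(v) for v in order if v != user][:k]

# Similarity of user 0 to users 0..3 (taken from the RBF example later in these notes)
sim_row = np.array([1.000000, 0.670320, 0.135335, 0.033373])
neighbors = top_k_neighbors(sim_row, user=0, k=2)  # users 1 and 2
```

A smaller `k` keeps only the closest users, while a larger `k` averages over more, but less similar, users.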

User-Based Nearest Neighbor (we’ve seen this many times before!)

• Given a user u, generate a prediction for an item i by using the ratings for i from users in u’s neighborhood
• Here, u’s neighborhood is the set of users whose interests are similar to u’s

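• A common form of this prediction, and the one computed in the Python example below, adds the similarity-weighted average of the neighbors' mean-centered ratings to u’s own average rating:

$$\hat{r}_{u,i} = \bar{r}_u + \frac{\sum_{v \in N(u)} \operatorname{sim}(u,v)\,\left(r_{v,i} - \bar{r}_v\right)}{\sum_{v \in N(u)} \operatorname{sim}(u,v)}$$

where $N(u)$ is u’s neighborhood, $\operatorname{sim}(u,v)$ is the similarity between users u and v, $r_{v,i}$ is v’s rating of item i, and $\bar{r}_v$ is v’s average rating.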

Item-Based Nearest Neighbor

• Given a user u, generate a prediction for an item i by using a weighted sum of user u’s ratings for the items that are most similar to i.

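• A common form, and the one computed in the Python example below, is the similarity-weighted average of u’s own ratings:

$$\hat{r}_{u,i} = \frac{\sum_{j \in N(i)} \operatorname{sim}(i,j)\, r_{u,j}}{\sum_{j \in N(i)} \operatorname{sim}(i,j)}$$

where $N(i)$ is the set of items most similar to i that u has rated, $\operatorname{sim}(i,j)$ is the similarity between items i and j, and $r_{u,j}$ is u’s rating of item j.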

#### Similarity Measure

• A numerical measure of how alike two data instances are: the higher the value, the more alike the instances

Examples of Similarity Measures include:

• Jaccard Similarity
• Cosine Similarity
• Correlation Similarity
• Gaussian RBF Similarity

##### Jaccard Similarity

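• Measures the overlap between two sets $A$ and $B$ (for example, the sets of items two users have rated) as the size of their intersection divided by the size of their union:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$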
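
A minimal sketch of Jaccard similarity in Python, assuming each user is represented by the set of items they have rated. The function name `jaccard_similarity`, the two example sets, and the choice to return 0 for two empty sets are assumptions made for illustration.

```python
def jaccard_similarity(a, b):
    """Jaccard similarity of two collections: |a ∩ b| / |a ∪ b|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)

# Hypothetical example: the sets of movies two users have rated
user_a = {"Mission Impossible", "Over the Hedge", "Back to the Future"}
user_b = {"Mission Impossible", "Back to the Future", "Harry Potter"}
jaccard_ab = jaccard_similarity(user_a, user_b)  # 2 shared movies out of 4 distinct ones
```

Because it only looks at set membership, this measure ignores the rating values themselves.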

##### Cosine Similarity

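• Measures the cosine of the angle between two rating vectors $x$ and $y$, so it depends on their direction but not on their magnitude:

$$\cos(x, y) = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert}$$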
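
A minimal sketch of cosine similarity with NumPy, assuming both vectors are non-zero and contain no missing ratings. The helper name `cosine_sim` is an assumption; the two vectors are the ratings of users 0 and 1 for the three movies both have rated in the Python example later in these notes.

```python
import numpy as np

def cosine_sim(x, y):
    """Cosine of the angle between vectors x and y (both assumed non-zero)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

# Users 0 and 1 on the three movies both have rated
cos_01 = cosine_sim([5, 3, 4], [5, 4, 5])  # about 0.992
```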

##### Correlation Similarity

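• Usually the Pearson correlation between two rating vectors $x$ and $y$, which is cosine similarity applied after subtracting each vector's mean:

$$\operatorname{corr}(x, y) = \frac{\sum_k (x_k - \bar{x})(y_k - \bar{y})}{\sqrt{\sum_k (x_k - \bar{x})^2}\,\sqrt{\sum_k (y_k - \bar{y})^2}}$$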
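
A minimal sketch of correlation similarity, assuming it means the Pearson correlation and that neither vector is constant. The helper name `pearson_sim` is an assumption; the vectors are again the ratings of users 0 and 1 from the Python example later in these notes.

```python
import numpy as np

def pearson_sim(x, y):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()            # remove each user's rating bias
    return float(xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc)))

# Users 0 and 1 on the three movies both have rated
corr_01 = pearson_sim([5, 3, 4], [5, 4, 5])  # about 0.866
```

Mean-centering makes the measure insensitive to users who rate everything consistently higher or lower than others.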

##### Gaussian RBF Similarity

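• Turns the squared Euclidean distance between $x$ and $y$ into a similarity in $(0, 1]$, where $\gamma$ controls how quickly the similarity decays with distance. This is what `rbf_kernel` computes in the Python examples below:

$$k(x, y) = \exp\left(-\gamma \, \lVert x - y \rVert^2\right)$$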
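
A minimal sketch of the Gaussian RBF similarity with NumPy. The helper name `rbf_sim` is an assumption; `gamma=0.2` and the two rating vectors match the values used for users 0 and 1 in the Python example later in these notes.

```python
import numpy as np

def rbf_sim(x, y, gamma=0.2):
    """Gaussian RBF similarity: exp(-gamma * squared Euclidean distance)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-gamma * diff @ diff))

# Users 0 and 1 on the three movies both have rated: squared distance is 2
rbf_01 = rbf_sim([5, 3, 4], [5, 4, 5])  # exp(-0.4), about 0.670320
```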

### Python Example of User-Based Similarity

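```python
import pandas as pd

# Ratings rebuilt from the output shown below (rows = users, columns = movies); NaN = not rated
data = pd.DataFrame({'Mission Impossible': [5, 5, 2, 3], 'Over the Hedge': [3, 4, 2, 1],
                     'Back to the Future': [4, 5, 4, 1], 'Harry Potter': [float('nan'), 5, 5, 2]})
data
```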

Output:

| | Mission Impossible | Over the Hedge | Back to the Future | Harry Potter |
|---|---|---|---|---|
| 0 | 5 | 3 | 4 | NaN |
| 1 | 5 | 4 | 5 | 5.0 |
| 2 | 2 | 2 | 4 | 5.0 |
| 3 | 3 | 1 | 1 | 2.0 |

Using the RBF kernel from the sklearn library to compute the similarity between users:

```python
from sklearn.metrics import pairwise
import pandas as pd
import numpy as np

X = data.to_numpy()
# RBF similarity between users, using only the three movies that every user has rated
user_similarity = pairwise.rbf_kernel(X[:, :3], gamma=0.2)
usim = pd.DataFrame(user_similarity)
usim
```

Output:

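| | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 1.000000 | 0.670320 | 0.135335 | 0.033373 |
| 1 | 0.670320 | 1.000000 | 0.060810 | 0.003028 |
| 2 | 0.135335 | 0.060810 | 1.000000 | 0.110803 |
| 3 | 0.033373 | 0.003028 | 0.110803 | 1.000000 |

```python
# Average rating for each user; mean() skips NaN, so user 0 is averaged over three movies
avg_ratings = data.mean(axis=1)
avg_ratings
```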

Output:

| User | Average rating |
|---|---|
| 0 | 4.00 |
| 1 | 4.75 |
| 2 | 3.25 |
| 3 | 1.75 |

dtype: float64

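```python
import numpy as np

# Mean-centered Harry Potter ratings of users 1-3, weighted by their similarity to user 0
ratings = (data['Harry Potter'][1:] - avg_ratings[1:]) * usim[0].iloc[1:]
# Add the normalized weighted deviation to user 0's own average rating
predicted = avg_ratings[0] + ratings.sum() / usim[0].iloc[1:].sum()
predicted
```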

Output:

4.4919499466890604
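
The same calculation can be sketched as a reusable function for any user and item. This is a NumPy-only sketch under stated assumptions: the name `predict_user_based` is hypothetical, all other users who rated the item form the neighborhood, and users are compared on every other item, which is assumed to be fully rated.

```python
import numpy as np

def predict_user_based(R, user, item, gamma=0.2):
    """Predict R[user, item] from the other users' mean-centered ratings."""
    R = np.asarray(R, dtype=float)
    means = np.nanmean(R, axis=1)                  # each user's average over rated items
    other_items = [j for j in range(R.shape[1]) if j != item]
    num = den = 0.0
    for v in range(R.shape[0]):
        if v == user or np.isnan(R[v, item]):
            continue                               # skip the target user and non-raters
        diff = R[user, other_items] - R[v, other_items]
        sim = np.exp(-gamma * diff @ diff)         # RBF similarity on the other items
        num += sim * (R[v, item] - means[v])
        den += sim
    return means[user] + num / den

# Same ratings as the DataFrame above (rows = users, columns = movies)
R = np.array([[5, 3, 4, np.nan],
              [5, 4, 5, 5],
              [2, 2, 4, 5],
              [3, 1, 1, 2]])
predicted = predict_user_based(R, user=0, item=3)  # about 4.49, matching the output above
```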

### Python Example of Item-Based Similarity

```python
# RBF similarity between movies, computed from users 1-3 (user 0 has not rated Harry Potter)
item_similarity = pairwise.rbf_kernel(X[1:, :].T, gamma=0.2)
isim = pd.DataFrame(item_similarity)
isim
```

Output:

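| | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 1.000000 | 0.367879 | 0.201897 | 0.135335 |
| 1 | 0.367879 | 1.000000 | 0.367879 | 0.110803 |
| 2 | 0.201897 | 0.367879 | 1.000000 | 0.670320 |
| 3 | 0.135335 | 0.110803 | 0.670320 | 1.000000 |

```python
import numpy as np

ratings = data.iloc[0][:3].to_numpy()  # user 0's ratings for the three movies they have rated
simval = isim[3][:3]                   # similarity of each of those movies to Harry Potter
prediction = (simval * ratings).sum() / simval.sum()  # similarity-weighted average
prediction
```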

Output:

4.026768397265431
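
The item-based calculation can be sketched the same way for any user and item. This is a NumPy-only sketch under stated assumptions: the name `predict_item_based` is hypothetical, every item the user has rated is used as a neighbor, and item similarities are computed from the other users' ratings, which are assumed to be complete.

```python
import numpy as np

def predict_item_based(R, user, item, gamma=0.2):
    """Predict R[user, item] as a similarity-weighted average of the user's own ratings."""
    R = np.asarray(R, dtype=float)
    others = [v for v in range(R.shape[0]) if v != user]  # users used to compare items
    num = den = 0.0
    for j in range(R.shape[1]):
        if j == item or np.isnan(R[user, j]):
            continue                               # skip the target item and unrated items
        diff = R[others, item] - R[others, j]
        sim = np.exp(-gamma * diff @ diff)         # RBF similarity between items
        num += sim * R[user, j]
        den += sim
    return num / den

# Same ratings as the DataFrame above (rows = users, columns = movies)
R = np.array([[5, 3, 4, np.nan],
              [5, 4, 5, 5],
              [2, 2, 4, 5],
              [3, 1, 1, 2]])
prediction = predict_item_based(R, user=0, item=3)  # about 4.03, matching the output above
```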

## Another Technique: Matrix Factorization

• Matrix factorization algorithms decompose the user-item interaction matrix into the product of two lower-dimensional rectangular matrices.
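
As a rough sketch of the idea, the ratings matrix from the examples above can be approximated by the product of a user-factor matrix `P` and an item-factor matrix `Q`, fitted by plain gradient descent on the rated entries only. The function names `factorize` and `rmse`, the rank of 2, and the learning rate, regularization, and step count are all assumptions chosen for illustration; this is one simple way to factorize, not a specific library's algorithm.

```python
import numpy as np

def factorize(R, rank=2, steps=5000, lr=0.01, reg=0.02, seed=0):
    """Fit R ≈ P @ Q.T by gradient descent on the observed (non-NaN) entries."""
    R = np.asarray(R, dtype=float)
    observed = ~np.isnan(R)
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(R.shape[0], rank))  # user factors
    Q = rng.normal(scale=0.1, size=(R.shape[1], rank))  # item factors
    for _ in range(steps):
        E = np.where(observed, R - P @ Q.T, 0.0)        # error, zero where unrated
        # Simultaneous update of both factor matrices with L2 regularization
        P, Q = P + lr * (E @ Q - reg * P), Q + lr * (E.T @ P - reg * Q)
    return P, Q

def rmse(R, P, Q):
    """Root-mean-square error of P @ Q.T on the observed entries of R."""
    observed = ~np.isnan(R)
    return float(np.sqrt(np.mean((R - P @ Q.T)[observed] ** 2)))

R = np.array([[5, 3, 4, np.nan],
              [5, 4, 5, 5],
              [2, 2, 4, 5],
              [3, 1, 1, 2]])
P, Q = factorize(R)
predicted = (P @ Q.T)[0, 3]   # estimate for user 0's missing Harry Potter rating
```

Each row of `P` describes a user and each row of `Q` describes a movie in the same low-dimensional space; a missing rating is estimated by the dot product of the corresponding rows.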