and you are interested in the following queries:
- average rating of a given song
- average rating a given user gives across songs
- which songs are most similar
- which users rate similar songs
- which users have similar tastes
Here we have two dimensions: users and song ratings. The averages follow directly from the rating matrix, and cosine similarity answers the similarity queries. Let's work through it step by step.
Step 1
Create a two-dimensional matrix of song ratings versus users:
Song | A | B | C | Avg |
---|---|---|---|---|
song1 | 3 | 2 | 1 | 2 |
song2 | 4 | 2 | 3 | 3 |
song3 | 2 | 4 | 5 | 11/3 |
average | 3 | 8/3 | 3 | 26/9 |
Here columns A, B and C hold the ratings given by users A, B and C.
The last column is the average rating of each song, and the last row is the average rating each user gives.
Each row represents a song and the ratings different users gave it.
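The averages in the Step 1 table can be reproduced with a few lines of Python. This is a minimal sketch, assuming the ratings fit in memory as a list of lists; the names `ratings`, `song_avg`, `user_avg` and `overall_avg` are illustrative and not from the original. `Fraction` keeps the results exact, so they match the table's 11/3 and 8/3 rather than rounded decimals.

```python
from fractions import Fraction

# Ratings from the Step 1 table: rows are songs, columns are users A, B, C.
users = ["A", "B", "C"]
songs = ["song1", "song2", "song3"]
ratings = [
    [3, 2, 1],  # song1
    [4, 2, 3],  # song2
    [2, 4, 5],  # song3
]

def mean(values):
    """Exact arithmetic mean, returned as a Fraction."""
    return Fraction(sum(values), len(values))

# Average rating of each song: the mean of each row.
song_avg = {song: mean(row) for song, row in zip(songs, ratings)}

# Average rating each user gives: the mean of each column.
user_avg = {user: mean(col) for user, col in zip(users, zip(*ratings))}

# Overall average: the mean of all nine ratings.
overall_avg = mean([r for row in ratings for r in row])

for song, avg in song_avg.items():
    print(song, avg)
for user, avg in user_avg.items():
    print(user, avg)
print("overall", overall_avg)
```

Running it prints 2, 3 and 11/3 for the songs, 3, 8/3 and 3 for the users, and 26/9 overall.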
Step 2
Normalize each row to unit length; since all the ratings are positive, every value then lies between 0 and 1.
To do this, divide each value by the square root of the sum of the squares of all the values in its row. For example, to normalize the value 3 in the song1 row, divide 3 by
sqrt(3^2 + 2^2 + 1^2)
Song | A | B | C |
---|---|---|---|
song1 | 3 | 2 | 1 |
song2 | 4 | 2 | 3 |
song3 | 2 | 4 | 5 |
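The normalization rule can also be written as a formula. The notation is assumed here for illustration: r_{s,u} is the rating user u gave song s, and the sum in the denominator runs over every user v in that song's row. The second line works the song1 example through.

```latex
\hat{r}_{s,u} = \frac{r_{s,u}}{\sqrt{\sum_{v} r_{s,v}^{2}}}

\hat{r}_{\mathrm{song1},A} = \frac{3}{\sqrt{3^2 + 2^2 + 1^2}} = \frac{3}{\sqrt{14}} \approx 0.8018
```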
Step 3
Compute the row length (the square root of the sum of squares) for every song:
Song | Row length |
---|---|
song1 | sqrt(3^2 + 2^2 + 1^2) = sqrt(14) ≈ 3.7417 |
song2 | sqrt(4^2 + 2^2 + 3^2) = sqrt(29) ≈ 5.3852 |
song3 | sqrt(2^2 + 4^2 + 5^2) = sqrt(45) ≈ 6.7082 |
Step 4
Dividing every value by its row length gives the normalized table:
Song | A | B | C |
---|---|---|---|
song1 | .8018 | .5345 | .2673 |
song2 | .7428 | .3714 | .557 |
song3 | .2981 | .5963 | .7454 |
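The normalization can be sketched in plain Python without any third-party library. This is an illustration under the assumption that each song's ratings are stored as a list keyed by song name; the function name `normalize` is hypothetical. Rounding the output to four places reproduces the table above.

```python
import math

# Raw ratings from the Step 1 table: rows are songs, columns are users A, B, C.
ratings = {
    "song1": [3, 2, 1],
    "song2": [4, 2, 3],
    "song3": [2, 4, 5],
}

def normalize(row):
    """Scale a row to unit Euclidean length by dividing each value
    by the square root of the sum of squares of the row."""
    norm = math.sqrt(sum(x * x for x in row))
    return [x / norm for x in row]

normalized = {song: normalize(row) for song, row in ratings.items()}

for song, row in normalized.items():
    print(song, [round(x, 4) for x in row])
```

A song with no ratings at all (an all-zero row) would make `norm` zero and raise `ZeroDivisionError`; this sketch does not handle that case.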
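Now multiply each row by every other row to obtain a similarity matrix.
The multiplication is the dot product of the two rows. Because every normalized row has length 1, this dot product equals the cosine of the angle between the original rows, i.e. their cosine similarity. For example, the dot product of song1 and song2 is
.8018*.7428 + .5345*.3714 + .2673*.557 = .943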
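The claim that the dot product of normalized rows is the cosine similarity can be checked numerically. The sketch below computes the song1/song2 value twice: once as a dot product of the rounded normalized rows, and once as dot(u, v) / (|u| * |v|) on the raw ratings. The helper names `dot` and `cosine` are illustrative assumptions.

```python
import math

def dot(u, v):
    """Dot product: sum of element-wise products of two equal-length rows."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Cosine similarity computed directly from raw (unnormalized) rows."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

# Normalized rows for song1 and song2, rounded as in the Step 4 table.
song1_hat = [0.8018, 0.5345, 0.2673]
song2_hat = [0.7428, 0.3714, 0.557]

# Raw ratings for the same songs (Step 1 table).
song1 = [3, 2, 1]
song2 = [4, 2, 3]

print(round(dot(song1_hat, song2_hat), 3))  # dot product of normalized rows
print(round(cosine(song1, song2), 3))       # cosine of the raw rows
```

Both lines print 0.943; the exact value is 19 / sqrt(14 * 29).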
Step 5
Apply the dot product to every pair of rows to obtain the similarity matrix:
Song | song1 | song2 | song3 |
---|---|---|---|
song1 | 1 | .943 | .757 |
song2 | .943 | 1 | .858 |
song3 | .757 | .858 | 1 |
From this matrix it is easy to see that song1 is more similar to song2 (.943) than to song3 (.757).
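Putting Steps 2 to 5 together, a short sketch can build the whole similarity matrix and pick the most similar other song for each song. It assumes the same dictionary-of-lists layout as before; `normalize` and `similarity_matrix` are hypothetical helper names, not part of any library.

```python
import math

# Raw ratings from the Step 1 table: rows are songs, columns are users A, B, C.
ratings = {
    "song1": [3, 2, 1],
    "song2": [4, 2, 3],
    "song3": [2, 4, 5],
}

def normalize(row):
    """Scale a row to unit length (Steps 2 to 4)."""
    norm = math.sqrt(sum(x * x for x in row))
    return [x / norm for x in row]

def similarity_matrix(rows):
    """Dot product of every pair of normalized rows (Step 5)."""
    hat = {name: normalize(row) for name, row in rows.items()}
    return {
        a: {b: sum(x * y for x, y in zip(hat[a], hat[b])) for b in hat}
        for a in hat
    }

sim = similarity_matrix(ratings)

for a, row in sim.items():
    print(a, {b: round(s, 3) for b, s in row.items()})

# For each song, report the most similar *other* song.
for a, row in sim.items():
    best = max((b for b in row if b != a), key=row.get)
    print(f"{a} is most similar to {best}")
```

The rounded rows match the table above, and the last loop reports song2 for song1, song1 for song2 and song2 for song3.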
Step 6
Finding similar users works the same way as finding similar songs: make the users the rows of the initial matrix and the song ratings its columns (i.e., transpose the matrix).
Then apply Steps 2 to 5 in order to obtain the similarity matrix for users as well.
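For users, the only new step is the transpose. This sketch assumes the Step 1 ratings as a list of song rows; `zip(*song_rows)` turns the columns into per-user rows, and then the same normalize-and-dot-product procedure runs. All variable names are illustrative.

```python
import math

# Step 1 ratings with songs as rows and users A, B, C as columns.
song_rows = [
    [3, 2, 1],  # song1
    [4, 2, 3],  # song2
    [2, 4, 5],  # song3
]
users = ["A", "B", "C"]

# Transpose so each row holds one user's ratings of song1..song3.
user_rows = dict(zip(users, (list(col) for col in zip(*song_rows))))

def normalize(row):
    """Scale a row to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in row))
    return [x / norm for x in row]

# Normalize each user row, then take the dot product of every pair.
hat = {u: normalize(r) for u, r in user_rows.items()}
user_sim = {
    a: {b: sum(x * y for x, y in zip(hat[a], hat[b])) for b in hat}
    for a in hat
}

for a, row in user_sim.items():
    print(a, {b: round(s, 3) for b, s in row.items()})
```

With these ratings, users B and C come out as the most similar pair (about .966), followed by A and B (about .834) and A and C (about .785).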
1 comment:
Thanks, it helped a lot.
I am wondering: if some song (say, song1) is to be suggested to a user (say, user C), how should we proceed after calculating this similarity matrix?