TechJude: Similarity in application features

Saturday, January 30, 2010

Similarity in application features - Cosine Similarity

Let's say you are planning to write a rating system for songs,

and you are interested in the following queries:
- the average rating of a given song
- the average rating a given user gives across songs
- which songs are most similar
- which users rate similar songs
- which users have similar tastes

Here we have 2 dimensions: user and song rating. The averages can be read straight off the rating matrix, and we can use cosine similarity to answer the similarity queries. Let's do this step by step.

Step 1
Create a 2-dimensional matrix of song ratings vs. users

	A	B	C	Avg
song1	3	2	1	2
song2	4	2	3	3
song3	2	4	5	11/3
average	3	8/3	3	26/9

Here the columns A, B, C hold the ratings given by users A, B and C.
The last column gives the average rating of each song, and the last row gives the average rating each user hands out.
Each of the other rows represents a song and the ratings given by the different users for that song.
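
The two averages can be read straight off this matrix. Here is a minimal sketch in Python, assuming the ratings are held in a plain dictionary; the names ratings, song_avg and user_avg are made up for this illustration. It uses fractions so that values like 11/3 stay exact.

```python
from fractions import Fraction

# Ratings from the Step 1 table: one row per song, one column per user.
users = ["A", "B", "C"]
ratings = {
    "song1": [3, 2, 1],
    "song2": [4, 2, 3],
    "song3": [2, 4, 5],
}

# Average rating of each song: the mean of its row.
song_avg = {song: Fraction(sum(row), len(row)) for song, row in ratings.items()}

# Average rating each user gives: the mean of that user's column.
user_avg = {
    user: Fraction(sum(row[i] for row in ratings.values()), len(ratings))
    for i, user in enumerate(users)
}

for song, avg in song_avg.items():
    print(song, avg)
for user, avg in user_avg.items():
    print(user, avg)
```

For song3 this gives 11/3 and for user B it gives 8/3, the same values as in the table.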

Step 2
Normalize the values in each row of the matrix so that they lie between 0 and 1.

Divide each value in a row by the square root of the sum of the squares of all the values in that row. E.g. to normalize song1's rating of 3, divide 3 by
sqrt(3^2 + 2^2 + 1^2)

The ratings to normalize, without the averages:
	A	B	C
song1	3	2	1
song2	4	2	3
song3	2	4	5
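
The normalization rule can be sketched in a few lines of Python. This is only an illustration: the helper name normalize and the dictionary layout are assumptions, not part of any library.

```python
import math

# Raw ratings: one row per song, columns are users A, B, C.
ratings = {
    "song1": [3, 2, 1],
    "song2": [4, 2, 3],
    "song3": [2, 4, 5],
}

def normalize(row):
    """Divide each value by the square root of the sum of squares of the row."""
    length = math.sqrt(sum(v * v for v in row))
    return [v / length for v in row]

normalized = {song: normalize(row) for song, row in ratings.items()}

for song, row in normalized.items():
    print(song, [round(v, 4) for v in row])
```

After this each row has length 1, i.e. the squares of its values sum to 1. Note that this sketch would divide by zero for a song whose ratings are all 0, so a real system would need to handle that case.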

Step 3
Obtain a new table by applying the normalization rule

	A	B	C
song1	.8018	.5345	.2673
song2	.7428	.3714	.557
song3	.2981	.5963	.7454

Step 4
Now multiply each row with every other row to obtain a similarity matrix.
The multiplication should be the dot product of the 2 rows. Because every row now has length 1, this dot product is the cosine of the angle between the two rating vectors.

For example, the dot product of song1 and song2 yields
.8018*.7428 + .5345*.3714 + .2673*.557 = .943
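
The song1 and song2 example above can be checked with a short Python sketch. The helper names normalize and dot are made up for this illustration.

```python
import math

def normalize(row):
    """Scale a row of ratings to length 1."""
    length = math.sqrt(sum(v * v for v in row))
    return [v / length for v in row]

def dot(a, b):
    """Dot product: multiply matching entries and add them up."""
    return sum(x * y for x, y in zip(a, b))

song1 = normalize([3, 2, 1])
song2 = normalize([4, 2, 3])

similarity = dot(song1, song2)
print(round(similarity, 3))  # prints 0.943
```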

Step 5
Use the dot product rule on every pair of rows to obtain the similarity matrix

	song1	song2	song3
song1	1	.943	.757
song2	.943	1	.858
song3	.757	.858	1

From this matrix it's easy to see that song1 is more similar to song2 (.943) than to song3 (.757)
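
Putting the pieces together, the whole similarity matrix can be built as below. This is a minimal sketch, assuming the raw ratings are held in a dictionary; cosine_similarity and most_similar are hypothetical helper names, and normalization is folded into the similarity function instead of being a separate pass.

```python
import math

# Raw ratings: one row per song, columns are users A, B, C.
ratings = {
    "song1": [3, 2, 1],
    "song2": [4, 2, 3],
    "song3": [2, 4, 5],
}

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    length_a = math.sqrt(sum(x * x for x in a))
    length_b = math.sqrt(sum(y * y for y in b))
    return dot / (length_a * length_b)

# similarity[s][t] is the cosine similarity between songs s and t.
similarity = {
    s: {t: cosine_similarity(ratings[s], ratings[t]) for t in ratings}
    for s in ratings
}

def most_similar(song):
    """Return the other song with the highest similarity to the given song."""
    others = {t: sim for t, sim in similarity[song].items() if t != song}
    return max(others, key=others.get)

for s, row in similarity.items():
    print(s, [round(v, 3) for v in row.values()])
print(most_similar("song1"))  # prints song2
```

Dividing by both lengths inside cosine_similarity gives the same numbers as normalizing the rows first and then taking plain dot products.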

Step 6
Finding similar users works the same way as finding similar songs: just make the
users the rows of the initial matrix and the songs the columns (i.e. transpose the matrix).

Then apply Step 2 - Step 5 in order to obtain the similarity matrix for users as well
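
As a sketch of the same idea in Python: transpose the song matrix so that each row holds one user's ratings, then reuse the cosine calculation. The variable and function names here are made up for this illustration.

```python
import math

users = ["A", "B", "C"]

# One row per song (song1, song2, song3), one column per user.
song_rows = [
    [3, 2, 1],
    [4, 2, 3],
    [2, 4, 5],
]

# Transpose: each row now holds one user's ratings for song1..song3.
user_rows = [list(column) for column in zip(*song_rows)]

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    length_a = math.sqrt(sum(x * x for x in a))
    length_b = math.sqrt(sum(y * y for y in b))
    return dot / (length_a * length_b)

# user_similarity[u][v] is the cosine similarity between users u and v.
user_similarity = {
    u: {v: cosine_similarity(user_rows[i], user_rows[j]) for j, v in enumerate(users)}
    for i, u in enumerate(users)
}

for u, row in user_similarity.items():
    print(u, [round(v, 3) for v in row.values()])
```

With the ratings from this post, users B and C come out as the most similar pair.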

1 comment:

rizwan said... (5/29/2011 4:53 PM):
Thanks, it helped a lot. Wondering, if some song (say song1) is to be suggested to a user (say User C), how should we proceed after calculating this similarity matrix?
