Let's say we need to find the similarity between people based on how they rank different items. Based on that collected data, we need to recommend some new
items to the users. People who are similar to each other are more likely to like
the same things, so they are more likely to like the things we recommend based on what similar users liked.
Now let's consider a set of users, a set of items, and a set of rankings that the users give to these items.
Given this information, how can we find the similarity between 2 users?
Approach 1)
------------
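Take user A and user B.
Calculate the union of the items ranked by A and B; call it A_UNION_B.
Calculate the intersection of the items ranked by A and B; call it A_INT_B.
Then the ratio |A_INT_B| / |A_UNION_B| (the number of items in the intersection divided by the number of items in the union) represents the similarity of the 2 users A and B. This ratio is known as the Jaccard index: it is 0 when A and B have no items in common and 1 when they ranked exactly the same items. Note that it ignores the ranking values themselves.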
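A minimal Python sketch of Approach 1, assuming each user's rankings are stored as a dict mapping an item name to the ranking the user gave it (a hypothetical representation; the function name `jaccard_similarity` is also just for illustration). Only the dict keys matter for this approach.

```python
def jaccard_similarity(ratings_a: dict, ratings_b: dict) -> float:
    """|items A ranked AND items B ranked| / |items A ranked OR items B ranked|."""
    items_a = set(ratings_a)
    items_b = set(ratings_b)
    union = items_a | items_b
    if not union:
        # Neither user has ranked anything; treated as 0 similarity (an assumption).
        return 0.0
    intersection = items_a & items_b
    return len(intersection) / len(union)


# Hypothetical example data: item -> ranking given by the user.
alice = {"book": 5, "film": 3, "game": 4}
bob = {"book": 4, "game": 4, "song": 2}
print(jaccard_similarity(alice, bob))  # 2 shared items out of 4 distinct items -> 0.5
```

Here Alice and Bob share "book" and "game" out of four distinct items, so the similarity is 2/4.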
Approach 2)
------------
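For every item that both A and B have ranked, we take the difference between the two rankings and add its square to a running Sum (which starts at 0):
D = RA (ranking A gave to the item) - RB (ranking B gave to the item)
Sum = Sum + D^2
The resulting Sum is the squared Euclidean distance between A's and B's rankings over their common items: it is 0 when they agree on every common item and grows as they disagree more. Once we have the Sum, we can turn it into a similarity with a formula like
Similarity = 1/(1 + Sum)
The 1 is added to avoid dividing by zero when Sum is 0, and it also keeps the result between 0 and 1. Here the "magic factor" is 1, but we can choose an arbitrary positive magic factor, say K:
Similarity = K/(K + Sum)
A larger K makes the similarity fall off more slowly as Sum grows.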
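A minimal Python sketch of Approach 2, again assuming rankings are dicts of item -> ranking. The function names are hypothetical, and the arbitrary magic factor is the parameter `k` (defaulting to 1, which gives 1/(1 + Sum)).

```python
def squared_distance_sum(ratings_a: dict, ratings_b: dict) -> float:
    """Sum of squared ranking differences over the items both users ranked."""
    total = 0.0
    for item in ratings_a.keys() & ratings_b.keys():
        d = ratings_a[item] - ratings_b[item]
        total += d * d
    return total


def distance_similarity(ratings_a: dict, ratings_b: dict, k: float = 1.0) -> float:
    """Similarity = k / (k + Sum); k = 1 gives 1 / (1 + Sum)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return k / (k + squared_distance_sum(ratings_a, ratings_b))


# Hypothetical example data: item -> ranking given by the user.
alice = {"book": 5, "film": 3, "game": 4}
bob = {"book": 4, "game": 4, "song": 2}
print(distance_similarity(alice, bob))         # Sum = 1 -> 1 / (1 + 1) = 0.5
print(distance_similarity(alice, bob, k=3.0))  # 3 / (3 + 1) = 0.75
```

One thing to watch for: two users with no items in common also get Sum = 0 and therefore a similarity of 1 under this formula, so it may be worth checking for an empty overlap before trusting the result.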
Approach 3)
-----------
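Take the Sum that we got in Approach 2 above and divide it by N, where N is the number of items common to A and B that received the same rating from both:
Similarity = Sum / N
Then take the square root:
Similarity = Sqrt(Similarity)
The result is still a distance (0 means identical rankings), so to make Similarity fall between 0 and 1:
Similarity = 1 - tanh(Similarity)
Since tanh of a non-negative number lies between 0 and 1, this maps a distance of 0 to a similarity of 1, and larger distances to values approaching 0.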
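A minimal Python sketch of Approach 3, assuming dict rankings as before and using N exactly as the text defines it (common items with identical ratings). The function names are hypothetical. When N is 0 the formula Sum / N is undefined; the sketch returns 0.0, which matches the limit of 1 - tanh(x) as x grows whenever Sum > 0, and is simply an assumed convention when the users have no common items at all.

```python
import math


def squared_distance_sum(ratings_a: dict, ratings_b: dict) -> float:
    """Sum of squared ranking differences over the items both users ranked."""
    total = 0.0
    for item in ratings_a.keys() & ratings_b.keys():
        d = ratings_a[item] - ratings_b[item]
        total += d * d
    return total


def tanh_similarity(ratings_a: dict, ratings_b: dict) -> float:
    """1 - tanh(sqrt(Sum / N)), with N = common items rated identically."""
    common = ratings_a.keys() & ratings_b.keys()
    n = sum(1 for item in common if ratings_a[item] == ratings_b[item])
    if n == 0:
        # Sum / N is undefined; assumed convention: return 0 similarity.
        return 0.0
    value = squared_distance_sum(ratings_a, ratings_b) / n
    value = math.sqrt(value)
    return 1.0 - math.tanh(value)


# Hypothetical example data: item -> ranking given by the user.
alice = {"book": 5, "film": 3, "game": 4}
bob = {"book": 4, "game": 4, "song": 2}
print(round(tanh_similarity(alice, bob), 4))  # Sum = 1, N = 1 -> 1 - tanh(1) ≈ 0.2384
```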
Approach 4)
-----------
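Take the Similarity that we calculated in Approach 3 (with N defined as there).
Let NN be the number of items common to A and B.
Similarity = Similarity * (N / NN)
Because N counts only the common items with identical ratings, N / NN is the fraction of common items on which A and B agree exactly, so this scales down the similarity of users who rarely give exactly the same rating.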
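A minimal Python sketch of Approach 4. It repeats the Approach 3 calculation inline so that it runs on its own, assumes dict rankings as before, and uses a hypothetical function name. As in the Approach 3 sketch, returning 0.0 when N is 0 (which includes the case NN = 0) is an assumed convention, not something the formulas specify.

```python
import math


def weighted_tanh_similarity(ratings_a: dict, ratings_b: dict) -> float:
    """Approach 3 similarity scaled by N / NN."""
    common = ratings_a.keys() & ratings_b.keys()
    nn = len(common)  # NN: all items both users ranked
    # N: common items on which both users gave the same rating.
    n = sum(1 for item in common if ratings_a[item] == ratings_b[item])
    if n == 0:
        # Also covers NN == 0; assumed convention: 0 similarity.
        return 0.0
    total = sum((ratings_a[i] - ratings_b[i]) ** 2 for i in common)
    similarity = 1.0 - math.tanh(math.sqrt(total / n))  # Approach 3
    return similarity * (n / nn)  # Approach 4 scaling


# Hypothetical example data: item -> ranking given by the user.
alice = {"book": 5, "film": 3, "game": 4}
bob = {"book": 4, "game": 4, "song": 2}
print(round(weighted_tanh_similarity(alice, bob), 4))  # (1 - tanh(1)) * 1/2 ≈ 0.1192
```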
Sunday, August 30, 2009