![]() |
|
| | |
| Craig Holman Research Jabberwocky Quotations Contact | |
| Home >
Jabberwocky      |
|
Jabberwocky for May 29, 2008 |
|
The Netflix Prize - Corated movies revisited     I wrote yesterday about analyzing corated movies in order to identify clusters of movies. Some elaboration is called for. Two movies are corated by a group of customers if each customer in the group has rated both movies. For purposes of the following examples, let's assume that only eight movies have been rated and they are labeled 'A' through 'H'. We'll construct a 8 by 8 grid of counters. Each row is labeled with 'A' through 'H', as is each column. Each counter is initialized to 0. |
For each customer, for each pair of different movies that the customer has rated, increment the counter at the intersection of the two movies. For convenience, each counter at Row[i]Column[j] will be shown as duplicated at Row[j]Column[i].
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
The counter values range from 0 through 7192. Pick a threshold value in this range. I picked 300 for this next example. For each counter in the grid, replace its value with 1 if the counter is greater than or equal to the threshold, or with 0 otherwise.
What we have constructed is the adjacency matrix for a graph, where the nodes of the graph are the movies 'A' through 'H' and there is an edge between a pair of nodes X and Y if at least 300 customers rated both movie X and movie Y. Our next task is to discover the maximal cliques of this graph. Using the maximal cliques algorithm that I described earlier, the following maximal cliques were discovered (represented as sets of nodes):
{ A C E } { A E G } { B D } { B E } { D H } { E F } { E H }
For each maximal clique, all of the movies in the clique were rated by the same group of at least 300 customers.
There are 28 distinct values in the non-diagonal counters in the populated counter grid. We can construct 28 different graphs using the same process that we used for a threshold of 300. From each graph, we can discover the set of maximal cliques for that graph. These 28 sets of maximal cliques can then be analyzed to discover movie clusters and to determine the degree of membership of each movie in each cluster. I'll explore how this can be accomplished later.      |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Home >
Jabberwocky      |
|
| Craig Holman Research Jabberwocky Quotations Contact | |
| Patterncraft and Constraint and Consequence are trademarks of Craig Stewart Holman Copyright © 2008 Craig Stewart Holman All Rights Reserved. |
|
| | |
![]() |
|