1 of 33

Module 6

Big Data Analytics Applications

2 of 33

Contents

  • Recommendation Systems
    • Introduction
    • A Model for Recommendation Systems: Utility matrix, Long-tail
    • Collaborative-Filtering System
  • Mining Social-Network Graphs
    • Social Networks as Graphs
    • Types of Social-Network
    • Clustering of Social Graphs

3 of 33

Recommendation Systems: Introduction

  • A typical recommendation system involves predicting user responses to options
  • Recommendation systems use several different technologies which can be broadly classified into two groups:
  • Content-based systems examine properties of the items recommended
    • For instance, if a Netflix user has watched many action movies, then recommend a movie classified in the database as having the “action” genre
  • Collaborative filtering systems recommend items based on similarity measures between users and/or items
    • The items recommended to a user are those preferred by similar users

4 of 33

A few common applications of recommendation systems

Product Recommendations

  • The most important use of recommendation systems is at online retailers
  • Amazon and similar online vendors strive to present each returning user with suggestions of products they might like to buy
  • These suggestions are not random, but are based on the purchasing decisions made by similar customers

Movie Recommendations

  • Netflix offers its customers recommendations of movies they might like
  • These recommendations are based on ratings provided by users in general

News Articles

  • News services have attempted to identify articles of interest to readers, based on the articles that they have read in the past
  • The similarity might be based on the similarity of important words in the documents, or on the articles that are read by people with similar reading tastes
  • The same principles apply to recommending blogs from among the millions of blogs available, videos on YouTube, or other sites where content is provided regularly

5 of 33

Utility Matrix

  • In a recommendation-system application there are two classes of entities, ‘users’ and ‘items’
  • Users have preferences for certain items, and these preferences must be teased out of the data
  • The data is represented as a utility matrix, giving for each user-item pair, a value that represents what is known about the degree of preference of that user for that item
  • Values come from an ordered set, e.g., integers 1–5 representing the number of stars that the user gave as a rating for that item
  • One can assume that the matrix is sparse, meaning that most entries are “unknown”
  • An unknown rating implies that we have no explicit information about the user’s preference for the item

6 of 33

Utility Matrix (Contd..)

  • The given matrix represents users’ ratings of movies on a 1–5 scale, with 5 the highest rating
  • Blanks represent the situation where the user has not rated the movie
  • The movie names are HP1, HP2, and HP3 for Harry Potter I, II, and III, TW for Twilight, and SW1, SW2, and SW3 for Star Wars episodes 1, 2, and 3
  • The users are represented by capital letters A through D
  • Most user-movie pairs have blanks, meaning the user has not rated the movie
  • In practice, the matrix would be even sparser, with the typical user rating only a tiny fraction of all available movies
  • The goal of a recommendation system is to predict the blanks in the utility matrix
  • For example, would user A like SW2?
  • One may design a recommendation system to take into account properties of movies, such as their producer, director, stars, or even the similarity of their names
  • If so, one might note the similarity between SW1 and SW2, and conclude that since A did not like SW1, A is unlikely to enjoy SW2 either
  • Alternatively, with much more data, one might observe that the people who rated both SW1 and SW2 tended to give them similar ratings
  • Thus, we could conclude that A would also give SW2 a low rating, similar to A’s rating of SW1
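The matrix described above matches the standard textbook example these slides follow (Mining of Massive Datasets); as a sketch, it can be stored as a sparse dict-of-dicts, with missing keys playing the role of blanks (the specific entries below are assumed from that textbook figure):

```python
# Sparse utility matrix: ratings[user][movie] = stars on a 1-5 scale.
# Missing keys are the "blanks" the recommendation system must predict.
ratings = {
    "A": {"HP1": 4, "TW": 5, "SW1": 1},
    "B": {"HP1": 5, "HP2": 5, "HP3": 4},
    "C": {"TW": 2, "SW1": 4, "SW2": 5},
    "D": {"HP2": 3, "SW3": 3},
}

def rating(user, movie):
    """Return the star rating, or None for an unknown (blank) entry."""
    return ratings.get(user, {}).get(movie)

print(rating("A", "SW1"))   # 1 -- A did not like SW1
print(rating("A", "SW2"))   # None -- the blank the system should predict
```

Storing only the nonblank entries reflects the sparsity noted above: in practice a user rates only a tiny fraction of all items.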

7 of 33

Utility Matrix (Contd..)

  • Without a utility matrix, it is almost impossible to recommend items
  • However, acquiring data from which to build a utility matrix is often difficult
  • There are two general approaches to discovering the value users place on items

Approach 1: Ask users to rate items

  • Users can be asked to rate items explicitly
  • Movie ratings are generally obtained this way, and some online stores try to obtain ratings from their purchasers
  • Sites providing content, such as some news sites or YouTube, also ask users to rate items
  • This approach is limited in its effectiveness: users are generally unwilling to provide responses, and the information from those who do may be biased by the very fact that it comes from people willing to provide ratings

Approach 2: Infer preferences from behavior

  • One can make inferences from users’ behavior. Most obviously, if a user buys a product at Amazon, watches a movie on YouTube, or reads a news article, then the user can be said to “like” this item
  • Note that this sort of rating system really has only one value: 1 means that the user likes the item
  • Often, a utility matrix with this kind of data is shown with 0’s rather than blanks where the user has not purchased or viewed the item
  • However, in this case 0 is not a lower rating than 1; it is no rating at all
  • More generally, one can infer interest from behavior other than purchasing
  • For example, if an Amazon customer views information about an item, one can infer that they are interested in the item, even if they don’t buy it

8 of 33

Long tail

  • Physical delivery systems are characterized by a scarcity of resources
  • A physical store has limited shelf space, and can show the customer only a small fraction of all the choices that exist
  • On the other hand, online stores can make anything that exists available to the customer
  • Thus, a physical bookstore may have several thousand books on its shelves, but Amazon offers millions of books
  • Recommendation in the physical world is fairly simple: it is not possible to tailor the store to each individual customer
  • Thus, the choice of what is made available is governed only by the aggregate numbers
  • Typically, a bookstore will display only the books that are most popular, and a newspaper will print only the articles it believes the most people will be interested in

9 of 33

Long tail (Contd..)

  • The distinction between the physical and on-line worlds has been called the ‘long tail’ phenomenon
  • In the usual long-tail plot, the vertical axis represents popularity (the number of times an item is chosen)
  • The items are ordered on the horizontal axis according to their popularity
  • Physical stores provide only the most popular items to the left of the vertical line, while the corresponding online sites provide the entire range of items:
    • The tail as well as the popular items
  • The long-tail phenomenon forces online sites to recommend items to individual users
  • It is not possible to present all available items to the user, the way physical stores can
  • Neither can we expect users to have heard of each of the items they might like

10 of 33

Content-Based systems*

  • Content-Based systems focus on properties of items

  • Similarity of items is determined by measuring the similarity in their properties

  • In a content-based system, a profile for each item needs to be constructed, which is a record or collection of records representing important characteristics of that item

  • In simple cases, the profile consists of some characteristics of the item that are easily discovered

*Not explicitly mentioned in syllabus, however there is no harm to study something beyond syllabus!

11 of 33

Content-Based systems

  • E.g.: Features of a movie that might be relevant to a recommendation system
    • Set of actors
    • Director
    • Year
    • Genre
  • Genre is a vaguer concept
  • However, movie reviews generally assign a genre from a set of commonly used terms
  • For example, the Internet Movie Database (IMDb) assigns a genre or genres to every movie

12 of 33

Content-Based systems

  • Many other classes of items also allow us to obtain features from available data, even if that data must at some point be entered by hand
  • For instance, products often have descriptions written by the manufacturer, giving features relevant to that class of product (e.g., for a TV, it will be screen size, resolution, display technology like LED or LCD etc.)
  • Books have descriptions similar to those for movies, so we can obtain features such as author, year of publication, and genre
  • Music products such as CD’s and MP3 downloads have available features such as artist, composer, and genre
  • The ultimate goal for content-based recommendation is to create both an item profile consisting of feature-value pairs and a user profile summarizing the preferences of the user, based on their row of the utility matrix
  • P.S. There are some classes of items where it is not immediately apparent what the values of features should be!!
    • E.g. Documents

13 of 33

E.g.: Discovering Features of Documents

  • Documents do not tend to have readily available information giving features
  • A substitute that has been useful in practice is the identification of words that characterize the topic of a document
  • First, eliminate stop words – the several hundred most common words, which tend to say little about the topic of a document
  • For the remaining words, compute the TF-IDF score for each word in the document
  • The ones with the highest scores are the words that characterize the document
    • TF-IDF stands for term frequency-inverse document frequency
    • This weight is a statistical measure used to evaluate how important a word is to a document in a collection or corpus
    • The TF-IDF weight is composed of two terms:
      • The first is the normalized Term Frequency (TF): the number of times a word appears in a document, divided by the total number of words in that document
      • The second is the Inverse Document Frequency (IDF): the logarithm of the number of documents in the corpus divided by the number of documents in which the specific term appears

14 of 33

E.g.: Discovering Features of Documents

  • Consider a document containing 100 words wherein the word cat appears 3 times
  • The term frequency (i.e., tf) for cat is then (3 / 100) = 0.03
  • Now, assume we have 10 million documents and the word cat appears in one thousand of these. Then, the inverse document frequency (i.e., idf) is calculated as log10(10,000,000 / 1,000) = log10(10,000) = 4
  • Thus, the TF-IDF weight is the product of these quantities: 0.03 × 4 = 0.12
  • The higher the TF-IDF score, the more important or relevant the term is; as a term gets less relevant, its TF-IDF score approaches 0
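The arithmetic above can be checked in a few lines of Python (the logarithm is base 10, matching the worked example):

```python
import math

tf = 3 / 100                            # "cat" appears 3 times in 100 words
idf = math.log10(10_000_000 / 1_000)    # base-10 log, as in the example
weight = tf * idf

print(tf)                 # 0.03
print(idf)                # 4.0
print(round(weight, 2))   # 0.12
```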

15 of 33

Collaborative-Filtering System

  • Collaborative-Filtering systems focus on the relationship between users and items
  • Similarity of items is determined by the similarity of the ratings of those items by the users who have rated both items
  • Instead of using features of items to determine their similarity, the focus is on the similarity of the user ratings for two items
  • That is, in place of the item-profile vector for an item, we use its column in the utility matrix
  • Users are similar if their vectors are close according to some distance measure such as Jaccard or cosine distance
  • Recommendation for a user U is then made by looking at the users that are most similar to U in this sense, and recommending items that these users like
  • The process of identifying similar users and recommending what similar users like is called collaborative filtering

16 of 33

Ways of measuring similarity

  • The first question one must deal with is how to measure similarity of users or items from their rows or columns in the utility matrix
  • The example data is too small to draw any reliable conclusions, but its small size makes clear some of the pitfalls in picking a distance measure
  • Observe specifically users A and C
  • They rated two movies in common, but they appear to have almost diametrically opposite opinions of these movies
  • One would expect that a good distance measure would make them rather far apart

17 of 33

Alternative measures to consider

  • Jaccard distance
  • A and B have an intersection of size 1 and a union of size 5
  • Thus, their Jaccard similarity is 1/5, and their Jaccard distance is 4/5; i.e., they are very far apart
  • In comparison, A and C have a Jaccard similarity of 2/4 = 1/2, so their Jaccard distance is also 1/2
  • Thus, A appears closer to C than to B
  • Yet that conclusion seems intuitively wrong
  • A and C disagree on the two movies they both watched, while A and B seem both to have liked the one movie they watched in common
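These numbers can be reproduced by treating each user as the set of items they rated (the sets below are assumed from the textbook example these slides follow):

```python
# Each user as the set of movies they have rated
# (sets assumed from the textbook utility-matrix example)
A = {"HP1", "TW", "SW1"}
B = {"HP1", "HP2", "HP3"}
C = {"TW", "SW1", "SW2"}

def jaccard_similarity(s, t):
    """|intersection| / |union| of two sets of rated items."""
    return len(s & t) / len(s | t)

print(jaccard_similarity(A, B))   # 0.2 -> Jaccard distance 4/5
print(jaccard_similarity(A, C))   # 0.5 -> Jaccard distance 1/2
```

Note the pitfall the slide describes: by this measure A looks closer to C than to B, even though A and C disagree on every movie they both rated.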

18 of 33

Alternative measures to consider

  • Cosine Distance
  • Treating blank values as 0 (This can be misleading since it has the effect of treating the lack of a rating as more similar to disliking the movie than liking it)
  • The cosine of the angle between A and B is:

    cos(A, B) = (4 × 5) / (√(4² + 5² + 1²) × √(5² + 5² + 4²)) = 20 / (√42 × √66) ≈ 0.380

  • The cosine of the angle between A and C is:

    cos(A, C) = (5 × 2 + 1 × 4) / (√(4² + 5² + 1²) × √(2² + 4² + 5²)) = 14 / (√42 × √45) ≈ 0.322
  • Since a larger (positive) cosine implies a smaller angle and therefore a smaller distance, this measure tells us that A is slightly closer to B than to C
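With blanks treated as 0, the cosines can be computed directly; the vectors below are assumed from the textbook example, with columns ordered HP1, HP2, HP3, TW, SW1, SW2, SW3:

```python
import math

def cosine(u, v):
    """Cosine of the angle between two rating vectors (blanks as 0)."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) *
                  math.sqrt(sum(y * y for y in v)))

# Columns: HP1, HP2, HP3, TW, SW1, SW2, SW3 (assumed textbook values)
A = [4, 0, 0, 5, 1, 0, 0]
B = [5, 5, 4, 0, 0, 0, 0]
C = [0, 0, 0, 2, 4, 5, 0]

print(round(cosine(A, B), 3))   # 0.38
print(round(cosine(A, C), 3))   # 0.322 -> A is slightly closer to B than to C
```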

19 of 33

Alternative measures to consider

  • Rounding the Data
  • One could try to eliminate the apparent similarity between movies a user rates highly and those with low scores by rounding the ratings
  • For instance, we could consider ratings of 3, 4, and 5 as a “1” and ratings 1 and 2 as unrated; the utility matrix would then be rounded accordingly

  • Now, the Jaccard distance between A and B is 3/4, while between A and C it is 1; i.e., C appears further from A than B does, which is intuitively correct
  • Applying cosine distance to the rounded matrix, what conclusion can be drawn?
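One way to answer the question left open above is to apply the same cosine computation to the rounded 0/1 matrix (the 0/1 values below are assumed from the textbook example; columns ordered HP1, HP2, HP3, TW, SW1, SW2, SW3):

```python
import math

def cosine(u, v):
    """Cosine between 0/1 vectors; 0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Rounded matrix: ratings 3, 4, 5 -> 1; ratings 1, 2 and blanks -> 0
A = [1, 0, 0, 1, 0, 0, 0]
B = [1, 1, 1, 0, 0, 0, 0]
C = [0, 0, 0, 0, 1, 1, 0]

print(round(cosine(A, B), 3))   # 0.408 -- A and B fairly close
print(round(cosine(A, C), 3))   # 0.0   -- A and C maximally far apart
```

So cosine distance now agrees with Jaccard distance on the rounded data: C appears further from A than B does, which is intuitively correct.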

20 of 33

Alternative measures to consider

  • Normalizing Ratings
  • If ratings are normalized by subtracting from each rating the average rating of that user, low ratings are turned into negative numbers and high ratings into positive numbers (there are exceptional cases to this too!)
  • Below is the matrix with the values normalized (blanks remain blank):

        HP1    HP2    HP3    TW     SW1    SW2    SW3
    A   2/3                  5/3   -7/3
    B   1/3    1/3   -2/3
    C                       -5/3    1/3    4/3
    D          0                                  0

21 of 33

Alternative measures to consider

  • Normalizing Ratings (Contd..)
  • If cosine distance is taken, users with opposite views of the movies they viewed in common will have vectors in almost opposite directions and can be considered as far apart as possible
  • However, users with similar opinions about the movies rated in common will have a relatively small angle between them
  • The cosine of the angle between A and B is:

    cos(A, B) = (2/3 × 1/3) / (√((2/3)² + (5/3)² + (7/3)²) × √((1/3)² + (1/3)² + (2/3)²)) ≈ 0.092

  • The cosine of the angle between A and C is:

    cos(A, C) = ((5/3)(-5/3) + (-7/3)(1/3)) / (√((2/3)² + (5/3)² + (7/3)²) × √((5/3)² + (1/3)² + (4/3)²)) ≈ -0.559

Inferences that can be drawn:

  • A and C are much further apart than A and B, and neither pair is very close.
  • Both these observations make intuitive sense, given that A and C disagree on the two movies they rated in common, while A and B give similar scores to the one movie they rated in common
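The normalized cosines can be verified numerically; the normalized values below (written as exact fractions) are assumed from the textbook example, with columns ordered HP1, HP2, HP3, TW, SW1, SW2, SW3:

```python
import math

def cosine(u, v):
    """Cosine of the angle between two vectors (blanks stored as 0)."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) *
                  math.sqrt(sum(y * y for y in v)))

# Normalized ratings: each user's mean rating subtracted from nonblanks
A = [2/3, 0, 0, 5/3, -7/3, 0, 0]
B = [1/3, 1/3, -2/3, 0, 0, 0, 0]
C = [0, 0, 0, -5/3, 1/3, 4/3, 0]

print(round(cosine(A, B), 3))   # 0.092  -- weakly similar
print(round(cosine(A, C), 3))   # -0.559 -- nearly opposite directions
```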

22 of 33

Clustering Users and Items

  • A few drawbacks arise when trying to draw inferences from a utility matrix:
    • We have little information about user-item pairs in the sparse utility matrix
    • Even if two items belong to the same genre, there are likely to be very few users who bought or rated both
    • Even if two users both like a genre or genres, they may not have bought any items in common
  • How to deal with the above scenario:
    • Cluster items and/or users
    • Select any of the distance measures suggested in previous few slides, or any other distance measure, and use it to perform a clustering of, say, items

23 of 33

Clustering Users and Items

  • Columns represent clusters of items
  • The entry for any user U and any cluster C is the average rating that U gave to those members of cluster C that U did rate
  • If U has rated none of the cluster members, the entry for U and C is blank
  • The revised utility matrix can be used to cluster users, again using the distance measure we consider most appropriate
  • Revise the utility matrix, so the rows correspond to clusters of users, just as the columns correspond to clusters of items
  • As for item-clusters, compute the entry for a user cluster by averaging the ratings of the users in the cluster
  • The above steps can be repeated several times if one likes, i.e., one can cluster the item clusters and again merge the columns of the utility matrix that belong to one cluster
  • One can then turn to the users again, and cluster the user clusters
  • The process can repeat until one has an intuitively reasonable number of clusters of each kind
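The item-clustering step above can be sketched as follows: given a clustering of items, the revised matrix entry for a (user, cluster) pair is the average of that user's ratings of the cluster members they did rate. The data and clusters below are illustrative assumptions, not from the slides:

```python
# Illustrative sparse utility matrix and an assumed item clustering
ratings = {
    "A": {"HP1": 4, "TW": 5, "SW1": 1},
    "B": {"HP1": 5, "HP2": 5, "HP3": 4},
}
clusters = {"HP": ["HP1", "HP2", "HP3"], "SW": ["SW1"], "TW": ["TW"]}

def revised(user):
    """Row of the revised utility matrix: one entry per item cluster."""
    row = {}
    for cname, items in clusters.items():
        rated = [ratings[user][i] for i in items if i in ratings[user]]
        if rated:                      # entry stays blank if nothing rated
            row[cname] = sum(rated) / len(rated)
    return row

print(revised("A"))   # {'HP': 4.0, 'SW': 1.0, 'TW': 5.0}
print(revised("B"))   # {'HP': 4.666666666666667}
```

The same averaging applied to rows instead of columns gives the user-cluster step; alternating the two shrinks the matrix until the cluster counts are reasonable.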

24 of 33

Exercise: The figure is a utility matrix, representing the ratings, on a 1–5 star scale, of eight items, a through h, by three users A, B, and C. Compute the following from the data of this matrix.

Q.1. Treat ratings of 3, 4, and 5 as 1, and ratings of 1, 2, and blank as 0. Compute the Jaccard and cosine distances between each pair of users.

        a   b   c   d   e   f   g   h
    A   1   1       1           1
    B       1   1   1
    C               1       1   1   1

    Sr. No.                                          Ans
    1        Jaccard similarity between A & B        0.4
    2        Jaccard distance between A & B          0.6
    3        Jaccard similarity between B & C        0.1667
    4        Jaccard distance between B & C          0.8333
    5        Jaccard similarity between A & C        0.3333
    6        Jaccard distance between A & C          0.6667
    7        Cosine similarity between A & B         0.5774
    8        Cosine similarity between B & C         0.2887
    9        Cosine similarity between A & C         0.5

(The cosine rows give the cosine of the angle between the 0/1 user vectors; the corresponding cosine distance is the angle itself, so a larger cosine means a smaller distance.)
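As a check on Q.1, the rounded matrix can be encoded as sets of liked items; the 0/1 column positions are assumed from the textbook exercise these slides reproduce:

```python
import math

# Items each user "likes" after rounding (assumed from the textbook exercise)
A = {"a", "b", "d", "g"}
B = {"b", "c", "d"}
C = {"d", "f", "g", "h"}

def jaccard(s, t):
    return len(s & t) / len(s | t)

def cosine_sets(s, t):
    """Cosine between the 0/1 vectors that the sets represent."""
    return len(s & t) / (math.sqrt(len(s)) * math.sqrt(len(t)))

print(round(jaccard(A, B), 4))       # 0.4
print(round(jaccard(B, C), 4))       # 0.1667
print(round(jaccard(A, C), 4))       # 0.3333
print(round(cosine_sets(A, B), 4))   # 0.5774
print(round(cosine_sets(B, C), 4))   # 0.2887
print(round(cosine_sets(A, C), 4))   # 0.5
```

Note that the Jaccard similarity between A and C comes out to 2/6 = 0.3333 (distance 0.6667): an intersection of size 2, consistent with the cosine value of 0.5.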

25 of 33

Exercise: The figure is a utility matrix, representing the ratings, on a 1–5 star scale, of eight items, a through h, by three users A, B, and C. Compute the following from the data of this matrix.

Q.2. Normalize the matrix by subtracting from each nonblank entry the average value for its user, and compute the cosine of the angle between each pair of users.

         a       b       c       d       e       f       g       h
    A    0.67    1.67            1.67   -2.33           -0.33   -1.33
    B            0.67    1.67    0.67   -1.33   -0.33   -1.33
    C   -1              -2       0               1       2       0

    Sr. No.                                          Ans
    1        Cosine similarity between A & B         0.5841
    2        Cosine similarity between B & C         -0.739
    3        Cosine similarity between A & C         -0.1151

(These are cosines, not distances: a negative cosine means the two users’ normalized vectors point in roughly opposite directions, i.e., those users are far apart.)
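Q.2 can be verified end to end by normalizing the original ratings and computing the cosines; the raw ratings below are assumed from the textbook exercise these slides reproduce:

```python
import math

# Original 1-5 star ratings (assumed from the textbook exercise)
ratings = {
    "A": {"a": 4, "b": 5, "d": 5, "e": 1, "g": 3, "h": 2},
    "B": {"b": 3, "c": 4, "d": 3, "e": 1, "f": 2, "g": 1},
    "C": {"a": 2, "c": 1, "d": 3, "f": 4, "g": 5, "h": 3},
}

def normalize(row):
    """Subtract the user's average rating from each nonblank entry."""
    mean = sum(row.values()) / len(row)
    return {k: v - mean for k, v in row.items()}

def cosine(u, v):
    """Cosine over sparse rows; absent items contribute 0 to the dot product."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv)

norm = {u: normalize(r) for u, r in ratings.items()}
print(round(cosine(norm["A"], norm["B"]), 3))   # 0.584
print(round(cosine(norm["B"], norm["C"]), 3))   # -0.74
print(round(cosine(norm["A"], norm["C"]), 3))   # -0.115
```

The slide's answers (0.5841, -0.739, -0.1151) were computed from the two-decimal rounded entries; exact fractions give 0.584, -0.740, and -0.115, leading to the same conclusions.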

26 of 33

Mining Social-Network Graphs

27 of 33

Introduction

  • The best-known example of a social network is the “friends” relation found on sites like Facebook
  • An important question about a social network is how to identify “communities,” that is, subsets of the nodes (people or other entities that form the network) with unusually strong connections
  • Communities almost never partition the set of nodes in a network; rather, communities usually overlap
    • For example, you may belong to several communities of friends or classmates
  • The people from one community tend to know each other, but people from two different communities rarely know each other

28 of 33

Social Networks as Graphs

  • The essential characteristics of a social network are:
  • There is a collection of entities that participate in the network
    • Typically, these entities are people, but they could be something else entirely
  • There is at least one relationship between entities of the network
    • On Facebook or its ilk, this relationship is called friends
    • Sometimes the relationship is all-or-nothing; two people are either friends or they are not
    • However, in other examples of social networks, the relationship has a degree
    • This degree could be discrete; e.g., friends, family, acquaintances, or none
  • There is an assumption of non-randomness or locality
    • This condition is the hardest to formalize, but the intuition is that relationships tend to cluster
    • That is, if entity A is related to both B and C, then there is a higher probability than average that B and C are related

29 of 33

Social Networks as Graphs

  • Social networks are naturally modeled as graphs, which we sometimes refer to as a social graph

  • The entities are the nodes, and an edge connects two nodes if the nodes are related by the relationship that characterizes the network

  • If there is a degree associated with the relationship, this degree is represented by labeling the edges
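As a minimal sketch (the names below are illustrative, not from the slides), a social graph can be stored as an adjacency list, with a separate map of edge labels when the relationship has a degree:

```python
# Unlabeled "friends" graph stored as an adjacency list (sets of neighbors)
friends = {
    "Alice": {"Bob", "Carol"},
    "Bob":   {"Alice", "Carol"},
    "Carol": {"Alice", "Bob", "Dave"},
    "Dave":  {"Carol"},
}

# When the relationship has a degree, label the (undirected) edges
edge_labels = {
    frozenset({"Alice", "Bob"}): "family",
    frozenset({"Alice", "Carol"}): "acquaintance",
}

# Locality: Alice is related to both Bob and Carol, and indeed Bob and
# Carol are also related, closing the triangle
print("Bob" in friends["Carol"])                  # True
print(edge_labels[frozenset({"Bob", "Alice"})])   # family
```

Using `frozenset` keys makes the edge lookup order-independent, matching the undirected nature of a friends relationship.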

30 of 33

Types of Social-Network

  • There are many examples of social networks other than “friends” networks

Telephone Networks

  • Here the nodes represent phone numbers, which are really individuals
  • There is an edge between two nodes if a call has been placed between those phones in some fixed period of time
  • The edges could be weighted by the number of calls made between these phones during a given period
  • Communities in a telephone network will form from groups of people that communicate frequently: groups of friends, members of a club, or people working at the same company, for example

Email Networks

  • The nodes represent email addresses, which are again individuals
  • An edge represents the fact that there was at least one email in at least one direction between the two addresses
  • Alternatively, we may only place an edge if there were emails in both directions
  • In that way, we avoid viewing spammers as “friends” with all their victims
  • Another approach is to label edges as weak or strong
  • Strong edges represent communication in both directions, while weak edges indicate that the communication was in one direction only
  • The communities seen in email networks come from the same sorts of groupings we mentioned in connection with telephone networks

Collaboration Networks

  • Nodes represent individuals who have published research papers
  • There is an edge between two individuals who published one or more papers jointly
  • Optionally, we can label edges by the number of joint publications
  • The communities in this network are authors working on a particular topic
  • An alternative view of the same data is as a graph in which the nodes are papers
  • Two papers are connected by an edge if they have at least one author in common
  • Now, we form communities that are collections of papers on the same topic
  • Data involved in collaborative filtering can often be viewed as forming a pair of networks, one for the customers and one for the products: customers who buy the same sorts of products, e.g., science-fiction books, will form communities, and dually, products that are bought by the same customers will form communities

31 of 33

Clustering of Social Graphs

  • An important aspect of social networks is that they contain communities of entities that are connected by many edges
  • If one were to apply standard clustering techniques to a social-network graph, the first step would be to define a distance measure
  • When the edges of the graph have labels, these labels might be usable as a distance measure, depending on what they represented
  • But when the edges are unlabeled, as in a “friends” graph, there is not much one can do to define a suitable distance
  • The first instinct is to assume that nodes are close if they have an edge between them and distant if not
  • Thus, one could say that the distance d(x, y) is 0 if there is an edge (x, y) and 1 if there is no such edge
  • One could use any other two values, such as 1 and ∞, as long as the distance is closer when there is an edge

32 of 33

Clustering of Social Graphs using ‘Betweenness’

Examples worked out as numerical problems

33 of 33

Clustering of Social Graphs using Girvan-Newman Algorithm

Examples worked out as numerical problems
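As a sketch of the numerical examples, edge betweenness (the quantity the Girvan-Newman algorithm repeatedly removes) can be computed by brute force on the small seven-node example graph commonly used with this algorithm in Mining of Massive Datasets; the graph below is that assumed textbook example, and a production implementation would use the efficient BFS-based credit computation instead of enumerating paths:

```python
from collections import deque
from itertools import combinations

# Assumed textbook example graph: communities {A, B, C} and {D, E, F, G}
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"),
         ("D", "E"), ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def all_shortest_paths(src, dst):
    """Enumerate every shortest path from src to dst (BFS + parent sets)."""
    dist, parents = {src: 0}, {src: []}
    q = deque([src])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w], parents[w] = dist[u] + 1, [u]
                q.append(w)
            elif dist[w] == dist[u] + 1:
                parents[w].append(u)
    paths = []
    def backtrack(node, suffix):
        if node == src:
            paths.append([src] + suffix)
            return
        for p in parents[node]:
            backtrack(p, [node] + suffix)
    backtrack(dst, [])
    return paths

# Each node pair spreads one unit of "credit" evenly over its shortest
# paths; an edge's betweenness is the total credit it collects
betweenness = {tuple(sorted(e)): 0.0 for e in edges}
for x, y in combinations(sorted(adj), 2):
    paths = all_shortest_paths(x, y)
    for path in paths:
        for u, v in zip(path, path[1:]):
            betweenness[tuple(sorted((u, v)))] += 1 / len(paths)

# Girvan-Newman removes the highest-betweenness edge first
print(max(betweenness, key=betweenness.get))   # ('B', 'D')
print(betweenness[("B", "D")])                 # 12.0
```

Removing the top edge (B, D) disconnects the graph into {A, B, C} and {D, E, F, G}, which is exactly the community split the betweenness heuristic is designed to find.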