In my current graduate AI Class (Gatech CS 6601) we had to perform two mini research projects.

The actual course structure is very nice and refreshing. You read the Textbook [1] chapter by chapter at

home and do some of the homework from the book. The actual lecture covers state of the art in AI

and you are required to perform three research projects (Proposal + Implementation + Evaluation +

Paper) in a bi weekly cycle.

In the first I evaluated how to find influential users in a Twitter network and in the second we predicted how a group of people likes a dish using a bayes net. I thought both are worth a blog post.

The first I performed as a team with Patrick Mize.

#### Reach for Friends

Our goal in the project was to identify influential users in a social graph. A social graph represents friendships between users. Each node
in such a graph is an individual and each edge represents a
friendship between two people. In order to generate a social graph, we used the Twitter
api and extracted 1000 nodes and stored them in a database.
The database holds two tables, one for users (with their
Twitter id and name) and one for friendships (mapping a
user to another user). We “crawled” the Twitter
network by recursively searching the friends of a user. In
order to control the growth of the tree we restricted the depth
of the search to 5 and the breadth of the tree to 4. So in the
end we estimated the total number of nodes to be at most 4096 nodes. Because some users had less then 4 friends and because we did not continued searching duplicate users we collected 1600 nodes in the end. Our method for influence

is reach of a node / user. Reach computes as follows:

*For a node all the shortest paths that go through it, the shorter of the halves divided by that node is inserted in a set. The largest distance in that set is the reach of that node.*(see Figure 1*)*In the upper graph the reach for v is on the shortest path between s and g. |

we use a landmark based heuristic to search faster. The landmarks for our heuristics are N distinct nodes from our social graph which show more then 4 connections each. From those N nodes we computed the shortest path to all other nodes in the graph using uniform cost search (Dijkstra’s algorithm). Once we computed the landmarks we can speed up future searches using landmarks as a heuristic [2]. The landmark heuristic is the distance from the current node to the landmark minus the distance from the landmark to the goal node. These distances can be looked up in the table we built while computing the landmarks. Since we want to get as close as possible to the actual distance we take the maximum landmark distance as the value of the heuristic function.

*h(x, y) = max_i ( dist(x, landmark_i)−dist(landmark_i, y) )*

In the end we calculated the influence of some persons in my net and took the one showing the

highest reach. We ended with 3 of my professors, which makes sens, however over a few friends

we found that I am friend with Obama and he got scored lesser then the professors. The explanation

for this puzzling result is that the network was gathered starting from me. So the network is a local one centered around me.

In the second project Woody Folsom and me explored peoples preferences towards food.

#### A Baysian Approach to Collaborative Dish Selection

**Baysian Catering**:

**A use case**. Imagine, you run a catering service and have to plan an event with a customer. You can create a variaty of dishes and now you want to discuss with your clients which one to serve. In order to get a better idea of which preferences and needs you clients will have, you let them fill out a survey in advance, where they rate a small amount of your dishes on a scale from 1 − 10 and inform you about hard constraints like allergies, religious constraints or vegetarians. You then use those results in order to predict the ratings for the rest of your dishes and present the clients the top k results. If such a system works this will save time and will lead to a better cutomer satisfaction since you can present them dishes they will most probably like but still surprise them (since you have not presented them what the already rated). After the dinner participants could rate the dishes served at the party which would improve your work for that client in the future. We model the diners’ various taste preferences using a Bayes net. The net consists of three node types. We call them “control nodes”, “taste nodes” and “rating nodes”. A “taste node” models the probability of a diners’ preference towards an ingredient (P (likes tomato), P (likes potato)) or a category (P (likes meat)). These variables are discrete. The ingredients are conditional independent from each other but conditioned by the food category they belong to (see Figure 2 the two top layers). A control node can definitely reject a dish, by evaluating to 0 in certain conditions. For example if someone is vegetarian and the presented dish contains meet, the control variable for vegetarian will evaluate to 0 and so the probability for the whole dish will become 0. So the vegetarian variable is conditioned by meat. The third type in the net is a preference node, it is continuos and models the dish rating given a set of ingredients. The overall net is shown in Figure 2.

[1] Russel, Norvig: Artificial Intelligence a Modern Approach, 2010, Prentice Hall

[2] C. H. Andrew V. Goldberg. Computing the shortest path: A search meets graph theory. Technical report, Microsoft Research, 2003.

[3] Wang, Knight, and Elder. On computer viral infection and the effect of immunization. In Proceedings of the ACSAC ’00, 2000

[4] Kevin Murphy. A Brief Introduction to Graphical Models and Bayesian Networks.

## No comments:

## Post a Comment