Understanding the mechanisms of influence and contagion in human interactions is a central focus of current research in computational social science, machine learning, and data mining. Quantifying the effects of social influence was the question behind the controversial, recently published Facebook experiment, which aimed to show that emotional contagion occurs on social networks (see the NYT’s article).
As Duncan Watts argues in his Guardian article in defense of the Facebook experiment, data from social-media applications provide a goldmine for understanding human behavior and answering social-science questions. Online social networks, where people “post” messages about topics that they find interesting, are ideal environments in which to study the mechanisms of influence. Indeed, over the last few years, researchers in data mining and machine learning have been pursuing this opportunity.
One particular line of work, motivated by these ideas, has become known as the network-inference problem. The idea is to model influence as a network: there is an edge from person A to person B, labeled with a probability value p, which indicates the degree of influence that A exerts on B. One way to think about it, in the context of social-media applications, is that p expresses the probability that B will post a message on a topic, given that A has already posted a message on that topic.
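To make the edge-probability picture concrete, here is a minimal sketch in Python. It assumes one common way of turning such a network into a spreading process, the independent-cascade model, in which every poster gets a single chance to trigger each of its neighbors. The user names, the probability values, and the function `simulate_cascade` are all made up for illustration; none of them comes from a particular paper or dataset.

```python
import random

# Hypothetical influence network: influence[(a, b)] = p means that b posts on
# a topic with probability p once a has posted on it. Values are invented.
influence = {
    ("alice", "bob"): 0.6,
    ("alice", "carol"): 0.1,
    ("bob", "carol"): 0.5,
}

def simulate_cascade(influence, seed_user, rng):
    """Spread one topic from seed_user under an independent-cascade assumption.

    Each user who posts gets exactly one chance to trigger each neighbor
    who has not posted yet. Returns the set of users who end up posting.
    """
    posted = {seed_user}
    frontier = [seed_user]
    while frontier:
        next_frontier = []
        for a in frontier:
            for (src, dst), p in influence.items():
                if src == a and dst not in posted and rng.random() < p:
                    posted.add(dst)
                    next_frontier.append(dst)
        frontier = next_frontier
    return posted

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same cascade
    print(simulate_cascade(influence, "alice", rng))
```

Running the simulation many times with different seeds gives a feel for how far a topic started by one user typically spreads. The network-inference problem goes in the opposite direction: the cascades are observed and the probabilities are unknown.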
The research question is to learn all pairwise probabilities p by observing only the actions (that is, the posts) that take place in the network. As described so far, this is an ill-defined problem. For example, imagine that, using my favorite social-media application, I post a link to some thought-provoking article that I read recently, while three of my friends have also posted links to the same article. Was I influenced by one of them, and if so by whom, or did I just happen to come across this article by other means, e.g., browsing a news portal? The way to turn this question into a computational problem is to assume some model of influence, and then use the large volume of available data to find a maximum-likelihood estimate for the parameters of the model.
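As a rough illustration of the maximum-likelihood idea, the sketch below assumes a deliberately simple influence model: each friend who posted before me tries independently to influence me, so I stay silent only if every one of them fails. This model, the function `log_likelihood`, the friend name "ann", and the toy observations are assumptions made for the example, not the formulation of any specific paper.

```python
import math

def log_likelihood(p, observations):
    """Log-likelihood of my observed behavior under the assumed model.

    p: dict mapping a friend's name to that friend's influence probability on me.
    observations: list of (friends_who_posted_first, i_posted) pairs, one per topic.
    """
    total = 0.0
    for posters, i_posted in observations:
        # Probability that none of the earlier posters influences me.
        p_none = math.prod(1.0 - p[f] for f in posters)
        prob = 1.0 - p_none if i_posted else p_none
        total += math.log(max(prob, 1e-12))  # guard against log(0)
    return total

# Toy data: "ann" posted first on 10 topics and I followed on 3 of them.
observations = [({"ann"}, True)] * 3 + [({"ann"}, False)] * 7

# Crude grid search for the value of p that maximizes the likelihood.
grid = [i / 100 for i in range(101)]
best = max(grid, key=lambda q: log_likelihood({"ann": q}, observations))

if __name__ == "__main__":
    print(best)
```

With a single friend the search simply recovers the observed fraction, 3 out of 10. The interesting case is the one from the example above, where several friends posted before me: the same `log_likelihood` function still applies, but the probabilities can no longer be read off by counting and have to be found by numerical optimization.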
A number of different research papers study the network-inference problem and its variants, depending on the type of the available input data, the assumed influence model, and other computational restrictions that may apply.
In a recent paper that appeared in ICML 2014, co-authored with Hongyu Su and Juho Rousu, we considered a new approach to this problem. Our idea is that the influence between two persons depends largely on the topic under consideration: for example, it is likely that I will react to discussions about “big data” initiated by my computer-science colleagues, but with respect to political topics it is more likely that I will be influenced by the blog posts of my favorite analysts. In our ICML paper, we develop this idea. We model influence in a context-sensitive manner: the influence that person A exerts on person B depends on the particular message being spread in the social network. A message is simply a piece of text represented in a vector-space model. In this setting we define the network-response problem: given a new message, again represented in a vector-space model, we want to find how this message will spread in the network and who will influence whom.
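For readers unfamiliar with the term, a vector-space model turns a piece of text into a vector of term weights, so that messages on similar topics end up close to each other. The sketch below uses plain word counts and cosine similarity. It is only meant to illustrate the representation, not the features or the learning method used in the paper; the helper names `to_vector` and `cosine` and the example messages are invented.

```python
import math
from collections import Counter

def to_vector(message):
    """Bag-of-words vector: each lowercase token mapped to its count."""
    return Counter(message.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

if __name__ == "__main__":
    tech_1 = to_vector("big data needs scalable data mining")
    tech_2 = to_vector("scalable mining of big data streams")
    politics = to_vector("the election debate on tax reform")
    print(cosine(tech_1, tech_2))    # the two messages share several terms
    print(cosine(tech_1, politics))  # the two messages share no terms
```

In a context-sensitive setting, a vector like this is what accompanies each message, so the same pair of users can show strong influence for one region of the vector space and weak influence for another.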
Even though it is designed to take advantage of context information, our technique is also applicable when no context is available. The context-free scenario is the setting studied by many previous papers on the network-inference problem. For this special case, we were therefore able to compare with state-of-the-art techniques and demonstrate that our method provides significantly better results. Another nice feature of our solution is that it makes no assumption regarding the influence model. The technical details of our solution and the results of our study can be found in the ICML paper.