Reducing Controversy by Connecting Opposing Views

Several people have recently expressed concern about high levels of polarization in society. For example, the World Economic Forum’s report on global risks lists increasing societal polarization as a threat, and others have suggested that social media might be contributing to this phenomenon.

In a recent paper, published at the Tenth International Conference on Web Search and Data Mining (WSDM 2017), we develop algorithmic techniques to mitigate rising polarization by connecting people with opposing views, and we evaluate them on Twitter.

In more detail, our approach is to …

Extracting Skills from Personal Communication Data using StackExchange Dataset

This blog post summarizes our work published at ACM CIKM. The project is about automatically profiling the skills of users by analyzing their personal communication data. We treated this as a prediction problem: given a user’s messages, predict that user’s skills. As a training set, we made use of the StackExchange dataset, which is freely available. There are many StackExchange websites, such as stackoverflow, cs, datascience, physics, history, and so on. The dataset therefore covers a diverse set of skills, and it will be automatically updated as new technologies come to the fore.


Using Instagram images to monitor public health

Our recent paper, ‘Social media image analysis for public health’, will appear as a short paper at CHI 2016. The question we ask in this paper is whether images uploaded to social media can be used to predict public health variables and lifestyle diseases, such as obesity, diabetes, and depression.

Lifestyle diseases are a major concern in the developed world. NYTimes estimates that, in addition to costing almost a trillion dollars, lifestyle diseases kill more people than contagious diseases. With the ubiquitous use of social-media platforms in recent years, it has never been easier to collect and analyze the lifestyle choices of large populations. For this reason, social-media data has been used in the past to study or monitor public health.

Quantifying controversy on social media

Controversies are everywhere on social media. Studying and understanding the structure and evolution of these controversies is an important area of research. Although previous studies have examined controversy on social media, they are either too domain-specific (e.g., politics) or require previously labeled data.

To address these shortcomings, in our recent WSDM 2016 paper, we designed a fully automatic way to detect ad-hoc controversial issues in the wild, with no prior information or domain knowledge. We represent a topic of discussion with a conversation graph. In this graph, vertices represent people and edges represent conversation activity, such as posts, comments, mentions, or endorsements. Our goal is to examine whether there are distinguishable patterns in the way conversations are shaped during a controversial event.
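As a rough illustration of the conversation-graph representation, the following Python sketch builds an undirected, weighted graph from a list of user-to-user interactions. The function name `build_conversation_graph`, the input format, and the choice to count repeated interactions as edge weights are assumptions made for this example; they are not taken from the paper.

```python
from collections import defaultdict


def build_conversation_graph(interactions):
    """Build an undirected, weighted conversation graph.

    `interactions` is an iterable of (user_a, user_b) pairs, one per
    conversation activity (a mention, comment, endorsement, ...).
    This input format is a hypothetical one chosen for illustration.
    Returns a dict mapping each user to {neighbor: interaction count}.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for a, b in interactions:
        if a == b:
            continue  # ignore self-interactions
        graph[a][b] += 1
        graph[b][a] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {user: dict(neighbors) for user, neighbors in graph.items()}


# Toy data: each pair is one interaction between two users.
interactions = [
    ("alice", "bob"),
    ("bob", "alice"),
    ("carol", "alice"),
    ("dave", "dave"),
]
graph = build_conversation_graph(interactions)
```

Here vertices are the dictionary keys and each edge carries the number of interactions between its two endpoints. A graph in this form could then be passed to a partitioning or random-walk routine to look for structural patterns.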


Absorbing random walk centrality

Our paper on absorbing random-walk centrality will be presented at ICDM next week. It is joint work with Harry Mavroforakis and Aristides Gionis.

What is absorbing random-walk centrality (ARW-centrality)?

It is a measure that tells us how central one set of nodes (let’s call it C) is with respect to another set of nodes (let’s call it Q) in a graph. As an example, consider the graph shown in the figure below. In that graph, we use color to indicate the two sets of nodes: Q is shown in red and C is shown in blue.

Scalable facility location for massive graphs on Pregel-like systems

Our paper on designing a distributed algorithm for solving the facility-location problem was accepted at the CIKM 2015 conference, and will be presented in Melbourne next week.

What is the facility-location problem? Facility location is a classic problem, first studied in the field of operations research. In the problem setting, we are given a set of ‘facilities’ and a set of ‘locations’, and the goal is to find a mapping of the locations to the facilities such that a certain objective function is minimized. The objective function models the operating cost of serving the locations with a set of selected facilities, and it includes two terms: a cost term for opening a new facility, and a cost term for serving a location with an open facility.
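To make the two-term objective concrete, here is a minimal Python sketch that evaluates it for a given set of open facilities. It assumes each location is served by its cheapest open facility; the function name `facility_location_cost`, the dictionary-based inputs, and the toy numbers are hypothetical. It only evaluates the objective for a given choice and is not the distributed algorithm from the paper.

```python
def facility_location_cost(open_facilities, opening_cost, service_cost):
    """Evaluate the facility-location objective for a set of open facilities.

    opening_cost[f]       -- cost of opening facility f
    service_cost[loc][f]  -- cost of serving location loc from facility f
    Each location is assigned to its cheapest open facility (an assumption
    of this sketch). Returns (total cost, assignment dict).
    """
    facilities = sorted(open_facilities)  # fixed order for deterministic ties
    if not facilities:
        raise ValueError("at least one facility must be open")
    # First term: cost of opening the selected facilities.
    opening = sum(opening_cost[f] for f in facilities)
    # Second term: cost of serving each location from an open facility.
    assignment = {
        loc: min(facilities, key=lambda f: costs[f])
        for loc, costs in service_cost.items()
    }
    serving = sum(service_cost[loc][f] for loc, f in assignment.items())
    return opening + serving, assignment


# Toy instance with two facilities and two locations.
opening_cost = {"A": 3, "B": 5}
service_cost = {
    "x": {"A": 1, "B": 4},
    "y": {"A": 6, "B": 2},
}
cost_both, assign_both = facility_location_cost({"A", "B"}, opening_cost, service_cost)
cost_a, assign_a = facility_location_cost({"A"}, opening_cost, service_cost)
```

In this toy instance, opening both facilities costs 8 to open plus 3 to serve, while opening only A costs 3 to open plus 7 to serve, so the cheaper choice is to open A alone. This shows the trade-off between the two terms.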