The Ebb and Flow of Controversial Debates on Social Media

By Kiran Garimella and Michael Mathioudakis

Our recent paper titled ‘The Effect of Collective Attention on Controversial Debates on Social Media’ (arXiv link) won the best student paper award at the 9th ACM Web Science conference held in Troy, New York.

The paper studies the evolution of long-lived controversial debates on Twitter, i.e., discussions on topics such as ‘gun control’ or ‘abortion’ that reveal a split of opinion between people who support different sides of the argument.

The main goal of this work is to study dynamic aspects of controversial debates — in particular: (i) whether controversy around the debates has increased over time; and (ii) whether controversy increases or decreases when major associated events occur.


The dataset consists of a 1% sample of all tweets posted between September 2011 and September 2016, as published by Twitter and stored on the Internet Archive (link). For the purposes of the study, we focus on subsets of tweets related to major controversial topics in the USA, including Obamacare, Abortion, and Gun Control.

Measuring Controversy

For each topic in the study, we measure the controversy surrounding the topic for each day spanned by the dataset. To do so, we employ the Random Walk Controversy (RWC) method we developed in earlier work [1]. The RWC score essentially quantifies the degree to which the retweet network of a given topic and day is polarized: the higher the RWC score, the higher the controversy around the topic. For more details on the RWC score, we refer the interested reader to the full paper [1].
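
To give a feel for what such a score computes, here is a minimal sketch, not the implementation used in the paper. It assumes the score has the form P_XX * P_YY - P_YX * P_XY, where P_AB is the probability that a random walk ending on side B started on side A, which is our reading of [1]. The function name, the parameters, and the toy graph are hypothetical, and the partition of users into two sides and the choice of high-degree "hub" users are taken as given inputs.

```python
import random
from collections import defaultdict


def rwc_score(edges, side, hubs, walks_per_side=1000, max_steps=1000, seed=0):
    """Monte Carlo estimate of an RWC-style score for a two-sided graph.

    edges: iterable of (u, v) pairs, treated as undirected
    side:  dict mapping every node to "X" or "Y" (the partition is an input)
    hubs:  set of nodes at which a walk stops (assumed high-degree users)
    Both sides are assumed to be non-empty.
    """
    rng = random.Random(seed)
    neighbors = defaultdict(list)
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    # counts[(start_side, end_side)] = number of walks with those endpoints
    counts = defaultdict(int)
    for start_side in ("X", "Y"):
        starts = sorted(n for n in neighbors if side[n] == start_side)
        for _ in range(walks_per_side):
            node = rng.choice(starts)
            for _ in range(max_steps):
                if node in hubs:
                    counts[(start_side, side[node])] += 1
                    break
                node = rng.choice(neighbors[node])
            # walks that never reach a hub within max_steps are discarded

    def p(start, end):
        # Probability that a walk ending on `end` started on `start`,
        # given the same number of walks from each side.
        total = counts[("X", end)] + counts[("Y", end)]
        return counts[(start, end)] / total if total else 0.0

    return p("X", "X") * p("Y", "Y") - p("Y", "X") * p("X", "Y")


# Toy example: two triangles with no edges between them (fully polarized).
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f")]
side = {"a": "X", "b": "X", "c": "X", "d": "Y", "e": "Y", "f": "Y"}
print(rwc_score(edges, side, hubs={"a", "d"}))  # 1.0, since no walk can cross sides
```

In this toy graph every walk ends on the side where it started, so the score takes its maximum value; the more the two sides retweet each other, the more walks cross over and the lower the score.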

Controversy over Time

Having obtained a controversy score for each topic and day in the dataset, we can now ask whether controversy has increased over the five years covered in the dataset.

The answer to this question is shown in the plot below. The X-axis of the plot spans time at daily granularity, from September 2011 to September 2016; and the Y-axis spans values of the RWC score.


As we see from the figure, even though RWC appears to fluctuate over time, there is no clear trend for increasing or decreasing controversy over time.

Controversy and Collective Attention

Even so, we wish to better understand the fluctuations of controversy over time. Our hypothesis is that the level of controversy around a topic rises and falls with the collective attention the topic attracts. In plain terms, we hypothesize that, when a controversial debate is making headlines, the level of controversy around it increases.

To test that hypothesis, we follow two steps.

First, we quantify the collective attention a topic receives on a given day as the number of users who post a tweet on the topic that day. As we see in the figure below, this level of attention coincides well with the occurrence of important events related to the topics.
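
As a rough illustration of this definition, the sketch below counts distinct users per day for one topic. The input format (pairs of a date string and a user id) and the function name are assumptions made for the example, not the format of the actual dataset.

```python
from collections import defaultdict


def daily_attention(tweets):
    """Return {day: number of distinct users who tweeted on that day}.

    tweets: iterable of (day, user_id) pairs for a single topic
            (a hypothetical, already-filtered input).
    """
    users_by_day = defaultdict(set)
    for day, user in tweets:
        users_by_day[day].add(user)
    return {day: len(users) for day, users in users_by_day.items()}


# Toy example: u1 tweets twice on the same day but is counted once.
tweets = [
    ("2016-06-12", "u1"),
    ("2016-06-12", "u1"),
    ("2016-06-12", "u2"),
    ("2016-06-13", "u3"),
]
print(daily_attention(tweets))  # {'2016-06-12': 2, '2016-06-13': 1}
```

Counting users rather than tweets keeps a handful of very active accounts from dominating the measure.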


Second, we juxtapose the RWC score with collective attention, both measured at daily granularity. The results are shown in the figure below. Larger values on the X-axis of the plots correspond to higher levels of collective attention, and larger values on the Y-axis correspond to higher levels of RWC score.


The figures reveal a clear trend: the higher the level of collective attention on a controversial topic, the larger the controversy as measured by the RWC score.

It is important to note that this trend was not observed for non-controversial topics.
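
One simple way to put a number on the trend described above is a correlation coefficient between the daily attention values and the daily RWC scores. The sketch below computes Pearson's correlation from scratch; it is only an illustration, since this post does not state which statistic, if any, the paper uses, and the numbers in the example are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up daily values, for illustration only.
attention = [120, 340, 560, 900, 1500]
rwc = [0.21, 0.30, 0.35, 0.48, 0.60]
print(pearson(attention, rwc))  # close to 1 when the two rise together
```

A value near 1 would match the pattern seen for controversial topics, while a value near 0 would match the absence of a trend reported for non-controversial ones.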

Other Measures and Future Work

In addition to the discussion above, the full paper studies the behavior of other network- and content-based measures over time.

With this work, we dived deeper into the study of controversial debates and the complex interactions they encompass. In future work, we plan to study the interplay between controversy and echo chamber phenomena.

Stay tuned!

[1] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2016. Quantifying Controversy in Social Media. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining (WSDM ’16). ACM, New York, NY, USA, 33-42. DOI:




Reducing Controversy by Connecting Opposing Views

Lately, several people have expressed concern about high levels of polarization in society. For example, the World Economic Forum’s report on global risks lists increasing societal polarization as a threat, and others have suggested that social media might be contributing to this phenomenon.

In a recent paper, published at the Tenth International Conference on Web Search and Data Mining (WSDM 2017), we build algorithmic techniques to mitigate the rising polarization by connecting people with opposing views – and evaluate them on Twitter.


Extracting Skills from Personal Communication Data using StackExchange Dataset

This blog post is a summary of our work published at ACM CIKM. The project is about automatically profiling the skills of users by analyzing their personal communication data. We framed this as a prediction problem: given a user's messages, predict that user's skills. As a training set, we made use of the Stack Exchange dataset, which is freely available here. There are many Stack Exchange websites, such as stackoverflow, cs, datascience, physics, history and so on. This dataset covers a diverse set of skills and will be automatically updated if new technologies come to the fore.


Using Instagram images to monitor public health

Our recent paper on ‘Social media image analysis for public health’ will appear as a short paper in CHI 2016. The question we ask in this paper is whether images uploaded to social media can be used to predict public health variables and lifestyle diseases, such as obesity, diabetes, depression, etc.

Lifestyle diseases are of major concern in the developed world. NYTimes estimates that, in addition to costing almost a trillion dollars, lifestyle diseases kill more people than contagious diseases. With the ubiquitous use of social-media platforms in recent years, it has never been easier to collect and analyze the lifestyle choices of large populations. For this reason, social-media data has been used in the past to study or monitor public health.

Quantifying controversy on social media

Controversies are everywhere on social media. Studying and understanding the structure and evolution of these controversies is an important area of research. Though there have been previous attempts to study controversy on social media, they are either too domain-specific (e.g., politics) or need prior labeled data.

To address these shortcomings, in our recent WSDM 2016 paper, we designed a fully automatic way to detect ad-hoc controversial issues in the wild, with no prior information or domain knowledge. We represent a topic of discussion with a conversation graph. In this graph, vertices represent people and edges represent conversation activity, such as posts, comments, mentions, or endorsements. Our goal is to examine whether there are distinguishable patterns in the way conversations are shaped during a controversial event.
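
As a small illustration of the conversation graph described above, the sketch below builds an undirected, weighted graph from a list of interactions. The input format (pairs of user names) and the choice to weight an edge by the number of interactions are assumptions made for the example, not details taken from the paper.

```python
from collections import Counter


def build_conversation_graph(interactions):
    """Build an undirected conversation graph.

    interactions: iterable of (user_a, user_b) pairs, one per retweet,
                  mention, or reply (a hypothetical input format).
    Returns (vertices, edge_weights), where edge_weights maps an
    unordered pair of users to the number of interactions between them.
    """
    edge_weights = Counter()
    for user_a, user_b in interactions:
        if user_a == user_b:
            continue  # ignore self-interactions
        edge_weights[frozenset((user_a, user_b))] += 1
    vertices = {user for edge in edge_weights for user in edge}
    return vertices, edge_weights


interactions = [
    ("alice", "bob"),
    ("bob", "alice"),
    ("carol", "alice"),
    ("dave", "dave"),
]
vertices, edge_weights = build_conversation_graph(interactions)
print(sorted(vertices))                          # ['alice', 'bob', 'carol']
print(edge_weights[frozenset(("alice", "bob"))])  # 2
```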


Absorbing random walk centrality

Our paper on absorbing random-walk centrality will be presented at ICDM next week. It is joint work with Harry Mavroforakis and Aristides Gionis.

What is absorbing random-walk centrality (ARW-centrality)?

It is a measure that tells us how central one set of nodes (let’s call it C) is with respect to another set of nodes (let’s call it Q) in a graph. As an example, consider the graph shown in the figure below. In that graph, we use color to indicate the two sets of nodes: Q is shown in red and C is shown in blue.

Scalable facility location for massive graphs on Pregel-like systems

Our paper on designing a distributed algorithm for solving the facility-location problem was accepted at the CIKM 2015 conference, and will be presented in Melbourne next week.

What is the facility-location problem? Facility location is a classic problem, first studied in the field of operations research. In the problem setting, we are given a set of ‘facilities’ and a set of ‘locations’ and the goal is to find a mapping of the locations to the facilities such that a certain objective function is minimized. The objective function models the operating cost of serving the locations with a set of selected facilities, and it includes two terms: a cost term for opening a new facility, and a cost term for serving a location with an open facility.
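
To make the two-term objective concrete, here is a minimal sketch that evaluates the cost of a given mapping of locations to facilities. The function name, the dictionaries, and the numbers are made up for illustration. It only scores a candidate solution and is unrelated to the distributed algorithm in the paper.

```python
def facility_location_cost(assignment, opening_cost, service_cost):
    """Total cost of a mapping of locations to facilities.

    assignment:   dict location -> facility that serves it
    opening_cost: dict facility -> cost of opening that facility
    service_cost: dict (location, facility) -> cost of serving the location
    """
    open_facilities = set(assignment.values())
    # First term: pay once for every facility that serves at least one location.
    opening = sum(opening_cost[f] for f in open_facilities)
    # Second term: pay for serving each location from its assigned facility.
    serving = sum(service_cost[(loc, fac)] for loc, fac in assignment.items())
    return opening + serving


opening_cost = {"f1": 10, "f2": 4}
service_cost = {
    ("a", "f1"): 1, ("a", "f2"): 5,
    ("b", "f1"): 2, ("b", "f2"): 1,
    ("c", "f1"): 6, ("c", "f2"): 2,
}

all_to_f2 = {"a": "f2", "b": "f2", "c": "f2"}
split = {"a": "f1", "b": "f2", "c": "f2"}
print(facility_location_cost(all_to_f2, opening_cost, service_cost))  # 12
print(facility_location_cost(split, opening_cost, service_cost))      # 18
```

In this toy instance, opening f1 just to serve location a more cheaply does not pay off: the saving in service cost is smaller than the cost of opening a second facility.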

Who Let the DAGs Out?

Mining for Meaning

We did – in our paper titled “Beyond rankings: comparing directed acyclic graphs” (pdf) which I’ll be presenting at the ECML PKDD conference in Portugal next month. This was the first project of my PhD, but there’s also something else that makes it fundamentally different from the other research projects I’ve been involved with.

Typically, when I undertake a research project, I have a concrete question, like what is the next location a person will visit, to which I start looking for different solutions. In other words, I begin with a nail and start looking for a suitable hammer. However, this time we started by developing a cool new hammer with some neat theoretical properties before we had any idea if a suitable nail even exists.


Apartment prices in Helsinki relate to accessibility by public transport

This post is shared with I.Ž. research blog.

Traditionally, apartment prices are considered to relate to the apartment's characteristics and its location. Our hypothesis was that the accessibility of a neighbourhood is perhaps even more important than its location. So we did a pilot study in the Helsinki region to check that.

First, we define static and dynamic points of interest in the city. Static points of interest are supposed to capture community centers. We find them by locating H&M stores in the Helsinki region. Dynamic points of interest are supposed to capture where people go at different times of day. We find those centers by clustering FourSquare check-ins.

